
# Zillow’s Home Value Prediction (Zestimate)

Data Science Project in R: build a machine learning algorithm to predict the future sale prices of homes.

## What will you learn

• Problem statement analysis
• Exploratory Data Analysis
• Input Data Visualization
• Interpretation from Visualization
• Making sense of data
• Implementation using R

## What will you get

• Access to all project materials, including data files and solution files

## Prerequisites

• Jupyter Notebook, installed via Anaconda
• R (3.3.3) and RStudio (1.4)
• A machine with at least 4 GB of RAM

## Project Description

Zillow is asking you to predict the log-error between their Zestimate and the actual sale price, given all the features of a home. The log error is defined as:

$\text{logerror} = \log(\text{Zestimate}) - \log(\text{SalePrice})$

and it is recorded in the transactions file train.csv. In this project, you are going to predict the log error for the months in Fall 2017.
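As a quick illustration of the definition above, here is a minimal R sketch that computes the log error for a few homes. The dollar values and the variable names `zestimate` and `sale_price` are made up for this example and do not come from the project data. It also assumes the natural logarithm, which is what R's `log()` computes by default.

```r
# Hypothetical Zestimates and actual sale prices (illustrative values only,
# not taken from train.csv)
zestimate  <- c(310000, 450000, 198000)
sale_price <- c(300000, 480000, 198000)

# Log error as defined above, assuming the natural logarithm
logerror <- log(zestimate) - log(sale_price)

# Equivalent form: the log of the ratio Zestimate / SalePrice
stopifnot(isTRUE(all.equal(logerror, log(zestimate / sale_price))))

# Sign of the log error:
#   positive -> the Zestimate was above the sale price
#   negative -> the Zestimate was below the sale price
#   zero     -> the Zestimate matched the sale price
print(data.frame(zestimate, sale_price, logerror))
```

Because the target is a difference of logs, it measures relative rather than absolute error: a Zestimate that is 5% too high gives the same log error for a cheap home as for an expensive one.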

"Zestimates" are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

In this data science project, we will develop a machine learning algorithm that makes predictions about the future sale prices of homes. We will also build a model to improve the Zestimate residual error. And finally, we'll build a home valuation algorithm from the ground up, using external data sources.

## Curriculum For This Mini Project

• Problem Statement (00:01:09)
• Explore Data Set (00:02:59)
• Understand the features (00:03:41)
• Import Libraries (00:03:32)
• Recoding of variables (00:04:40)
• Find transactions by month (00:12:48)
• Distribution of Transactions (00:01:01)
• Distribution of Target variable (00:15:53)
• Represent Missing values (00:07:59)
• Finding relevant features (00:02:17)
• Correlation between features and target variable (00:14:51)
• Shape of Distribution (00:04:11)
• Spread of log error over years (00:04:10)
• Zestimate variable prediction (00:06:57)
• Building Model (00:10:56)
• XGBoost Model (00:13:03)
• Prediction (00:04:02)
• Hyperparameter Tuning (00:01:31)
• Cross Validation (00:03:36)
• Get Best Results (00:16:41)
• Conclusion (00:01:44)