Zillow’s Home Value Prediction (Zestimate)

Data Science Project in R: build a machine learning algorithm to predict the future sale prices of homes.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love


Dhiraj Tandon

Solution Architect-Cyber Security at ColorTokens

My Interaction was very short but left a positive impression. I enrolled and asked for a refund since I could not find the time. What happened next: They initiated Refund immediately. Their...


Lead Consultant, ITC Infotech

The project orientation is very much unique and it helps to understand the real time scenarios most of the industries are dealing with. And there is no limit, one can go through as many projects...

What will you learn

Understanding the problem statement
Importing the dataset from Amazon AWS
Analyzing the output of R's summary function and performing basic EDA
Using ggplot and Correlation Plot to find similarities between variables
Checking for variables with null values and handling them
Checking skewness of the target variable using Histogram
Checking contribution of different variables to the target variable
Finding the best feature and eliminating the least significant ones
Defining the evaluation metric 'log_error' and understanding its significance
Selecting Boosting model XGBoost and converting dataset into DMatrix
Applying XGBoost model on the Dataset
Defining parameters for Hyperparameter tuning
Using k-fold cross-validation to guard against overfitting
Visualizing important features for the XGBoost model
Training the final model using the selected features
Making final predictions and Saving in CSV format
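
The cross-validation step in the list above can be illustrated with a small, dependency-free sketch. It is written in Python rather than R, and it is not the project's actual code: the helper names `k_fold_indices` and `cross_validated_mae` are hypothetical, the data is synthetic, a median predictor stands in for the XGBoost model, and mean absolute error is assumed as the scoring function.

```python
import random
import statistics


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_mae(targets, k=5, seed=0):
    """Score a median-baseline predictor with k-fold cross-validation."""
    folds = k_fold_indices(len(targets), k, seed)
    fold_scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        # Fit only on rows outside the held-out fold
        train = [t for i, t in enumerate(targets) if i not in held_out_set]
        prediction = statistics.median(train)  # stand-in for a fitted model
        # Evaluate only on the held-out fold
        errors = [abs(targets[i] - prediction) for i in held_out]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / len(fold_scores)


# Synthetic log errors, made up purely for illustration
rng = random.Random(42)
synthetic_logerror = [rng.gauss(0.01, 0.1) for _ in range(200)]
cv_mae = cross_validated_mae(synthetic_logerror, k=5)
print(f"5-fold MAE of the median baseline: {cv_mae:.4f}")
```

Because every row is scored exactly once by a model that never saw it, the averaged fold score is a more honest estimate of out-of-sample error than the training error, which is why it is useful when comparing hyperparameter settings.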

Project Description

Zillow is asking you to predict the log-error between their Zestimate and the actual sale price, given all the features of a home. The log error is defined as:

logerror = log(Zestimate) − log(SalePrice)

and it is recorded in the transactions file train.csv. In this project, you are going to predict the log error for the months in Fall 2017.
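
To make the definition concrete, here is a minimal Python sketch of the log error. It assumes the natural logarithm, and the function name `log_error` and the prices are hypothetical values chosen only for illustration.

```python
import math


def log_error(zestimate, sale_price):
    """Return log(Zestimate) - log(SalePrice), assuming natural logs."""
    if zestimate <= 0 or sale_price <= 0:
        raise ValueError("prices must be positive")
    return math.log(zestimate) - math.log(sale_price)


# Hypothetical prices, not taken from train.csv
over = log_error(330_000, 300_000)   # Zestimate above the sale price
under = log_error(270_000, 300_000)  # Zestimate below the sale price
exact = log_error(300_000, 300_000)  # Zestimate equal to the sale price

print(f"overestimate:  {over:+.4f}")
print(f"underestimate: {under:+.4f}")
print(f"exact:         {exact:+.4f}")
```

A positive log error means the Zestimate was higher than the sale price, a negative one means it was lower, and zero means the two matched. Since the difference of logs equals the log of the ratio, the value depends only on the relative error, not on the absolute price of the home.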

"Zestimates" are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

In this data science project, we will develop a machine learning algorithm that makes predictions about the future sale prices of homes. We will also build a model to improve the Zestimate residual error. And finally, we'll build a home valuation algorithm from the ground up, using external data sources.

Similar Projects

In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

There are different time series forecasting methods for forecasting stock prices, demand, and so on. In this machine learning project, you will learn which forecasting method to use in which situation and how to apply it, using a time series forecasting example.

In this data science project in R, we are going to talk about subjective segmentation, which is a clustering technique for finding product bundles in sales data.

Curriculum For This Mini Project

Problem Statement
Explore Data Set
Understand the features
Import Libraries
Recoding of variables
Find transactions by month
Distribution of Transactions
Distribution of Target variable
Represent Missing values
Finding relevant features
Correlation between features and target variable
Shape of Distribution
Spread of log error over years
Zestimate variable prediction
Building Model
XGBoost Model
Hyperparameter Tuning
Cross Validation
Get Best Results