Zillow’s Home Value Prediction (Zestimate)

Data Science Project in R: Build a machine learning algorithm to predict the future sale prices of homes.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

  • Understanding the problem statement

  • Importing the dataset from Amazon AWS

  • How to analyze the result of the summary function from R and basic EDA

  • Using ggplot and Correlation Plot to find similarities between variables

  • Checking for variables with null values and handling them

  • Checking skewness of the target variable using Histogram

  • Checking contribution of different variables to the target variable

  • Finding the best feature and eliminating the least significant ones

  • Defining the evaluation metric 'log_error' and understanding its significance

  • Selecting Boosting model XGBoost and converting dataset into DMatrix

  • Applying XGBoost model on the Dataset

  • Defining parameters for Hyperparameter tuning

  • Using k-fold cross-validation to prevent overfitting

  • Visualizing important features for the XGBoost model

  • Training the final model using the selected features

  • Making final predictions and Saving in CSV format
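
The cross-validation step in the list above can be illustrated with a minimal sketch. The course itself works in R, where the xgboost package offers its own cross-validation helper (xgb.cv); the Python below is only a dependency-free illustration of how k-fold splitting partitions rows. The function names `k_fold_indices` and `train_validation_splits` are hypothetical, and the folds are contiguous for readability; in practice rows are usually shuffled first.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    if not 1 < k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    base, extra = divmod(n_rows, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional row each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_validation_splits(n_rows, k):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = k_fold_indices(n_rows, k)
    for i, validation in enumerate(folds):
        # Every row outside the held-out fold is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hypothetical example: 10 rows, 3 folds.
splits = list(train_validation_splits(10, 3))
for train, validation in splits:
    print("train:", train, "validation:", validation)
```

Each row lands in the validation set exactly once, so a model's score averaged over the folds reflects performance on data it was not trained on.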

Project Description

Zillow is asking you to predict the log-error between their Zestimate and the actual sale price, given all the features of a home. The log error is defined as:

logerror = log(Zestimate) − log(SalePrice)

and it is recorded in the transactions file train.csv. In this project, you are going to predict the log error for the months in Fall 2017.
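
As a quick illustration of the definition, here is a minimal sketch in Python (the course itself uses R). It assumes `log` means the natural logarithm; the function name `log_error` and the prices are hypothetical and are not taken from train.csv.

```python
import math


def log_error(zestimate, sale_price):
    """Return log(Zestimate) - log(SalePrice), assuming the natural log."""
    if zestimate <= 0 or sale_price <= 0:
        raise ValueError("prices must be positive")
    return math.log(zestimate) - math.log(sale_price)


# Hypothetical prices, for illustration only.
over = log_error(330_000, 300_000)   # Zestimate above the sale price
under = log_error(270_000, 300_000)  # Zestimate below the sale price
exact = log_error(300_000, 300_000)  # Zestimate equal to the sale price

print(f"over:  {over:+.4f}")
print(f"under: {under:+.4f}")
print(f"exact: {exact:+.4f}")
```

A positive log error means the Zestimate was above the sale price, a negative one means it was below, and zero means the two matched.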

"Zestimates" are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. By continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

In this data science project, we will develop a machine learning algorithm that makes predictions about the future sale prices of homes. We will also build a model to improve the Zestimate residual error. And finally, we'll build a home valuation algorithm from the ground up, using external data sources.

Similar Projects

Walmart Sales Forecasting Data Science Project
Data Science Project in R: Predict the sales for each department using historical markdown data from the Walmart dataset, which covers 45 Walmart stores.
Data Science Project - Instacart Market Basket Analysis
Data Science Project: Build a recommendation engine that predicts which products an Instacart consumer will purchase again.
Music Recommendation System Project using Python and R
Machine Learning Project: Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.
Predict Churn for a Telecom Company using Logistic Regression
Machine Learning Project in R: Predict customer churn in the telecom sector and find the key drivers that lead to churn. Learn how a logistic regression model in R can be used to identify customer churn in a telecom dataset.

Curriculum For This Mini Project

 
  Problem Statement (01m)
  Explore Data Set (02m)
  Understand the features (03m)
  Import Libraries (03m)
  Recoding of variables (04m)
  Find transactions by month (12m)
  Distribution of Transactions (01m)
  Distribution of Target variable (15m)
  Represent Missing values (07m)
  Finding relevant features (02m)
  Correlation between features and target variable (14m)
  Shape of Distribution (04m)
  Spread of log error over years (04m)
  Zestimate variable prediction (06m)
  Building Model (10m)
  XGBoost Model (13m)
  Prediction (04m)
  Hyperparameter Tuning (01m)
  Cross Validation (03m)
  Get Best Results (16m)
  Conclusion (01m)