# Zillow’s Home Value Prediction (Zestimate)

#### Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

## Customer Love

#### Shailesh Kurdekar

Solutions Architect at Capital One

I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine learning due to a big need at my workspace. I was referred here by a...

#### Dhiraj Tandon

Solution Architect-Cyber Security at ColorTokens

My Interaction was very short but left a positive impression. I enrolled and asked for a refund since I could not find the time. What happened next: They initiated Refund immediately. Their...

## What will you learn

- Understanding the problem statement
- Importing the dataset from Amazon AWS
- Analyzing the output of R's summary function and performing basic EDA
- Using ggplot and a correlation plot to find relationships between variables
- Checking for variables with null values and handling them
- Checking the skewness of the target variable using a histogram
- Checking the contribution of different variables to the target variable
- Finding the best features and eliminating the least significant ones
- Defining the evaluation metric 'log_error' and understanding its significance
- Selecting the boosting model XGBoost and converting the dataset into a DMatrix
- Applying the XGBoost model to the dataset
- Defining parameters for hyperparameter tuning
- Using k-fold cross-validation to guard against overfitting
- Visualizing important features for the XGBoost model
- Training the final model using the selected features
- Making final predictions and saving them in CSV format
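
Of the steps listed above, k-fold cross-validation is the easiest to illustrate in isolation. The project itself works in R, so the following is only a minimal Python sketch of the idea, not the course's code: `k_fold_indices` is a hypothetical helper, and the row count, number of folds, and seed are arbitrary values chosen for illustration. Each row lands in the validation set exactly once, and the model would be trained on the remaining rows each time.

```python
import random


def k_fold_indices(n_rows: int, k: int, seed: int = 0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not part of XGBoost or the course.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Slicing with a stride of k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, valid_idx in enumerate(folds):
        # Training rows are all rows outside the current validation fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, valid_idx


# Illustrative sizes only: 10 rows split into 5 folds.
splits = list(k_fold_indices(n_rows=10, k=5))
for train_idx, valid_idx in splits:
    print(len(train_idx), sorted(valid_idx))
```

Averaging the validation error across the folds gives a steadier estimate of out-of-sample performance than a single train/validation split, which is why it helps when comparing hyperparameter settings.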

## Project Description

Zillow is asking you to predict the log-error between their Zestimate and the actual sale price, given all the features of a home. The log error is defined as:

$\text{logerror} = \log(\text{Zestimate}) - \log(\text{SalePrice})$

and it is recorded in the transactions file train.csv. In this project, you are going to predict the log error for the months in Fall 2017.
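
To make the definition concrete, here is a small Python sketch that evaluates the log error for a few made-up prices. It assumes the natural logarithm, and both the `log_error` helper and the dollar figures are hypothetical, chosen only for illustration. A positive value means the Zestimate was above the sale price, a negative value means it was below, and zero means the estimate was exact.

```python
import math


def log_error(zestimate: float, sale_price: float) -> float:
    """Return log(Zestimate) - log(SalePrice), assuming natural logs."""
    if zestimate <= 0 or sale_price <= 0:
        raise ValueError("prices must be positive")
    return math.log(zestimate) - math.log(sale_price)


# Hypothetical prices, chosen only for illustration.
over = log_error(330_000, 300_000)   # Zestimate above the sale price
under = log_error(270_000, 300_000)  # Zestimate below the sale price
exact = log_error(300_000, 300_000)  # Zestimate equal to the sale price

print(f"{over:+.4f} {under:+.4f} {exact:+.4f}")
```

Because the difference of logs equals the log of the ratio, the log error depends only on the relative miss: a 10% overestimate gives the same value for a cheap home as for an expensive one.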

"Zestimates" are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. By continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

In this data science project, we will develop a machine learning algorithm that makes predictions about the future sale prices of homes. We will also build a model to improve the Zestimate residual error. And finally, we'll build a home valuation algorithm from the ground up, using external data sources.

## Similar Projects

#### Deep Learning with Keras in R to Predict Customer Churn

In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

#### Predict Census Income using Deep Learning Models

In this project, we are going to work on Deep Learning using H2O to predict Census income.

#### German Credit Dataset Analysis to Classify Loan Applications

In this data science project, you will work with the German credit dataset, using classification techniques such as decision trees and neural networks to classify loan applications in R.

## Curriculum For This Mini Project

| Topic | Duration |
| --- | --- |
| Problem Statement | 01m |
| Explore Data Set | 02m |
| Understand the features | 03m |
| Import Libraries | 03m |
| Recoding of variables | 04m |
| Find transactions by month | 12m |
| Distribution of Transactions | 01m |
| Distribution of Target variable | 15m |
| Represent Missing values | 07m |
| Finding relevant features | 02m |
| Correlation between features and target variable | 14m |
| Shape of Distribution | 04m |
| Spread of log error over years | 04m |
| Zestimate variable prediction | 06m |
| Building Model | 10m |
| XGBoost Model | 13m |
| Prediction | 04m |
| Hyperparameter Tuning | 01m |
| Cross Validation | 03m |
| Get Best Results | 16m |
| Conclusion | 01m |