Each project comes with 2-5 hours of micro-videos explaining the solution.

Get access to 50+ solved projects with IPython notebooks and datasets.

Add project experience to your LinkedIn/GitHub profiles.

Understanding the problem statement

Importing the dataset from AWS

Analyzing the output of R's summary function and performing basic EDA

Using ggplot and a correlation plot to find relationships between variables

Checking for variables with null values and handling them

Checking skewness of the target variable using a histogram

Checking contribution of different variables to the target variable

Finding the best features and eliminating the least significant ones

Defining the evaluation metric 'log_error' and understanding its significance

Selecting the XGBoost boosting model and converting the dataset into a DMatrix

Applying the XGBoost model to the dataset

Defining parameters for hyperparameter tuning

Using k-fold cross-validation to prevent overfitting

Visualizing important features for the XGBoost model

Training the final model using the selected features

Making final predictions and saving them in CSV format
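The early data checks in the list above (null values and skewness of the target variable) can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's own code, which refers to R tools such as summary and ggplot. The column names, the sample rows, and the 50% missing-value threshold are all hypothetical.

```python
def null_fraction(rows, column):
    """Fraction of rows where `column` is missing (None)."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def skewness(values):
    """Moment coefficient of skewness (population form): m3 / m2^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical sample: column names and values are made up for illustration.
rows = [
    {"logerror": 0.02, "pool_count": None, "bathrooms": 2.0},
    {"logerror": -0.01, "pool_count": None, "bathrooms": 1.0},
    {"logerror": 0.03, "pool_count": 1, "bathrooms": None},
    {"logerror": 0.40, "pool_count": None, "bathrooms": 3.0},
]

# Flag columns that are mostly empty (assumed threshold: more than 50% missing).
columns = ["logerror", "pool_count", "bathrooms"]
sparse = [c for c in columns if null_fraction(rows, c) > 0.5]

# Positive skewness means a long right tail in the target distribution.
target_skew = skewness([row["logerror"] for row in rows])
print("sparse columns:", sparse)
print("target skewness:", round(target_skew, 3))
```

A histogram shows the same asymmetry visually; the coefficient simply gives it a number that can be compared across variables.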

Zillow is asking you to predict the log error between its Zestimate and the actual sale price, given all the features of a home. The log error is defined as:

logerror = log(Zestimate) - log(SalePrice)

and it is recorded in the transactions file train.csv. In this project, you are going to predict the log error for the months in Fall 2017.
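As a small worked example, the log error is the difference between the logarithm of the Zestimate and the logarithm of the sale price, so its sign shows whether the Zestimate was too high or too low. The sketch below assumes natural logarithms; the prices are hypothetical.

```python
import math


def log_error(zestimate: float, sale_price: float) -> float:
    """logerror = log(Zestimate) - log(SalePrice), natural log assumed."""
    return math.log(zestimate) - math.log(sale_price)


# Hypothetical prices, for illustration only.
over = log_error(330_000, 300_000)   # Zestimate above the sale price
under = log_error(270_000, 300_000)  # Zestimate below the sale price
exact = log_error(300_000, 300_000)  # Zestimate equal to the sale price

print(f"over-estimate:  {over:+.4f}")
print(f"under-estimate: {under:+.4f}")
print(f"exact estimate: {exact:+.4f}")
```

A positive value means the Zestimate overshot the sale price, a negative value means it undershot, and zero means the two matched. Because the quantity is a difference of logs, it depends only on the ratio of the two prices, not on their absolute size.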

"Zestimates" are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

In this data science project, we will develop a machine learning algorithm that makes predictions about the future sale prices of homes. We will also build a model to improve the Zestimate residual error. And finally, we'll build a home valuation algorithm from the ground up, using external data sources.


Problem Statement (01m)

Explore Data Set (02m)

Understand the features (03m)

Import Libraries (03m)

Recoding of variables (04m)

Find transactions by month (12m)

Distribution of Transactions (01m)

Distribution of Target variable (15m)

Represent Missing values (07m)

Finding relevant features (02m)

Correlation between features and target variable (14m)

Shape of Distribution (04m)

Spread of log error over years (04m)

Zestimate variable prediction (06m)

Building Model (10m)

XGBoost Model (13m)

Prediction (04m)

Hyperparameter Tuning (01m)

Cross Validation (03m)

Get Best Results (16m)

Conclusion (01m)