Each project comes with 2-5 hours of micro-videos explaining the solution.

Get access to 50+ solved projects with IPython notebooks and datasets.

Add project experience to your LinkedIn/GitHub profiles.

Understanding the problem statement

Importing the dataset from Amazon AWS

Analyzing the output of R's summary() function and performing basic EDA

Using ggplot2 and correlation plots to find relationships between variables

Checking for variables with null values and handling them

Checking the skewness of the target variable using a histogram

Checking the contribution of different variables to the target variable

Finding the best features and eliminating the least significant ones

Defining the evaluation metric 'log_error' and understanding its significance

Selecting the boosting model XGBoost and converting the dataset into a DMatrix

Applying the XGBoost model to the dataset

Defining parameters for Hyperparameter tuning

Using k-fold cross-validation to prevent overfitting

Visualizing important features for the XGBoost model

Training the final model using the selected features

Making final predictions and saving them in CSV format
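The steps above can be sketched end-to-end in R. This is a minimal sketch, not the project's actual solution: it runs on synthetic data (the column names bathroomcnt, bedroomcnt, and calculatedfinishedsquarefeet mirror the Zillow files, but the values here are randomly generated) and assumes the xgboost package is installed.

```r
library(xgboost)

# Synthetic stand-in for the Zillow training data (real files come from the
# competition download; these values are randomly generated for illustration)
set.seed(42)
n <- 1000
train <- data.frame(
  bathroomcnt = sample(c(1, 2, 3, NA), n, replace = TRUE),
  bedroomcnt  = sample(2:5, n, replace = TRUE),
  calculatedfinishedsquarefeet = rnorm(n, 1800, 400),
  logerror    = rnorm(n, 0, 0.15)   # target variable
)

# Handle missing values: impute numeric NAs with the column median
for (col in names(train)) {
  miss <- is.na(train[[col]])
  if (any(miss)) train[[col]][miss] <- median(train[[col]], na.rm = TRUE)
}

# Convert the dataset into a DMatrix, the format XGBoost trains on
features <- setdiff(names(train), "logerror")
dtrain <- xgb.DMatrix(data  = as.matrix(train[, features]),
                      label = train$logerror)

# Hyperparameters (illustrative starting values, not tuned)
params <- list(objective = "reg:squarederror", eta = 0.05,
               max_depth = 4, subsample = 0.8, colsample_bytree = 0.8)

# k-fold cross-validation to pick the number of rounds without overfitting
cv <- xgb.cv(params = params, data = dtrain, nrounds = 200,
             nfold = 5, early_stopping_rounds = 10, verbose = 0)

# Train the final model and inspect feature importance
model <- xgb.train(params = params, data = dtrain,
                   nrounds = cv$best_iteration)
print(xgb.importance(feature_names = features, model = model))

# Make final predictions and save them in CSV format
pred <- predict(model, dtrain)
write.csv(data.frame(logerror_pred = pred), "predictions.csv",
          row.names = FALSE)
```

In the full project the same pipeline is applied to the real transactions and properties files, with EDA and feature selection done before the modeling steps.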

Zillow is asking you to predict the log error between their Zestimate and the actual sale price, given all the features of a home. The log error is defined as:

logerror = log(Zestimate) - log(SalePrice)

and it is recorded in the transactions file train.csv. In this project, you are going to predict the log error for the months in Fall 2017.
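To make the metric concrete, here is the definition computed in base R for a single hypothetical home (the Zestimate and sale price values below are made up for illustration):

```r
# logerror = log(Zestimate) - log(SalePrice)
zestimate  <- 310000   # hypothetical Zestimate
sale_price <- 300000   # hypothetical actual sale price

logerror <- log(zestimate) - log(sale_price)
round(logerror, 4)   # ≈ 0.0328: positive, so the Zestimate overshot the sale price
```

A negative logerror would mean the Zestimate came in below the actual sale price.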

"Zestimates" are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

In this data science project, we will develop a machine learning algorithm that makes predictions about the future sale prices of homes. We will also build a model to improve the Zestimate residual error. And finally, we'll build a home valuation algorithm from the ground up, using external data sources.


Problem Statement (01m)
Explore Data Set (02m)
Understand the features (03m)
Import Libraries (03m)
Recoding of variables (04m)
Find transactions by month (12m)
Distribution of Transactions (01m)
Distribution of Target variable (15m)
Represent Missing values (07m)
Finding relevant features (02m)
Correlation between features and target variable (14m)
Shape of Distribution (04m)
Spread of log error over years (04m)
Zestimate variable prediction (06m)
Building Model (10m)
XGBoost Model (13m)
Prediction (04m)
Hyperparameter Tuning (01m)
Cross Validation (03m)
Get Best Results (16m)
Conclusion (01m)