Each project comes with 2-5 hours of micro-videos explaining the solution.

Get access to 102+ solved projects with IPython notebooks and datasets.

Add project experience to your LinkedIn/GitHub profiles.

Understanding the problem statement

Importing the train and test datasets directly from the source

Performing basic EDA

Checking for null values and making imputations using appropriate methods

Dropping rows with null values

Understanding how a linear model works and its underlying assumptions

Multicollinearity, autocorrelation, linearity, and normality of residuals

Applying Ridge Regression for training

Applying GLMnet model for training

Applying an Elastic Net model for training

Converting the DataFrame into a DMatrix

Applying XGBoost model for making predictions

Defining parameters for XGBoost

Defining evaluation metrics

Plotting the results obtained to select the best model

Making final predictions with the selected best model
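
The steps listed above (imputing null values, training a regularized linear model, defining an evaluation metric, and selecting the best model) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual notebook: the helper names `impute_column_means`, `fit_ridge`, and `rmse` are hypothetical, column-mean imputation stands in for whatever imputation method the project uses, and ridge regression is fitted with its closed-form solution in NumPy rather than a library model.

```python
import numpy as np


def impute_column_means(X):
    """Replace NaNs in each column with that column's mean (a simple stand-in)."""
    X = np.array(X, dtype=float)  # copy so the caller's data is untouched
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def fit_ridge(X, y, alpha):
    """Closed-form ridge fit on centered data; the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    penalty = alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def rmse(y_true, y_pred):
    """Root mean squared error, used here as the evaluation metric."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic data (an assumption for illustration; not the Two Sigma dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
X[rng.random(X.shape) < 0.05] = np.nan  # knock out roughly 5% of the entries

X = impute_column_means(X)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Score several penalty strengths on held-out data and keep the best one.
scores = {}
for alpha in (0.0, 1.0, 10.0, 100.0):
    coef, intercept = fit_ridge(X_train, y_train, alpha)
    scores[alpha] = rmse(y_test, X_test @ coef + intercept)

best_alpha = min(scores, key=scores.get)
print(scores, best_alpha)
```

The same compare-on-held-out-data loop would apply when the candidates are different model families (for example Elastic Net or XGBoost) rather than different penalty strengths.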

Two Sigma is a technology company dedicated to finding value in the world’s data. Since its founding in 2001, Two Sigma has built an innovative platform that combines extraordinary computing power, vast amounts of information, and advanced data science to produce breakthroughs in investment management, insurance, and related fields. Economic opportunity depends on the ability to deliver singularly accurate forecasts in a world of uncertainty.

By accurately predicting financial movements, you will learn about scientifically driven approaches to unlocking significant predictive capability.

Two Sigma is excited to find predictive value and gain a better understanding of the skills offered by the global data science crowd.

Introduction & Installation (05m)

Data Set Overview (03m)

Problem Statement (01m)

Data Analysis - Missing Values (38m)

Recap (02m)

Next Steps (06m)

Why MICE (03m)

Split Data Set into Train and Test (05m)

Linear Regression - Assumptions (06m)

Linear Regression - Model Creation (03m)

Robust Linear Regression (03m)

Ridge Regression (11m)

Extreme Gradient Boosting (07m)
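
One of the linear regression assumptions covered in the lessons above is the absence of strong multicollinearity. A common way to check it is the variance inflation factor (VIF), which regresses each feature on all the others and reports 1 / (1 - R²). The sketch below is an illustration on synthetic data and makes no claim about how the course itself performs the check; the function name `variance_inflation_factors` is hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X by regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Include an intercept column in the auxiliary regression.
        design = np.column_stack([np.ones(n_samples), others])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ beta
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        # A perfectly collinear column has an unbounded VIF.
        vifs.append(float("inf") if r_squared >= 1.0 else 1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Synthetic example: column 2 is almost a copy of column 0.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(vifs)
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign that a feature is largely explained by the others, which is one motivation for the regularized models (such as ridge regression) in this curriculum.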