Predict Census Income using Deep Learning Models

In this project, we use deep learning with H2O to predict census income.

What you will learn

Understanding the problem statement
Importing the dataset from AWS
Importing the required libraries and understanding their use
Using deep learning models to make predictions
Understanding classification, regression, clustering, and dimensionality reduction
Learning backpropagation and forward propagation
Understanding the cost function
Performing basic EDA and checking for null values
Using the summary function in R and interpreting the result
Installing H2O and creating an H2O cluster for faster computation
Defining parameters for the deep learning model
Computing variable importance and performance
Performing grid search for hyperparameter tuning
Training the model and making predictions with it
Shutting down the H2O cluster
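
The list above mentions forward propagation, backpropagation, and the cost function. As a rough illustration only (the project itself uses R and H2O, and this is not its code), here is a minimal pure-Python sketch of those three ideas for a single sigmoid neuron. The toy input, the learning rate, and all function names are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Forward propagation: weighted sum of inputs, then a sigmoid activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def cross_entropy(p, y):
    """Binary cross-entropy cost for one example with label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def backward(w, b, x, y, lr):
    """One gradient step. For sigmoid + cross-entropy, dCost/dz = p - y."""
    p = forward(w, b, x)
    dz = p - y
    new_w = [wi - lr * dz * xi for wi, xi in zip(w, x)]
    new_b = b - lr * dz
    return new_w, new_b

# Toy example (made-up numbers): one input vector with label 1.
x, y = [0.5, -1.0], 1
w, b = [0.0, 0.0], 0.0

cost_before = cross_entropy(forward(w, b, x), y)
for _ in range(20):
    w, b = backward(w, b, x, y, lr=0.5)
cost_after = cross_entropy(forward(w, b, x), y)

print(f"cost before: {cost_before:.4f}, cost after: {cost_after:.4f}")
```

A deep network stacks many such units in layers and applies the chain rule layer by layer, but the loop is the same: predict, measure the cost, and move the weights against the gradient.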

Project Description

This data was extracted from the census bureau database found at:
http://www.census.gov/ftp/pub/DES/www/welcome.html
Donor: Ronny Kohavi and Barry Becker,
            Data Mining and Visualization
            Silicon Graphics.
            e-mail: ronnyk@sgi.com for questions.
Split into train-test using MLC++ GenCVFiles (2/3, 1/3 random).
48842 instances, mix of continuous and discrete    (train=32561, test=16281)
45222 if instances with unknown values are removed (train=30162, test=15060)
Duplicate or conflicting instances : 6
Class probabilities for adult.all file
Probability for the label '>50K'  : 23.93% / 24.78% (without unknowns)
Probability for the label '<=50K' : 76.07% / 75.22% (without unknowns)
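
The figures above can be cross-checked with a few lines of arithmetic. The sketch below, in Python purely for illustration, uses only the numbers quoted in the description; the majority-class baseline at the end is an added observation, not something the description states.

```python
# Figures quoted in the dataset description.
total, train, test = 48842, 32561, 16281
clean_total, clean_train, clean_test = 45222, 30162, 15060

# The train and test counts should add up to the totals.
splits_consistent = (train + test == total) and (clean_train + clean_test == clean_total)

# Fraction of rows in the training split (the description says 2/3).
train_fraction = train / total

# Rows dropped when instances with unknown values are removed.
rows_with_unknowns = total - clean_total

# A model that always predicts the majority label '<=50K' is right
# as often as that label occurs, which gives a baseline to beat.
p_le_50k = 0.7607
baseline_accuracy = max(p_le_50k, 1 - p_le_50k)

print(splits_consistent, round(train_fraction, 4), rows_with_unknowns, baseline_accuracy)
```

Any trained model should therefore be judged against roughly 76% accuracy rather than 50%, because the classes are imbalanced.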

Extraction was done by Barry Becker from the 1994 Census database.  
A set of reasonably clean records was extracted using the following conditions:
   ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))
The prediction task is to determine whether a person makes over 50K a year.
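
The extraction condition above can be read as a row filter. The following Python sketch expresses it as a predicate; the field names come from the condition itself, their meanings (presumably age, adjusted gross income, final weight, and hours per week) are an interpretation, and the three sample records are made up for the example.

```python
def is_clean_record(rec):
    """Mirror of ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))."""
    return (
        rec["AAGE"] > 16
        and rec["AGI"] > 100
        and rec["AFNLWGT"] > 1
        and rec["HRSWK"] > 0
    )

# Hypothetical records, not taken from the census data.
records = [
    {"AAGE": 39, "AGI": 40000, "AFNLWGT": 77516, "HRSWK": 40},  # passes
    {"AAGE": 16, "AGI": 5000, "AFNLWGT": 1500, "HRSWK": 10},    # fails: AAGE is not > 16
    {"AAGE": 45, "AGI": 50000, "AFNLWGT": 1200, "HRSWK": 0},    # fails: HRSWK is not > 0
]

kept = [r for r in records if is_clean_record(r)]
print(len(kept), "of", len(records), "records kept")
```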

Curriculum For This Mini Project

Problem Statement: 04m
Import Data Sets: 08m
What is Deep Learning?: 35m
Understanding H2O: 05m
Data Sanity Check: 13m
Remove leading white space: 03m
Impute Missing values: 00m
Initializing H2O: 02m
Train H2O model without hidden layer (C ): 04m
Hyperparameter optimization: 16m
Random Grid Search: 10m
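
Two of the curriculum steps, removing leading white space and imputing missing values, can be sketched without any library. The Python below is an illustration only, not the project's R code. It assumes that unknown values are marked with "?" and that fields carry a leading space, as in the commonly distributed copy of this dataset. The sample rows are made up, and mode imputation is one possible strategy; the curriculum does not say which method the project uses.

```python
from collections import Counter

MISSING = "?"

# Made-up rows in the style of the raw file: fields carry a leading space.
raw_rows = [
    ["39", " State-gov", " Bachelors"],
    ["50", " Private", " HS-grad"],
    ["38", " ?", " HS-grad"],
    ["53", " Private", " 11th"],
]

def strip_leading_space(rows):
    """Remove leading white space from every field."""
    return [[field.lstrip() for field in row] for row in rows]

def impute_mode(rows, col):
    """Replace missing values in one column with that column's most common value."""
    observed = [row[col] for row in rows if row[col] != MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [
        row[:col] + [mode if row[col] == MISSING else row[col]] + row[col + 1:]
        for row in rows
    ]

cleaned = impute_mode(strip_leading_space(raw_rows), col=1)
for row in cleaned:
    print(row)
```

Stripping must happen before the missing-value check, otherwise " ?" with its leading space would not match the "?" marker.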