Predict Census Income using Deep Learning Models

In this project, we build Deep Learning models with H2O to predict income from census data.

What you will learn

  • Understanding the problem statement

  • Importing the dataset from AWS

  • Importing the required libraries and understanding their use

  • Using Deep Learning models to make predictions

  • Understanding classification, regression, clustering and dimensionality reduction

  • Learning forward propagation and back propagation

  • Understanding the cost function

  • Performing basic EDA and checking for null values

  • How to use the summary function in R and interpret the result

  • Installing H2O and creating H2O clusters for faster computation

  • Defining parameters for the Deep Learning model

  • Computing variable importance and model performance

  • Performing grid search for hyperparameter tuning

  • Training the models and making predictions with them

  • Shutting down the initiated H2O cluster
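H2O performs forward and back propagation internally, so the project never codes them by hand. As a framework-free illustration of what the two passes and the cost function compute, here is a minimal Python sketch (not the project's R/H2O code). It assumes one tanh hidden layer, a single sigmoid output unit and mean binary cross-entropy (log loss) as the cost. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_params(n_in, n_hidden, seed=0):
    # Small random weights, zero biases (hypothetical initialisation scheme).
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    """Forward propagation: inputs -> tanh hidden layer -> sigmoid output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y_hat = sigmoid(sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"])
    return h, y_hat

def cost(p, X, y, eps=1e-12):
    """Cost function: mean binary cross-entropy (log loss)."""
    total = 0.0
    for x, t in zip(X, y):
        _, y_hat = forward(p, x)
        y_hat = min(max(y_hat, eps), 1 - eps)  # guard against log(0)
        total -= t * math.log(y_hat) + (1 - t) * math.log(1 - y_hat)
    return total / len(X)

def backward(p, X, y):
    """Back propagation: gradient of the mean cost w.r.t. every parameter."""
    n = len(X)
    g = {"W1": [[0.0] * len(row) for row in p["W1"]], "b1": [0.0] * len(p["b1"]),
         "w2": [0.0] * len(p["w2"]), "b2": 0.0}
    for x, t in zip(X, y):
        h, y_hat = forward(p, x)
        d_out = (y_hat - t) / n  # dCost/d(output pre-activation) for sigmoid + log loss
        g["b2"] += d_out
        for j, hj in enumerate(h):
            g["w2"][j] += d_out * hj
            d_h = d_out * p["w2"][j] * (1.0 - hj * hj)  # chain rule through tanh
            g["b1"][j] += d_h
            for i, xi in enumerate(x):
                g["W1"][j][i] += d_h * xi
    return g

def train(p, X, y, lr=0.5, epochs=200):
    """Plain full-batch gradient descent using the back-propagated gradients."""
    for _ in range(epochs):
        g = backward(p, X, y)
        p["b2"] -= lr * g["b2"]
        for j in range(len(p["w2"])):
            p["w2"][j] -= lr * g["w2"][j]
            p["b1"][j] -= lr * g["b1"][j]
            for i in range(len(p["W1"][j])):
                p["W1"][j][i] -= lr * g["W1"][j][i]
    return p

# Toy usage: learn logical AND (made-up data, not census records).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 0, 1]
params = init_params(n_in=2, n_hidden=3)
print("cost before:", round(cost(params, X, y), 4))
train(params, X, y)
print("cost after: ", round(cost(params, X, y), 4))
```

A quick way to confirm that back propagation is correct is to compare one entry of `backward`'s output with a central finite difference of `cost`. The two should agree to several decimal places.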

Project Description

This data was extracted from the census bureau database found at:
http://www.census.gov/ftp/pub/DES/www/welcome.html
Donor: Ronny Kohavi and Barry Becker,
            Data Mining and Visualization
            Silicon Graphics.
            e-mail: ronnyk@sgi.com for questions.
Split into train-test using MLC++ GenCVFiles (2/3, 1/3 random).
48842 instances, mix of continuous and discrete    (train=32561, test=16281)
45222 if instances with unknown values are removed (train=30162, test=15060)
Duplicate or conflicting instances : 6
Class probabilities for adult.all file
Probability for the label '>50K'  : 23.93% / 24.78% (without unknowns)
Probability for the label '<=50K' : 76.07% / 75.22% (without unknowns)

Extraction was done by Barry Becker from the 1994 Census database.  
A set of reasonably clean records was extracted using the following conditions:
   ((AAGE>16) && (AGI>100) && (AFNLWGT>1)&& (HRSWK>0))
The prediction task is to determine whether a person makes over 50K a year.
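The split sizes quoted above are consistent (32561 + 16281 = 48842 and 30162 + 15060 = 45222). The cleaning behind the "unknown values removed" figures and the class probabilities can be sketched in Python as below. This is not the project's R code. It assumes each record is a comma-separated line whose fields may carry leading whitespace, that unknown values are written as `?`, and that the income label is the last field. `parse_records`, `drop_unknowns`, `impute_mode` and `label_probabilities` are hypothetical helper names. Mode imputation is shown only as one common alternative to dropping rows, because the imputation method used in the project is not described here.

```python
from collections import Counter

UNKNOWN = "?"  # assumption: unknown values are encoded as "?"

def parse_records(lines):
    """Split comma-separated lines into fields, stripping the whitespace
    around each field and skipping blank lines."""
    return [[field.strip() for field in line.split(",")] for line in lines if line.strip()]

def drop_unknowns(rows):
    """Keep only records with no unknown value in any field."""
    return [row for row in rows if UNKNOWN not in row]

def impute_mode(rows):
    """Replace each unknown with the most frequent known value of its column."""
    modes = []
    for col in range(len(rows[0])):
        known = Counter(row[col] for row in rows if row[col] != UNKNOWN)
        modes.append(known.most_common(1)[0][0] if known else UNKNOWN)
    return [[modes[c] if v == UNKNOWN else v for c, v in enumerate(row)] for row in rows]

def label_probabilities(rows):
    """Share of each class label, assuming the label is the last field."""
    counts = Counter(row[-1] for row in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

Dropping versus imputing is a trade-off. Dropping discards about 7% of the instances (45222 of 48842 remain), while imputing keeps every row at the cost of introducing guessed values.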

Curriculum For This Mini Project

 
  Problem Statement (04m)
  Import Data Sets (08m)
  What is Deep Learning? (35m)
  Understanding H2O (05m)
  Data Sanity Check (13m)
  Remove leading white space (03m)
  Impute Missing values (00m)
  Initializing H2O (02m)
  Train H2O model without hidden layer (C ) (04m)
  Hyperparameter optimization (16m)
  Random Grid Search (10m)
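Random grid search evaluates only a random subset of the hyperparameter grid instead of every combination. The Python sketch below (not the project's R code, and not H2O's implementation) illustrates the idea behind H2O's `RandomDiscrete` search strategy: sample up to `max_models` combinations without replacement, score each one and keep the best. The grid uses H2O deep learning parameter names (`hidden`, `activation`, `l1`), but the values are made up, and `toy_score` is a hypothetical stand-in for training a model and reading a validation metric such as AUC.

```python
import itertools
import random

def random_grid_search(grid, score_fn, max_models, seed=42):
    """Score up to max_models combinations sampled from `grid` without
    replacement; higher scores are better.
    Returns (best_params, best_score, history)."""
    names = sorted(grid)
    combos = list(itertools.product(*(grid[name] for name in names)))
    rng = random.Random(seed)
    sampled = rng.sample(combos, min(max_models, len(combos)))
    history = []
    for values in sampled:
        params = dict(zip(names, values))
        history.append((params, score_fn(params)))
    best_params, best_score = max(history, key=lambda item: item[1])
    return best_params, best_score, history

# Hypothetical grid: H2O deep learning parameter names, made-up values.
grid = {
    "hidden": [(10, 10), (20, 20), (50,)],
    "activation": ["Rectifier", "Tanh"],
    "l1": [0.0, 1e-5, 1e-4],
}

def toy_score(params):
    # Stand-in for "train a model with these params, return validation AUC".
    return 0.01 * sum(params["hidden"]) - params["l1"]

best, best_score, tried = random_grid_search(grid, toy_score, max_models=5)
print(len(tried), "models tried; best:", best)
```

The number of combinations grows multiplicatively with each parameter (here 3 × 2 × 3 = 18). Capping the number of models therefore keeps tuning time bounded, at the cost of possibly missing the best combination.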