Census Income Data Set Project - Predict Adult Census Income

Use the Adult Income dataset to predict, from census data, whether a person's income exceeds $50K per year.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Swati Patra

Systems Advisor , IBM

I have 11 years of experience and work with IBM. My domain is Travel, Hospitality and Banking - both sectors process lots of data. The way the projects were set up and the mentors' explanation was...

Hiren Ahir

Microsoft Azure SQL Server Developer, BI Developer

I'm a Graduate student and came into the job market and found a university degree wasn't sufficient to get a good paying job. I aimed at hottest technology in the market Big Data but the word BigData...

What will you learn

Understanding the problem statement
Importing the dataset and importing libraries
Performing basic EDA
Cleaning the data: handling null values and, where required, imputing them using appropriate methods
Checking data distribution using statistical techniques
Checking for outliers and deciding how to treat them given the chosen model
Using Python libraries such as matplotlib and seaborn for richer visualizations
Splitting the dataset into train and test sets using stratified sampling
Feature engineering to give the model more informative inputs
Training a model using a vanilla deep neural network (DNN)
Researching other network architectures based on the results
Understanding the class imbalance problem and whether any remedy is needed to tackle it
Doing cross-validation to see if the model is overfitting and whether results are consistent across folds
Tuning model hyperparameters to improve performance and observing their effect on the results
Making predictions using the trained model
Gaining confidence in the model using metrics such as Accuracy, Precision, Recall, F1-Score, and AUC
Understanding why Accuracy may or may not be a good metric for checking results
Selecting the best model based on feature importance and the metrics
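
Two of the steps above, stratified splitting and questioning accuracy as a metric, can be sketched in plain Python. This is an illustrative sketch rather than the project's notebook code: the helper names stratified_split and binary_metrics and the synthetic 75/25 labels are assumptions made for the example, and in practice a library such as scikit-learn would normally do this work.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Synthetic labels (assumption): 0 stands for <=50K, 1 stands for >50K.
labels = [0] * 750 + [1] * 250
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
y_test = [labels[i] for i in test_idx]

# A "model" that always predicts the larger class, used only as a baseline.
baseline_pred = [0] * len(y_test)
metrics = binary_metrics(y_test, baseline_pred)
print(metrics)
```

The always-predict-the-larger-class baseline reaches 75% accuracy on this split while finding none of the higher earners (recall of 0), which is why accuracy alone can mislead on imbalanced data.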

Project Description

Business Context

A census is the total process of collecting, compiling, and publishing demographic, economic, and social data pertaining, at a specific time, to all persons in a country or a delimited part of a country. As part of a census count, most countries also include a census of housing: the process of collecting, compiling, and publishing information on buildings, living quarters, and building-related facilities such as sewage systems, bathrooms, and electricity, to name a few.

 

Possible Uses of Census Information

Census information and its potential uses:

  • Total Population Size: When two or more census counts are compared for the same location, planners can determine if locales are increasing or decreasing in size.
  • Age: Used to help identify segments of the population that require different types of services.
  • Sex: Sex ratios can be calculated by 5-year age groups to crudely observe migration, especially among the working age cohorts.
  • Marital Status: Used to provide insights into family formation and housing needs.
  • Household Composition and Size: Used to help determine housing needs for related and unrelated households.
  • Educational Attainment and Literacy: Used to provide information on the educational skills of the work force. These measures also help planners select the best strategies to communicate with residents.
  • Location of Residence and Place of Prior Residence: Helps assess changes in rural and urban areas. Place of prior residence helps to identify communities that are experiencing in- or out-migration.
  • Occupation and Labor Force Participation: Helps to provide insights into the labor force of a given locale. The information can be used to develop economic development strategies.
  • Living Quarter Characteristics: Can help planners determine housing and community facility needs.



Data Description

In this project, we will use a standard imbalanced machine learning dataset referred to as the “Adult Income” or simply the “adult” dataset.

The dataset is credited to Ronny Kohavi and Barry Becker and was drawn from 1994 United States Census Bureau data. The task is to use personal details such as education level to predict whether an individual earns more or less than $50,000 per year.

The dataset provides 14 input variables that are a mixture of categorical, ordinal, and numerical data types. The complete list of variables is as follows:

  • Age.
  • Workclass.
  • Final Weight.
  • Education.
  • Education Number of Years.
  • Marital-status.
  • Occupation.
  • Relationship.
  • Race.
  • Sex.
  • Capital-gain.
  • Capital-loss.
  • Hours-per-week.
  • Native-country.

The dataset contains missing values that are marked with a question mark character (?).

There are a total of 48,842 rows of data, and 3,620 with missing values, leaving 45,222 complete rows.
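
As a rough sketch of how the question-mark markers might be handled, the block below parses a few comma-separated rows with the standard library and treats "?" as missing. The sample rows are invented for illustration, and the column names in COLUMNS are assumed spellings of the variables listed above, not identifiers taken from the project code.

```python
import csv
import io

# Assumed column names: the 14 input variables plus the income label.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "hours_per_week", "native_country",
    "income",
]

# Invented rows in a comma-separated layout; "?" marks a missing value.
SAMPLE = """\
39, Private, 120000, Bachelors, 13, Never-married, Sales, Not-in-family, White, Female, 0, 0, 40, United-States, <=50K
52, ?, 98000, HS-grad, 9, Married-civ-spouse, ?, Husband, White, Male, 0, 0, 45, United-States, >50K
28, Private, 210000, Masters, 14, Never-married, Prof-specialty, Own-child, Black, Male, 0, 0, 50, ?, <=50K
45, Self-emp-inc, 150000, Doctorate, 16, Married-civ-spouse, Exec-managerial, Wife, Asian-Pac-Islander, Female, 5000, 0, 60, United-States, >50K
"""


def load_rows(text):
    """Parse rows into dicts, mapping the '?' marker to None."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        values = [None if v.strip() == "?" else v.strip() for v in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


rows = load_rows(SAMPLE)
complete = [r for r in rows if None not in r.values()]
missing_per_column = {c: sum(r[c] is None for r in rows) for c in COLUMNS}

print(len(rows), "rows,", len(complete), "complete")
print({c: n for c, n in missing_per_column.items() if n})

# Sanity check on the figures quoted in the text.
assert 48_842 - 3_620 == 45_222
```

Dropping incomplete rows is only one option; whether to drop or impute depends on how much data is lost and on the model being trained.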

There are two class values, ‘>50K’ and ‘<=50K’, so this is a binary classification task. The classes are imbalanced, with a skew toward the ‘<=50K’ class label.

  • ‘>50K’: minority class, approximately 25%.
  • ‘<=50K’: majority class, approximately 75%.
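
One common way to respond to a skew like this is to weight each class inversely to its frequency. The sketch below assumes an exact 75/25 split toward ‘<=50K’ for simplicity and uses the heuristic weight = n_samples / (n_classes * class_count), which is the same formula scikit-learn applies for class_weight="balanced". The function name balanced_class_weights is hypothetical, and nothing here is claimed to be the project's chosen remedy.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Assumed proportions: 75 rows of <=50K for every 25 rows of >50K.
labels = ["<=50K"] * 75 + [">50K"] * 25
weights = balanced_class_weights(labels)
print(weights)
```

With these proportions the rarer ‘>50K’ class gets three times the weight of ‘<=50K’, so errors on higher earners count for more during training.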

 

Data Source 

http://www.census.gov/ftp/pub/DES/www/welcome.html

 

Tools/Libraries 

  • Python
  • scikit-learn (machine learning library)
  • h2o.ai

 

Aim

Predict census income by classifying each individual as earning either >50K or <=50K per year.

 

How Does It Help

  • Real Estate Demands
  • Basic Amenities 
  • Fulfilling Infrastructure Demands

Similar Projects

Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

In this project, we will try to predict how often players playing a video game called PUBG will win when they play by themselves.

Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

Curriculum For This Mini Project

  • Business Objective (06m)
  • Data Description (08m)
  • Descriptive Data Analysis and Missing Data Treatment (08m)
  • Exploratory Data Analysis (07m)
  • Relation Between Variables and Correlation Analysis (12m)
  • Deep Learning Basics - Perceptron Classifier (09m)
  • Deep Learning Basics - Dense Layer and Activation Functions (04m)
  • Difference Between Machine Learning & Deep Learning (05m)
  • H2O AI Models (04m)
  • H2O AI - Data and Data Manipulation (03m)
  • Deep Learning Using H2O AI - Models and Hyper Parameters Part 1 (08m)
  • Deep Learning Using H2O AI - Models and Hyper Parameters Part 2 (07m)
  • Deep Learning Using H2O AI - Models and Hyper Parameters Part 3 (08m)
  • Census Prediction Metric Evaluation (05m)
  • Modular code folder structure (02m)