How to impute missing values in a dataframe?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to impute missing values in a dataframe?

How to impute missing values in a dataframe?

This recipe helps you impute missing values in a dataframe

0

Recipe Objective

Missing value is one of the most common problem in any raw dataset. To create a precise and unbiased machine learning model, we need to deal with these Missing values after identifying them. There are different steps that we can take to do so: ​

  1. Identifying number of missing values in each column
  2. Based on the number, we decide whether we need to drop the column or replace it with it's mean, median or any other computed value.

In this recipe, we will demonstrate how to impute missing values (NA) in a dataframe. ​

STEP 1: Creating a DataFrame

Creating a STUDENT dataframe with student_id, Name and marks as columns ​

STUDENT = data.frame(student_id = c(1,2,3,4,5), Name = c("Ram","Shyam", "Jessica", "Nisarg", "Daniel"), Marks = c(55, 60, NA, 70, NA))
student_id	Name	Marks
1		Ram	55
2		Shyam	60
3		Jessica	NA
4		Nisarg	70
5		Daniel	NA

STEP 2: Imputing missing values with mean of the respective column

First, we will use is.na() function to check whether the cell contains a missing value or not. Then, using mean() function to compute the mean value and imputing wherver the earlier function is true.

STUDENT$Marks[is.na(STUDENT$Marks)] <- mean(STUDENT$Marks, na.rm=TRUE) STUDENT
student_id	Name	Marks
1		Ram	55.00000
2		Shyam	60.00000
3		Jessica	61.66667
4		Nisarg	70.00000
5		Daniel	61.66667

Relevant Projects

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Loan Eligibility Prediction in Python using H2O.ai
In this loan prediction project you will build predictive models in Python using H2O.ai to predict if an applicant is able to repay the loan or not.

Build a Music Recommendation Algorithm using KKBox's Dataset
Music Recommendation Project using Machine Learning - Use the KKBox dataset to predict the chances of a user listening to a song again after their very first noticeable listening event.

NLP and Deep Learning For Fake News Classification in Python
In this project you will use Python to implement various machine learning methods( RNN, LSTM, GRU) for fake news classification.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.