How to impute missing values in a dataframe in R

This recipe helps you impute missing values in a dataframe in R.

Recipe Objective

Missing values are one of the most common problems in any raw dataset. To build an accurate and unbiased machine learning model, we need to identify these missing values and then deal with them. There are two steps we can take to do so:

  1. Identify the number of missing values in each column.
  2. Based on that number, decide whether to drop the column or replace its missing values with the column's mean, median or any other computed value.
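
The two steps above can be sketched as follows. This is a minimal illustration, not a fixed rule: the data frame df, its column names, and the 50% drop threshold are all assumptions made for the example. It uses base R's is.na(), colSums() and colMeans().

```r
# Hypothetical example data; the column names are illustrative only.
df <- data.frame(
  id    = c(1, 2, 3, 4, 5),
  score = c(55, 60, NA, 70, NA),
  notes = c(NA, NA, NA, NA, "ok")
)

# Step 1: number of missing values in each column
na_counts <- colSums(is.na(df))
print(na_counts)

# Share of missing values in each column (between 0 and 1)
na_share <- colMeans(is.na(df))

# Step 2: assumed rule of thumb, drop columns with more than half
# of their values missing and keep the rest for imputation
threshold <- 0.5
df_reduced <- df[, na_share <= threshold, drop = FALSE]
print(names(df_reduced))
```

Here the notes column is dropped because 4 of its 5 values are missing, while score is kept as a candidate for imputation. The right threshold depends on the dataset and the model, so treat 0.5 as a placeholder.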

In this recipe, we will demonstrate how to impute missing values (NA) in a dataframe.

STEP 1: Creating a DataFrame

Creating a STUDENT dataframe with student_id, Name and Marks as columns

STUDENT = data.frame(student_id = c(1,2,3,4,5), Name = c("Ram","Shyam", "Jessica", "Nisarg", "Daniel"), Marks = c(55, 60, NA, 70, NA))
student_id	Name	Marks
1		Ram	55
2		Shyam	60
3		Jessica	NA
4		Nisarg	70
5		Daniel	NA

STEP 2: Imputing missing values with mean of the respective column

First, we use the is.na() function to check whether each cell contains a missing value. Then we use the mean() function with na.rm=TRUE to compute the column mean while ignoring the NA values, and assign that mean wherever is.na() returns TRUE.

STUDENT$Marks[is.na(STUDENT$Marks)] <- mean(STUDENT$Marks, na.rm=TRUE)
STUDENT
student_id	Name	Marks
1		Ram	55.00000
2		Shyam	60.00000
3		Jessica	61.66667
4		Nisarg	70.00000
5		Daniel	61.66667
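
The same pattern works for the median, which the recipe objective mentions as an alternative and which is less sensitive to outliers than the mean. The sketch below is one possible way to do it, not the recipe's own method: impute_median() is a hypothetical helper written for this example, and the code recreates the Step 1 dataframe so it can run on its own.

```r
# Recreate the STUDENT dataframe from Step 1 so the example is self-contained
STUDENT <- data.frame(
  student_id = c(1, 2, 3, 4, 5),
  Name = c("Ram", "Shyam", "Jessica", "Nisarg", "Daniel"),
  Marks = c(55, 60, NA, 70, NA)
)

# impute_median() is a hypothetical helper, not a built-in R function.
# It replaces the NA entries of a numeric vector with the vector's median.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Apply the helper to every numeric column; other columns are left untouched
numeric_cols <- sapply(STUDENT, is.numeric)
STUDENT[numeric_cols] <- lapply(STUDENT[numeric_cols], impute_median)
print(STUDENT)
```

With this data the median of the observed marks (55, 60, 70) is 60, so both missing entries become 60 instead of 61.66667.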
