How to impute missing values in a dataframe in R

This recipe helps you impute missing values in a dataframe in R

Recipe Objective

Missing values are one of the most common problems in any raw dataset. To build an accurate and unbiased machine learning model, we need to identify these missing values and then deal with them. This involves two steps:

  1. Identify the number of missing values in each column.
  2. Based on that number, decide whether to drop the column or replace its missing values with the column's mean, median or another computed value.
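The two steps above can be sketched roughly as follows. This is a minimal illustration, not part of the original recipe: the data frame df, its columns, and the 50% cut-off (max_na_share) are assumptions chosen only for the example, and a suitable threshold depends on your data.

```r
# Hypothetical example data (not from the recipe): two columns with gaps
df <- data.frame(
  id    = c(1, 2, 3, 4),
  score = c(10, NA, 30, 40),
  notes = c(NA, NA, NA, "ok")
)

# Step 1: number of missing values in each column
na_count <- colSums(is.na(df))
print(na_count)

# Share of missing values in each column (between 0 and 1)
na_share <- colMeans(is.na(df))

# Step 2 (assumed rule of thumb): drop columns with more than 50% missing
max_na_share <- 0.5
df_kept <- df[, na_share <= max_na_share, drop = FALSE]
print(names(df_kept))
```

Here is.na() returns a logical matrix, so colSums() counts the TRUE entries per column. Columns that survive the cut-off but still contain NA values are the candidates for imputation.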

In this recipe, we will demonstrate how to impute missing values (NA) in a dataframe.

STEP 1: Creating a DataFrame

Creating a STUDENT dataframe with student_id, Name and Marks as columns. The Marks column contains two missing values (NA).

STUDENT <- data.frame(
  student_id = c(1, 2, 3, 4, 5),
  Name = c("Ram", "Shyam", "Jessica", "Nisarg", "Daniel"),
  Marks = c(55, 60, NA, 70, NA)
)
STUDENT
student_id	Name	Marks
1		Ram	55
2		Shyam	60
3		Jessica	NA
4		Nisarg	70
5		Daniel	NA

STEP 2: Imputing missing values with mean of the respective column

First, we use the is.na() function to check whether each cell in the Marks column contains a missing value. Then we use the mean() function with na.rm=TRUE to compute the mean of the non-missing values, and assign that mean wherever is.na() returns TRUE.

STUDENT$Marks[is.na(STUDENT$Marks)] <- mean(STUDENT$Marks, na.rm = TRUE)
STUDENT
student_id	Name	Marks
1		Ram	55.00000
2		Shyam	60.00000
3		Jessica	61.66667
4		Nisarg	70.00000
5		Daniel	61.66667
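The same pattern works for the median or any other computed value. The sketch below is one possible variation rather than part of the original recipe: impute_with is a hypothetical helper name introduced here for illustration, and it assumes the column is numeric and has at least one non-missing value.

```r
# Same data as in STEP 1 of the recipe
STUDENT <- data.frame(
  student_id = c(1, 2, 3, 4, 5),
  Name = c("Ram", "Shyam", "Jessica", "Nisarg", "Daniel"),
  Marks = c(55, 60, NA, 70, NA)
)

# Hypothetical helper: replace NAs in a numeric vector with a summary
# statistic computed from the non-missing values (median by default)
impute_with <- function(x, stat = median) {
  x[is.na(x)] <- stat(x, na.rm = TRUE)
  x
}

# Impute the missing marks with the median of the observed marks
STUDENT$Marks <- impute_with(STUDENT$Marks, stat = median)
print(STUDENT)
```

Passing stat = mean reproduces the mean imputation shown above. The median is often preferred when a column contains outliers, since it is less sensitive to extreme values.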
