How to impute missing values in a dataframe in R

This recipe helps you impute missing values in a dataframe in R
Last Updated: 08 Sep 2021


Recipe Objective

Missing values are one of the most common problems in any raw dataset. To build an accurate and unbiased machine learning model, we need to identify these missing values and then deal with them. The usual steps are:

Identify the number of missing values in each column.
Based on that number, decide whether to drop the column or replace its missing values with the column's mean, median or another computed value.

In this recipe, we will demonstrate how to impute missing values (NA) in a dataframe.

STEP 1: Creating a DataFrame

Creating a STUDENT dataframe with student_id, Name and Marks as columns


STUDENT = data.frame(student_id = c(1,2,3,4,5), Name = c("Ram","Shyam", "Jessica", "Nisarg", "Daniel"), Marks = c(55, 60, NA, 70, NA))

student_id	Name	Marks
1		Ram	55
2		Shyam	60
3		Jessica	NA
4		Nisarg	70
5		Daniel	NA
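
Before imputing, it helps to carry out the first step from the Recipe Objective and count the missing values in each column. The sketch below is one possible way to do this with base R's is.na(), colSums() and colMeans(). The names na_counts and na_share are illustrative and not part of the original recipe, and the dataframe is recreated so the block runs on its own.

```r
# Recreate the STUDENT dataframe from Step 1 so this block is self-contained
STUDENT <- data.frame(
  student_id = c(1, 2, 3, 4, 5),
  Name = c("Ram", "Shyam", "Jessica", "Nisarg", "Daniel"),
  Marks = c(55, 60, NA, 70, NA)
)

# is.na() on a dataframe returns a logical matrix of the same shape;
# summing each column gives the number of missing values per column
na_counts <- colSums(is.na(STUDENT))
print(na_counts)

# Taking the column means of the same matrix gives the share of missing values
# (illustrative name; useful when deciding whether a column should be dropped)
na_share <- colMeans(is.na(STUDENT))
print(na_share)
```

Here only Marks has missing values (2 of 5 rows), so imputing rather than dropping the column is a reasonable choice for this example.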

STEP 2: Imputing missing values with mean of the respective column

First, we use the is.na() function to check whether each cell contains a missing value. Then we use the mean() function with na.rm=TRUE to compute the column mean from the non-missing values, and assign that mean wherever is.na() is TRUE.


STUDENT$Marks[is.na(STUDENT$Marks)] <- mean(STUDENT$Marks, na.rm=TRUE)
STUDENT

student_id	Name	Marks
1		Ram	55.00000
2		Shyam	60.00000
3		Jessica	61.66667
4		Nisarg	70.00000
5		Daniel	61.66667
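
The Recipe Objective also mentions the median and other computed values as replacements. As a minimal sketch, the same indexing pattern can be wrapped in a small helper that accepts the summary function as an argument. The helper impute_with() and the names STUDENT_median and STUDENT_mean are hypothetical, introduced here only for illustration.

```r
# Recreate the STUDENT dataframe from Step 1 so this block is self-contained
STUDENT <- data.frame(
  student_id = c(1, 2, 3, 4, 5),
  Name = c("Ram", "Shyam", "Jessica", "Nisarg", "Daniel"),
  Marks = c(55, 60, NA, 70, NA)
)

# Hypothetical helper: replace the NAs in a numeric vector with a summary
# statistic (mean by default) computed from the non-missing values
impute_with <- function(x, fun = mean) {
  x[is.na(x)] <- fun(x, na.rm = TRUE)
  x
}

# Median imputation for Marks, kept in a copy so STUDENT stays unchanged
STUDENT_median <- STUDENT
STUDENT_median$Marks <- impute_with(STUDENT$Marks, fun = median)
print(STUDENT_median)

# Mean imputation applied to every numeric column at once;
# columns without missing values (student_id) pass through unchanged
numeric_cols <- vapply(STUDENT, is.numeric, logical(1))
STUDENT_mean <- STUDENT
STUDENT_mean[numeric_cols] <- lapply(STUDENT[numeric_cols], impute_with)
print(STUDENT_mean)
```

With the median, the two missing Marks become 60 instead of 61.66667. The median is less sensitive to extreme values, so it may be the safer choice when a column is skewed.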
