How to find the count of missing values in a dataframe?

This recipe helps you find the count of missing values in a dataframe.

Recipe Objective

Missing values are one of the most common problems in any raw dataset. To build an accurate and unbiased machine learning model, we need to identify these missing values and then deal with them. The usual steps are:

  1. Identify the number of missing values in each column.
  2. Based on that number, decide whether to drop the column or replace its missing entries with the column's mean, median or another computed value.
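
As a rough illustration of the second step, the sketch below drops columns whose share of missing values exceeds a cutoff and fills the remaining numeric gaps with the column median. The dataframe scores, the helper name clean_missing and the 0.5 cutoff are all hypothetical choices made for this example; they are not part of the recipe.

```r
# Hypothetical example data (not the recipe's STUDENT dataframe)
scores <- data.frame(
  id    = c(1, 2, 3, 4, 5),
  marks = c(50, NA, 70, 80, 90),
  notes = c(NA, NA, NA, "late", NA)
)

# Drop columns that are mostly missing, then fill numeric gaps with the median.
# The 0.5 cutoff is an arbitrary assumption for illustration.
clean_missing <- function(df, max_na_share = 0.5) {
  na_share <- colMeans(is.na(df))  # fraction of NA values per column
  df <- df[, na_share <= max_na_share, drop = FALSE]
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      col_median <- median(df[[col]], na.rm = TRUE)
      df[[col]][is.na(df[[col]])] <- col_median
    }
  }
  df
}

cleaned <- clean_missing(scores)
print(cleaned)
```

Here the notes column is dropped because four of its five values are missing, and the single gap in marks is filled with the median of the observed marks. Whether to use the mean, the median or some other value depends on the data.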

In this recipe, we will demonstrate how to count the number of missing values (NA) in a dataframe in R.

STEP 1: Creating a DataFrame

Creating a STUDENT dataframe with student_id, Name and Marks as columns:

STUDENT = data.frame(student_id = c(1,2,3,NA,5), Name = c("Ram","Shyam", "Jessica", NA, NA), Marks = c(NA, 60, NA, 80, NA))

STEP 2: Finding the number of NA values

We will use the expression sum(is.na(x)), which combines two built-in functions, where x is a dataframe or a column.

The is.na() function first checks whether each element is a missing value, returning TRUE or FALSE for every element. The sum() function then adds these up, counting each TRUE as 1, which gives the number of missing values.

sum(is.na(STUDENT))
6
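
Since x can also be a single column, the two steps can be seen separately on one vector. The sketch below reuses the Marks values from STEP 1; the variable names are illustrative only.

```r
# Same values as the Marks column of STUDENT
marks <- c(NA, 60, NA, 80, NA)

# Step 1: is.na() returns one logical value per element
missing_flags <- is.na(marks)
print(missing_flags)

# Step 2: sum() treats TRUE as 1 and FALSE as 0, so it counts the NA values
na_count <- sum(missing_flags)
print(na_count)
```

The count for this column is 3, which matches the Marks entry in the per-column counts.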

To calculate the number of missing values in every column, we use the colSums() function. It returns the count of missing values for each column.

colSums(is.na(STUDENT))
student_id    1
Name          2
Marks         3
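
Two closely related base R functions are often used alongside colSums(): colMeans() gives the share of missing values per column, and rowSums() counts missing values per row. Neither is part of the original recipe; the sketch below simply rebuilds the STUDENT dataframe from STEP 1 and applies them, with variable names chosen for illustration.

```r
# Rebuild the dataframe from STEP 1 so the example is self-contained
STUDENT <- data.frame(
  student_id = c(1, 2, 3, NA, 5),
  Name       = c("Ram", "Shyam", "Jessica", NA, NA),
  Marks      = c(NA, 60, NA, 80, NA)
)

# Fraction of missing values in each column (mean of TRUE/FALSE values)
na_share_per_column <- colMeans(is.na(STUDENT))
print(na_share_per_column)

# Number of missing values in each row
na_count_per_row <- rowSums(is.na(STUDENT))
print(na_count_per_row)
```

The per-column share can be easier to compare against a cutoff than a raw count when deciding whether a column should be dropped or imputed.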
