How to find the difference between 2 dataframes?

This recipe helps you find the difference between 2 dataframes.

Recipe Objective

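When working with dataframes, we often have two dataframes and need to find the difference between them, i.e. the rows that appear in one dataframe but not in both (the complement of A intersection B within A union B, also known as the symmetric difference). Such problems can be handled with the pandas concat function combined with drop_duplicates.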

So this recipe is a short example on how to find difference between two dataframes. Let's get started.
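To make the set idea concrete before moving to pandas, here is a minimal sketch using plain Python sets. It treats each row as a (Student, Grade) tuple and reuses the values that the recipe builds in Step 2; the names A, B and diff are illustrative only.

```python
# Each row is modelled as a (Student, Grade) tuple.
A = {('Ram', 'A'), ('Rohan', 'C'), ('Shyam', 'B'), ('Mohan', 'Ex')}
B = {('Ram', 'A'), ('Shyam', 'B')}

# Symmetric difference: everything in A or B, minus what they share.
diff = A ^ B

# The same result written out as "union minus intersection".
same_diff = (A | B) - (A & B)

print(diff == same_diff)
```

The pandas approach in the following steps computes the same kind of result, but on dataframe rows instead of tuples.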

Step 1 - Import the library

import pandas as pd

Let's pause and look at this import. Pandas is generally used for data manipulation and analysis.

Step 2 - Setup the Data

df1 = pd.DataFrame({'Student': ['Ram','Rohan','Shyam','Mohan'], 'Grade': ['A','C','B','Ex']})
df2 = pd.DataFrame({'Student': ['Ram','Shyam'], 'Grade': ['A','B']})

Here we create two simple datasets of students and their grades. Every row of df2 also appears in df1.

Step 3 - Finding Difference

df3=pd.concat([df1,df2]).drop_duplicates(keep=False)

The concat function in pandas stacks dataframes on top of one another. Here we first combine df1 and df2, then call drop_duplicates with keep=False, which drops every row that occurs more than once, i.e. the rows common to both dataframes. What remains is the difference. This relies on neither dataframe containing duplicate rows of its own, since those would be dropped as well.
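Because the concat approach keeps rows unique to either dataframe, it returns a symmetric difference. If only the rows of df1 that are missing from df2 are wanted, one possible alternative (not part of the original recipe) is a left merge with indicator=True. The sketch below adds a hypothetical student 'Sita' to df2 purely to make the two results differ.

```python
import pandas as pd

df1 = pd.DataFrame({'Student': ['Ram', 'Rohan', 'Shyam', 'Mohan'],
                    'Grade': ['A', 'C', 'B', 'Ex']})
# 'Sita' is a hypothetical extra row, added only for illustration.
df2 = pd.DataFrame({'Student': ['Ram', 'Shyam', 'Sita'],
                    'Grade': ['A', 'B', 'D']})

# Symmetric difference: rows found in exactly one of the two dataframes.
sym_diff = pd.concat([df1, df2]).drop_duplicates(keep=False)

# One-sided difference: rows of df1 with no match in df2.
# The merge joins on all shared columns; '_merge' records where each row came from.
merged = df1.merge(df2, how='left', indicator=True)
only_in_df1 = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

print(sym_diff)
print(only_in_df1)
```

In this sketch sym_diff includes the 'Sita' row from df2, while only_in_df1 does not.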

Step 4 - Printing results

print('df1\n',df1)
print('df2\n',df2)
print('df1-df2\n',df3)

We simply use the print function to display df1, df2 and the new dataframe df3, which holds the difference between df1 and df2.

Step 5 - Let's look at our dataset now

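Once we run the above code snippet, we will see df1 and df2 printed in full, followed by the difference dataframe. It contains the two rows that appear only in df1: Rohan (grade C) at index 1 and Mohan (grade Ex) at index 3.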
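One caveat worth checking on real data: drop_duplicates(keep=False) cannot tell a row shared by both dataframes from a row repeated inside a single dataframe. The sketch below uses hypothetical data in which 'Rohan' appears twice in df1 and never in df2, and shows one possible workaround, de-duplicating each dataframe before concatenating.

```python
import pandas as pd

# Hypothetical data: 'Rohan' is duplicated within df1 and absent from df2.
df1 = pd.DataFrame({'Student': ['Ram', 'Rohan', 'Rohan'],
                    'Grade': ['A', 'C', 'C']})
df2 = pd.DataFrame({'Student': ['Ram'], 'Grade': ['A']})

# Naive approach: both 'Rohan' rows count as duplicates and are dropped.
naive = pd.concat([df1, df2]).drop_duplicates(keep=False)

# Workaround: remove duplicates inside each dataframe first.
safe = pd.concat([df1.drop_duplicates(),
                  df2.drop_duplicates()]).drop_duplicates(keep=False)

print(len(naive))
print(safe)
```

Here naive ends up empty even though 'Rohan' is missing from df2, while safe keeps a single 'Rohan' row.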
