How to find the difference between 2 dataframes in pandas

This recipe helps you find the difference between 2 dataframes in pandas

Recipe Objective

While working with dataframes, we often have two of them and need to find the difference, i.e. the rows that fall outside the intersection of A and B. Such problems can be easily handled with the concat function combined with drop_duplicates.

This recipe is a short example of how to find the difference between two dataframes. Let's get started.

Step 1 - Import the library

import pandas as pd

Let's pause and look at this import. Pandas is generally used for data manipulation and analysis.

Step 2 - Setup the Data

df1 = pd.DataFrame({'Student': ['Ram','Rohan','Shyam','Mohan'], 'Grade': ['A','C','B','Ex']})
df2 = pd.DataFrame({'Student': ['Ram','Shyam'], 'Grade': ['A','B']})

Let us create two simple datasets of students and grades.

Step 3 - Finding Difference

df3=pd.concat([df1,df2]).drop_duplicates(keep=False)

The concat function in pandas stacks dataframes on top of each other. Here we first combine df1 and df2, then call drop_duplicates with keep=False, which drops every row that occurs more than once. The rows common to both dataframes are removed, and only the rows that appear in just one of them remain. This assumes neither dataframe contains repeated rows of its own, since those would be dropped as well.
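One point worth illustrating: concat followed by drop_duplicates(keep=False) keeps rows that are unique to either dataframe, not only those unique to df1. The sketch below uses a hypothetical df2_extra (not part of the recipe) that contains a student missing from df1, and contrasts the result with a left merge using indicator=True, which is one common way to keep only the rows found in df1 alone. Treat it as an illustration under those assumptions rather than part of the original recipe.

```python
import pandas as pd

df1 = pd.DataFrame({'Student': ['Ram', 'Rohan', 'Shyam', 'Mohan'],
                    'Grade': ['A', 'C', 'B', 'Ex']})

# Hypothetical variant of df2 with one extra row ('Sita') that df1 lacks
df2_extra = pd.DataFrame({'Student': ['Ram', 'Shyam', 'Sita'],
                          'Grade': ['A', 'B', 'A']})

# Rows that appear in exactly one of the two dataframes
sym_diff = pd.concat([df1, df2_extra]).drop_duplicates(keep=False)
print(sym_diff['Student'].tolist())     # ['Rohan', 'Mohan', 'Sita']

# Left merge on all columns; the '_merge' column records each row's origin
merged = df1.merge(df2_extra, on=['Student', 'Grade'],
                   how='left', indicator=True)

# Keep only the rows that exist in df1 but not in df2_extra
only_in_df1 = merged.loc[merged['_merge'] == 'left_only', ['Student', 'Grade']]
print(only_in_df1['Student'].tolist())  # ['Rohan', 'Mohan']
```

With the recipe's own df2, which is a subset of df1, both approaches return the same rows.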

Step 4 - Printing results

print('df1\n',df1)
print('df2\n',df2)
print('df1-df2\n',df3)

Simply use the print function to print df1, df2, and our new dataframe df3, which holds the difference between df1 and df2.

Step 5 - Let's look at our dataset now

Once we run the above code snippet, we will see df1 and df2 printed in full, followed by df3, which contains the two rows that are not shared: Rohan with grade C and Mohan with grade Ex.
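As a quick check of the result, the following minimal sketch rebuilds the recipe's dataframes and inspects df3 directly. The name df3_renumbered and the reset_index step are additions for illustration; they show that df3 keeps the row labels it had in df1 unless the index is reset.

```python
import pandas as pd

df1 = pd.DataFrame({'Student': ['Ram', 'Rohan', 'Shyam', 'Mohan'],
                    'Grade': ['A', 'C', 'B', 'Ex']})
df2 = pd.DataFrame({'Student': ['Ram', 'Shyam'],
                    'Grade': ['A', 'B']})

# Same operation as Step 3
df3 = pd.concat([df1, df2]).drop_duplicates(keep=False)

# The surviving rows keep their original labels from df1
print(df3.index.tolist())                 # [1, 3]

# Optional: renumber the rows from 0 (illustrative addition)
df3_renumbered = df3.reset_index(drop=True)
print(df3_renumbered.to_dict('records'))
# [{'Student': 'Rohan', 'Grade': 'C'}, {'Student': 'Mohan', 'Grade': 'Ex'}]
```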
