How to find difference between 2 dataframes?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to find difference between 2 dataframes?

How to find difference between 2 dataframes?

This recipe helps you find difference between 2 dataframes

0

Recipe Objective

While working with dataframes, many a times we have two dataframes and there is a need to find difference i.e. find the complement set of A intersection B. Such problems can be easily handled by concat fuction.

So this recipe is a short example on how to find difference between two dataframes. Let's get started.

Step 1 - Import the library

import pandas as pd

Let's pause and look at these imports. Pandas is generally used for data manipulation and analysis.

Step 2 - Setup the Data

df1= pd.DataFrame({'Student': ['Ram','Rohan','Shyam','Mohan'], 'Grade': ['A','C','B','Ex']}) df2 = pd.DataFrame({'Student': ['Ram','Shyam',], 'Grade': ['A','B']})

Let us create a two simple dataset of Student and grades.

Step 3 - Finding Difference

df3=pd.concat([df1,df2]).drop_duplicates(keep=False)

Concat function in pandas library help us in performing addition operation over dataframes. Here we are initially combining dataframes df1 and df2 and using drop_duplicates function, dropping out the intersection elements of the dataframes; hence taking the net difference.

Step 4 - Printing results

print('df1\n',df1) print('df2\n',df2) print('df1-df2\n',df3)

Simply use print function to print df1, df2 and our new dataframe df1~df2

Step 5 - Let's look at our dataset now

Once we run the above code snippet, we will see:

Scroll down to the ipython notebook below to see the output.

Relevant Projects

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Identifying Product Bundles from Sales Data Using R Language
In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.