How to JOIN and MERGE Pandas DataFrame?

This recipe helps you JOIN and MERGE Pandas DataFrame

Recipe Objective

Have you ever felt a need of joining or merging of dataset? How did you do it?

So this is the recipe on how we can join and merge Pandas DataFrame.

Master the Art of Data Cleaning in Machine Learning

Step 1 - Import the library

import pandas as pd

We have imported numpy and pandas which will be needed for the dataset.

Step 2 - Setting up multiple Data Sets

We have created multiple dataframes that we will join and merge. raw_data = {"subject_id": ["1", "2", "3", "4", "5"], "first_name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"], "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Atiches"]} df_a = pd.DataFrame(raw_data, columns = ["subject_id", "first_name", "last_name"]) print(); print(df_a) raw_data = {"subject_id": ["4", "5", "6", "7", "8"], "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty"], "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan"]} df_b = pd.DataFrame(raw_data, columns = ["subject_id", "first_name", "last_name"]) print(); print(df_b) raw_data = {"subject_id": ["1", "2", "3", "4", "5", "7", "8", "9", "10", "11"], "test_id": [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]} df_n = pd.DataFrame(raw_data, columns = ["subject_id","test_id"]) print(); print(df_n)

Step 3 - Joining and Merging DataFrames

Here we will be joining and merging dataframes in different ways. We will use pd.concat to join the dataset and pd.merge to merge the dataset.

    • Joining the two dataframes along rows

df_new = pd.concat([df_a, df_b]) print(df_new)

    • Droping rows where all cells in that row is NA

df_cleaned = df.dropna(how="all") print(df_cleaned)

    • Joining the two dataframes along columns

df = pd.concat([df_a, df_b], axis=1) print(df)

    • Merge two dataframes along the subject_id value

df = pd.merge(df_new, df_n, on="subject_id") print(df)

    • Merge two dataframes with both the left and right dataframes using the subject_id key

df = pd.merge(df_new, df_n, left_on="subject_id", right_on="subject_id") print(df)

    • Merge with outer join

df = pd.merge(df_a, df_b, on="subject_id", how="outer") print(df)

    • Merge with inner join

df = pd.merge(df_a, df_b, on="subject_id", how="inner") print(df)

    • Merge with right join

df = pd.merge(df_a, df_b, on="subject_id", how="right") print(df)

    • Merge with left join

df = pd.merge(df_a, df_b, on="subject_id", how="left") print(df)

    • Merge while adding a suffix to duplicate column names

df = pd.merge(df_a, df_b, on="subject_id", how="left", suffixes=("_left", "_right")) print(df)

    • Merge based on indexes

df = pd.merge(df_a, df_b, right_index=True, left_index=True) print(df)

So the output comes as:

  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches

  subject_id first_name last_name
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan

  subject_id  test_id
0          1       51
1          2       15
2          3       15
3          4       61
4          5       16
5          7       14
6          8       15
7          9        1
8         10       61
9         11       16

  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan

  subject_id first_name last_name subject_id first_name last_name
0          1       Alex  Anderson          4      Billy    Bonder
1          2        Amy  Ackerman          5      Brian     Black
2          3      Allen       Ali          6       Bran   Balwner
3          4      Alice      Aoni          7      Bryce     Brice
4          5     Ayoung   Atiches          8      Betty    Btisan

  subject_id first_name last_name  test_id
0          1       Alex  Anderson       51
1          2        Amy  Ackerman       15
2          3      Allen       Ali       15
3          4      Alice      Aoni       61
4          4      Billy    Bonder       61
5          5     Ayoung   Atiches       16
6          5      Brian     Black       16
7          7      Bryce     Brice       14
8          8      Betty    Btisan       15

  subject_id first_name last_name  test_id
0          1       Alex  Anderson       51
1          2        Amy  Ackerman       15
2          3      Allen       Ali       15
3          4      Alice      Aoni       61
4          4      Billy    Bonder       61
5          5     Ayoung   Atiches       16
6          5      Brian     Black       16
7          7      Bryce     Brice       14
8          8      Betty    Btisan       15

  subject_id first_name_x last_name_x first_name_y last_name_y
0          1         Alex    Anderson          NaN         NaN
1          2          Amy    Ackerman          NaN         NaN
2          3        Allen         Ali          NaN         NaN
3          4        Alice        Aoni        Billy      Bonder
4          5       Ayoung     Atiches        Brian       Black
5          6          NaN         NaN         Bran     Balwner
6          7          NaN         NaN        Bryce       Brice
7          8          NaN         NaN        Betty      Btisan

  subject_id first_name_x last_name_x first_name_y last_name_y
0          4        Alice        Aoni        Billy      Bonder
1          5       Ayoung     Atiches        Brian       Black

  subject_id first_name_x last_name_x first_name_y last_name_y
0          4        Alice        Aoni        Billy      Bonder
1          5       Ayoung     Atiches        Brian       Black
2          6          NaN         NaN         Bran     Balwner
3          7          NaN         NaN        Bryce       Brice
4          8          NaN         NaN        Betty      Btisan

  subject_id first_name_x last_name_x first_name_y last_name_y
0          1         Alex    Anderson          NaN         NaN
1          2          Amy    Ackerman          NaN         NaN
2          3        Allen         Ali          NaN         NaN
3          4        Alice        Aoni        Billy      Bonder
4          5       Ayoung     Atiches        Brian       Black

  subject_id first_name_left last_name_left first_name_right last_name_right
0          1            Alex       Anderson              NaN             NaN
1          2             Amy       Ackerman              NaN             NaN
2          3           Allen            Ali              NaN             NaN
3          4           Alice           Aoni            Billy          Bonder
4          5          Ayoung        Atiches            Brian           Black

  subject_id_x first_name_x last_name_x subject_id_y first_name_y last_name_y
0            1         Alex    Anderson            4        Billy      Bonder
1            2          Amy    Ackerman            5        Brian       Black
2            3        Allen         Ali            6         Bran     Balwner
3            4        Alice        Aoni            7        Bryce       Brice
4            5       Ayoung     Atiches            8        Betty      Btisan

Download Materials

What Users are saying..

profile image

Abhinav Agarwal

Graduate Student at Northwestern University
linkedin profile url

I come from Northwestern University, which is ranked 9th in the US. Although the high-quality academics at school taught me all the basics I needed, obtaining practical experience was a challenge.... Read More

Relevant Projects

Isolation Forest Model and LOF for Anomaly Detection in Python
Credit Card Fraud Detection Project - Build an Isolation Forest Model and Local Outlier Factor (LOF) in Python to identify fraudulent credit card transactions.

AWS Project to Build and Deploy LSTM Model with Sagemaker
In this AWS Sagemaker Project, you will learn to build a LSTM model on Sagemaker for sales forecasting while analyzing the impact of weather conditions on Sales.

Build ARCH and GARCH Models in Time Series using Python
In this Project we will build an ARCH and a GARCH model using Python

BigMart Sales Prediction ML Project in Python
The goal of the BigMart Sales Prediction ML project is to build and evaluate different predictive models and determine the sales of each product at a store.

Build a Music Recommendation Algorithm using KKBox's Dataset
Music Recommendation Project using Machine Learning - Use the KKBox dataset to predict the chances of a user listening to a song again after their very first noticeable listening event.

BERT Text Classification using DistilBERT and ALBERT Models
This Project Explains how to perform Text Classification using ALBERT and DistilBERT

MLOps Project to Deploy Resume Parser Model on Paperspace
In this MLOps project, you will learn how to deploy a Resume Parser Streamlit Application on Paperspace Private Cloud.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

End-to-End ML Model Monitoring using Airflow and Docker
In this MLOps Project, you will learn to build an end to end pipeline to monitor any changes in the predictive power of model or degradation of data.

Text Classification with Transformers-RoBERTa and XLNet Model
In this machine learning project, you will learn how to load, fine tune and evaluate various transformer models for text classification tasks.