How to JOIN and MERGE Pandas DataFrame?

This recipe helps you JOIN and MERGE Pandas DataFrame

Recipe Objective

Have you ever felt a need of joining or merging of dataset? How did you do it?

So this is the recipe on how we can join and merge Pandas DataFrame.

Master the Art of Data Cleaning in Machine Learning

Step 1 - Import the library

import pandas as pd

We have imported numpy and pandas which will be needed for the dataset.

Step 2 - Setting up multiple Data Sets

We have created multiple dataframes that we will join and merge. raw_data = {"subject_id": ["1", "2", "3", "4", "5"], "first_name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"], "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Atiches"]} df_a = pd.DataFrame(raw_data, columns = ["subject_id", "first_name", "last_name"]) print(); print(df_a) raw_data = {"subject_id": ["4", "5", "6", "7", "8"], "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty"], "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan"]} df_b = pd.DataFrame(raw_data, columns = ["subject_id", "first_name", "last_name"]) print(); print(df_b) raw_data = {"subject_id": ["1", "2", "3", "4", "5", "7", "8", "9", "10", "11"], "test_id": [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]} df_n = pd.DataFrame(raw_data, columns = ["subject_id","test_id"]) print(); print(df_n)

Step 3 - Joining and Merging DataFrames

Here we will be joining and merging dataframes in different ways. We will use pd.concat to join the dataset and pd.merge to merge the dataset.

    • Joining the two dataframes along rows

df_new = pd.concat([df_a, df_b]) print(df_new)

    • Droping rows where all cells in that row is NA

df_cleaned = df.dropna(how="all") print(df_cleaned)

    • Joining the two dataframes along columns

df = pd.concat([df_a, df_b], axis=1) print(df)

    • Merge two dataframes along the subject_id value

df = pd.merge(df_new, df_n, on="subject_id") print(df)

    • Merge two dataframes with both the left and right dataframes using the subject_id key

df = pd.merge(df_new, df_n, left_on="subject_id", right_on="subject_id") print(df)

    • Merge with outer join

df = pd.merge(df_a, df_b, on="subject_id", how="outer") print(df)

    • Merge with inner join

df = pd.merge(df_a, df_b, on="subject_id", how="inner") print(df)

    • Merge with right join

df = pd.merge(df_a, df_b, on="subject_id", how="right") print(df)

    • Merge with left join

df = pd.merge(df_a, df_b, on="subject_id", how="left") print(df)

    • Merge while adding a suffix to duplicate column names

df = pd.merge(df_a, df_b, on="subject_id", how="left", suffixes=("_left", "_right")) print(df)

    • Merge based on indexes

df = pd.merge(df_a, df_b, right_index=True, left_index=True) print(df)

So the output comes as:

  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches

  subject_id first_name last_name
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan

  subject_id  test_id
0          1       51
1          2       15
2          3       15
3          4       61
4          5       16
5          7       14
6          8       15
7          9        1
8         10       61
9         11       16

  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan

  subject_id first_name last_name subject_id first_name last_name
0          1       Alex  Anderson          4      Billy    Bonder
1          2        Amy  Ackerman          5      Brian     Black
2          3      Allen       Ali          6       Bran   Balwner
3          4      Alice      Aoni          7      Bryce     Brice
4          5     Ayoung   Atiches          8      Betty    Btisan

  subject_id first_name last_name  test_id
0          1       Alex  Anderson       51
1          2        Amy  Ackerman       15
2          3      Allen       Ali       15
3          4      Alice      Aoni       61
4          4      Billy    Bonder       61
5          5     Ayoung   Atiches       16
6          5      Brian     Black       16
7          7      Bryce     Brice       14
8          8      Betty    Btisan       15

  subject_id first_name last_name  test_id
0          1       Alex  Anderson       51
1          2        Amy  Ackerman       15
2          3      Allen       Ali       15
3          4      Alice      Aoni       61
4          4      Billy    Bonder       61
5          5     Ayoung   Atiches       16
6          5      Brian     Black       16
7          7      Bryce     Brice       14
8          8      Betty    Btisan       15

  subject_id first_name_x last_name_x first_name_y last_name_y
0          1         Alex    Anderson          NaN         NaN
1          2          Amy    Ackerman          NaN         NaN
2          3        Allen         Ali          NaN         NaN
3          4        Alice        Aoni        Billy      Bonder
4          5       Ayoung     Atiches        Brian       Black
5          6          NaN         NaN         Bran     Balwner
6          7          NaN         NaN        Bryce       Brice
7          8          NaN         NaN        Betty      Btisan

  subject_id first_name_x last_name_x first_name_y last_name_y
0          4        Alice        Aoni        Billy      Bonder
1          5       Ayoung     Atiches        Brian       Black

  subject_id first_name_x last_name_x first_name_y last_name_y
0          4        Alice        Aoni        Billy      Bonder
1          5       Ayoung     Atiches        Brian       Black
2          6          NaN         NaN         Bran     Balwner
3          7          NaN         NaN        Bryce       Brice
4          8          NaN         NaN        Betty      Btisan

  subject_id first_name_x last_name_x first_name_y last_name_y
0          1         Alex    Anderson          NaN         NaN
1          2          Amy    Ackerman          NaN         NaN
2          3        Allen         Ali          NaN         NaN
3          4        Alice        Aoni        Billy      Bonder
4          5       Ayoung     Atiches        Brian       Black

  subject_id first_name_left last_name_left first_name_right last_name_right
0          1            Alex       Anderson              NaN             NaN
1          2             Amy       Ackerman              NaN             NaN
2          3           Allen            Ali              NaN             NaN
3          4           Alice           Aoni            Billy          Bonder
4          5          Ayoung        Atiches            Brian           Black

  subject_id_x first_name_x last_name_x subject_id_y first_name_y last_name_y
0            1         Alex    Anderson            4        Billy      Bonder
1            2          Amy    Ackerman            5        Brian       Black
2            3        Allen         Ali            6         Bran     Balwner
3            4        Alice        Aoni            7        Bryce       Brice
4            5       Ayoung     Atiches            8        Betty      Btisan

Download Materials

What Users are saying..

profile image

Abhinav Agarwal

Graduate Student at Northwestern University
linkedin profile url

I come from Northwestern University, which is ranked 9th in the US. Although the high-quality academics at school taught me all the basics I needed, obtaining practical experience was a challenge.... Read More

Relevant Projects

Abstractive Text Summarization using Transformers-BART Model
Deep Learning Project to implement an Abstractive Text Summarizer using Google's Transformers-BART Model to generate news article headlines.

MLOps Project for a Mask R-CNN on GCP using uWSGI Flask
MLOps on GCP - Solved end-to-end MLOps Project to deploy a Mask RCNN Model for Image Segmentation as a Web Application using uWSGI Flask, Docker, and TensorFlow.

Build a Customer Churn Prediction Model using Decision Trees
Develop a customer churn prediction model using decision tree machine learning algorithms and data science on streaming service data.

Machine Learning Project to Forecast Rossmann Store Sales
In this machine learning project you will work on creating a robust prediction model of Rossmann's daily sales using store, promotion, and competitor data.

Many-to-One LSTM for Sentiment Analysis and Text Generation
In this LSTM Project , you will build develop a sentiment detection model using many-to-one LSTMs for accurate prediction of sentiment labels in airline text reviews. Additionally, we will also train many-to-one LSTMs on 'Alice's Adventures in Wonderland' to generate contextually relevant text.

MLOps AWS Project on Topic Modeling using Gunicorn Flask
In this project we will see the end-to-end machine learning development process to design, build and manage reproducible, testable, and evolvable machine learning models by using AWS

End-to-End Speech Emotion Recognition Project using ANN
Speech Emotion Recognition using RAVDESS Audio Dataset - Build an Artificial Neural Network Model to Classify Audio Data into various Emotions like Sad, Happy, Angry, and Neutral

NLP Project to Build a Resume Parser in Python using Spacy
Use the popular Spacy NLP python library for OCR and text classification to build a Resume Parser in Python.

Build a Multi-Class Classification Model in Python on Saturn Cloud
In this machine learning classification project, you will build a multi-class classification model in Python on Saturn Cloud to predict the license status of a business.

Build Portfolio Optimization Machine Learning Models in R
Machine Learning Project for Financial Risk Modelling and Portfolio Optimization with R- Build a machine learning model in R to develop a strategy for building a portfolio for maximized returns.