How to join Excel data from Multiple files using Pandas?

This recipe helps you join Excel data from Multiple files using Pandas.

Recipe Objective: How to join Excel data from Multiple files using Pandas?

In most big data scenarios, we must merge multiple files or tables based on the various conditions to create a unified data model for quicker data analysis. In this recipe, we will learn how to merge multiple Excel files into one using Python.

Master the Art of Data Cleaning in Machine Learning

System Requirements To Perform Merge Two Excel Files in Python Pandas

Steps To Perform Python Merge Excel Files Into One

The following steps will show how to perform Python Pandas merge excel files into one.

Step 1: Import the modules

In this example we are going to use the pandas library , this library is used for data manipulation pandas data structures and operations for manipulating numerical tables and time series

Import pandas as pd

Step 2: Read the Excel Files

In the below code, we will read the data from Excel files and create dataframes using the Pandas library.

orders = pd. read_excel('orders.xlsx') products =pd.read_excel("products.xlsx") customers = pd.read_excel("customers.xlsx")

Step 3: Join operations on the Data frames

Using the "how" parameter in the merge function, we will perform the join operations like left, right,..etc.

Left Join:

import pandas as pd orders = pd. read_excel('orders.xlsx') products =pd.read_excel("products.xlsx") customers = pd.read_excel("customers.xlsx") result = pd.merge(orders,customers[["Product_id","Order_id","customer_name",'customer_email']],on='Product_id', how='left') result.head()

Output of the above code:

Join operations on the Data frames

 

Inner Join:

import pandas as pd orders = pd. read_excel('orders.xlsx') products =pd.read_excel("products.xlsx") customers = pd.read_excel("customers.xlsx") result= pd.merge(products,customers,on='Product_id',how='inner',indicator=True) result.head()

Output of the above code:

Inner Join

 

Right Join:

import pandas as pd orders = pd. read_excel('orders.xlsx') products =pd.read_excel("products.xlsx") customers = pd.read_excel("customers.xlsx") result = pd.merge(orders, customers[["Product_id","Order_id","customer_name",'customer_email']], on='Product_id', how='right', indicator=True) result.head()

Output of the above code:

Right Join

Outer Join:

import pandas as pd orders = pd. read_excel('orders.xlsx') products =pd.read_excel("products.xlsx") customers = pd.read_excel("customers.xlsx") result= pd.merge(products,customers,on='Product_id',how='outer',indicator=True) result.head()

 

Output of the above code:

Outer Join

Step 4: Write result to the CSV file

After getting the result, write to the hdfs or local file-

import pandas as pd orders = pd. read_excel('orders.xlsx') products =pd.read_excel("products.xlsx") customers = pd.read_excel("customers.xlsx") result = pd.merge(orders, customers[["Product_id","Order_id","customer_name",'customer_email']], on='Product_id') result.head() # write the results to the hdfs/ local result.to_excel("Results.xlsx", index = False)

Output of the above code will be an Excel file which will be written to current location of execution of the code and it looks like below-

Write result to the CSV file

 

What Users are saying..

profile image

Abhinav Agarwal

Graduate Student at Northwestern University
linkedin profile url

I come from Northwestern University, which is ranked 9th in the US. Although the high-quality academics at school taught me all the basics I needed, obtaining practical experience was a challenge.... Read More

Relevant Projects

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

Detectron2 Object Detection and Segmentation Example Python
Object Detection using Detectron2 - Build a Dectectron2 model to detect the zones and inhibitions in antibiogram images.

End-to-End Speech Emotion Recognition Project using ANN
Speech Emotion Recognition using RAVDESS Audio Dataset - Build an Artificial Neural Network Model to Classify Audio Data into various Emotions like Sad, Happy, Angry, and Neutral

Learn Hyperparameter Tuning for Neural Networks with PyTorch
In this Deep Learning Project, you will learn how to optimally tune the hyperparameters (learning rate, epochs, dropout, early stopping) of a neural network model in PyTorch to improve model performance.

Build a Text Generator Model using Amazon SageMaker
In this Deep Learning Project, you will train a Text Generator Model on Amazon Reviews Dataset using LSTM Algorithm in PyTorch and deploy it on Amazon SageMaker.

ML Model Deployment on AWS for Customer Churn Prediction
MLOps Project-Deploy Machine Learning Model to Production Python on AWS for Customer Churn Prediction

Mastering A/B Testing: A Practical Guide for Production
In this A/B Testing for Machine Learning Project, you will gain hands-on experience in conducting A/B tests, analyzing statistical significance, and understanding the challenges of building a solution for A/B testing in a production environment.

Build an End-to-End AWS SageMaker Classification Model
MLOps on AWS SageMaker -Learn to Build an End-to-End Classification Model on SageMaker to predict a patient’s cause of death.

Learn to Build an End-to-End Machine Learning Pipeline - Part 2
In this Machine Learning Project, you will learn how to build an end-to-end machine learning pipeline for predicting truck delays, incorporating Hopsworks' feature store and Weights and Biases for model experimentation.

Create Your First Chatbot with RASA NLU Model and Python
Learn the basic aspects of chatbot development and open source conversational AI RASA to create a simple AI powered chatbot on your own.