What is the difference between merge and dplyr join in R and Which is faster?

This recipe explains what is the difference between merge and dplyr join in R and Which is faster

Recipe Objective

In formation of a database, all the data cannot be stored in one table to avoid duplicacy. To extract specific information out of the database, we merge two or more tables using a common field. The merging takes place w.r.t the concept of joins. There are 5 types of joins: ​

  1. Inner join: Returns only matching records
  2. Outer join: Returns all records including no matches in both directions
  3. Left Join: Returns all records in left dataframe and only matching records from the other
  4. Right Join : Returns all records in right dataframe and only matching records from the other
  5. Cross join: Returns all the possible combination of records in both the dataframes

To merge 2 dataframes in R, we can use merge() function as well as the dplyr joins. For large tables dplyr join functions is much faster than merge(). The advantages of using dplyr package for merging dataframes are:

  1. They are much faster.
  2. Informs you about the keys you're merging by.
  3. They are flexible and work with database tables.

In this recipe, we will learn how to merge two dataframe in R using dplyr package.

Explore the BERT Variants - ALBERT vs DistilBERT

Step 1: Loading the required library and Creating 2 DataFrames

Taking an example of university database with 1 table to be of personal details and the other as courses enrolled ​

#Data manipulation library(dplyr) # Dataframe 1: personal_details = data.frame(Student_ID = c(1:5), Name = c("Siddhi", "Jessica", "Nisarg", "Vishal", "Fredo"), Address = c("Mumbai", "Mumbai", "Pune", "Madgaon", "Nashik")) #Dataframe 2: courses = data.frame(Student_ID = c(2,3,5,8), Course = c("Chemistry", "Physics", "Computer Science", "History")) print(personal_details) print(courses)

Student_ID    Name Address
1          1  Siddhi  Mumbai
2          2 Jessica  Mumbai
3          3  Nisarg    Pune
4          4  Vishal Madgaon
5          5   Fredo  Nashik
  Student_ID           Course
1          2        Chemistry
2          3          Physics
3          5 Computer Science
4          8          History

Step 2: Merging the two dataframes

We use the common field "Student_ID" for merging the data using left_join() function for left join. These fuctions follows the concept of SQL Joins to merge the dataframes. ​

Syntax: left_join(x , y, by = ) ​

where: ​

  1. x = dataframe 1
  2. y = dataframe 2
  3. by = common field by which the merging takes place

Similarly, Right join is carried out by right_join; Inner Join is carried out by inner_join(); Full Join or union by full_join(). ​

1. Inner Join ​

inner = inner_join(x=personal_details,y=courses,by="Student_ID") print(inner)

Student_ID    Name Address           Course
1          2 Jessica  Mumbai        Chemistry
2          3  Nisarg    Pune          Physics
3          5   Fredo  Nashik Computer Science

2. Full Join ​

full = full_join(x=personal_details,y=courses,by="Student_ID") print(full)

Student_ID    Name Address           Course
1          1  Siddhi  Mumbai             
2          2 Jessica  Mumbai        Chemistry
3          3  Nisarg    Pune          Physics
4          4  Vishal Madgaon             
5          5   Fredo  Nashik Computer Science
6          8                  History

3. Left Join ​

left = left_join(x=personal_details,y=courses,by="Student_ID") print(left)

Student_ID    Name Address           Course
1          1  Siddhi  Mumbai             
2          2 Jessica  Mumbai        Chemistry
3          3  Nisarg    Pune          Physics
4          4  Vishal Madgaon             
5          5   Fredo  Nashik Computer Science

4. Right Join ​

right = right_join(x=personal_details,y=courses,by="Student_ID") print(right)

Student_ID    Name Address           Course
1          2 Jessica  Mumbai        Chemistry
2          3  Nisarg    Pune          Physics
3          5   Fredo  Nashik Computer Science
4          8                  History

What Users are saying..

profile image

Jingwei Li

Graduate Research assistance at Stony Brook University
linkedin profile url

ProjectPro is an awesome platform that helps me learn much hands-on industrial experience with a step-by-step walkthrough of projects. There are two primary paths to learn: Data Science and Big Data.... Read More

Relevant Projects

Build Multi Class Text Classification Models with RNN and LSTM
In this Deep Learning Project, you will use the customer complaints data about consumer financial products to build multi-class text classification models using RNN and LSTM.

Learn Object Tracking (SOT, MOT) using OpenCV and Python
Get Started with Object Tracking using OpenCV and Python - Learn to implement Multiple Instance Learning Tracker (MIL) algorithm, Generic Object Tracking Using Regression Networks Tracker (GOTURN) algorithm, Kernelized Correlation Filters Tracker (KCF) algorithm, Tracking, Learning, Detection Tracker (TLD) algorithm for single and multiple object tracking from various video clips.

PyTorch Project to Build a LSTM Text Classification Model
In this PyTorch Project you will learn how to build an LSTM Text Classification model for Classifying the Reviews of an App .

Hands-On Approach to Causal Inference in Machine Learning
In this Machine Learning Project, you will learn to implement various causal inference techniques in Python to determine, how effective the sprinkler is in making the grass wet.

Demand prediction of driver availability using multistep time series analysis
In this supervised learning machine learning project, you will predict the availability of a driver in a specific area by using multi step time series analysis.

LLM Project to Build and Fine Tune a Large Language Model
In this LLM project for beginners, you will learn to build a knowledge-grounded chatbot using LLM's and learn how to fine tune it.

Multilabel Classification Project for Predicting Shipment Modes
Multilabel Classification Project to build a machine learning model that predicts the appropriate mode of transport for each shipment, using a transport dataset with 2000 unique products. The project explores and compares four different approaches to multilabel classification, including naive independent models, classifier chains, natively multilabel models, and multilabel to multiclass approaches.

Linear Regression Model Project in Python for Beginners Part 1
Machine Learning Linear Regression Project in Python to build a simple linear regression model and master the fundamentals of regression for beginners.

Classification Projects on Machine Learning for Beginners - 2
Learn to implement various ensemble techniques to predict license status for a given business.

Build a Text Classification Model with Attention Mechanism NLP
In this NLP Project, you will learn to build a multi class text classification model with attention mechanism.