What is the difference between merge and dplyr join in R and Which is faster?

This recipe explains what is the difference between merge and dplyr join in R and Which is faster

Recipe Objective

In formation of a database, all the data cannot be stored in one table to avoid duplicacy. To extract specific information out of the database, we merge two or more tables using a common field. The merging takes place w.r.t the concept of joins. There are 5 types of joins: ​

  1. Inner join: Returns only matching records
  2. Outer join: Returns all records including no matches in both directions
  3. Left Join: Returns all records in left dataframe and only matching records from the other
  4. Right Join : Returns all records in right dataframe and only matching records from the other
  5. Cross join: Returns all the possible combination of records in both the dataframes

To merge 2 dataframes in R, we can use merge() function as well as the dplyr joins. For large tables dplyr join functions is much faster than merge(). The advantages of using dplyr package for merging dataframes are:

  1. They are much faster.
  2. Informs you about the keys you're merging by.
  3. They are flexible and work with database tables.

In this recipe, we will learn how to merge two dataframe in R using dplyr package.

Explore the BERT Variants - ALBERT vs DistilBERT

Step 1: Loading the required library and Creating 2 DataFrames

Taking an example of university database with 1 table to be of personal details and the other as courses enrolled ​

#Data manipulation library(dplyr) # Dataframe 1: personal_details = data.frame(Student_ID = c(1:5), Name = c("Siddhi", "Jessica", "Nisarg", "Vishal", "Fredo"), Address = c("Mumbai", "Mumbai", "Pune", "Madgaon", "Nashik")) #Dataframe 2: courses = data.frame(Student_ID = c(2,3,5,8), Course = c("Chemistry", "Physics", "Computer Science", "History")) print(personal_details) print(courses)

Student_ID    Name Address
1          1  Siddhi  Mumbai
2          2 Jessica  Mumbai
3          3  Nisarg    Pune
4          4  Vishal Madgaon
5          5   Fredo  Nashik
  Student_ID           Course
1          2        Chemistry
2          3          Physics
3          5 Computer Science
4          8          History

Step 2: Merging the two dataframes

We use the common field "Student_ID" for merging the data using left_join() function for left join. These fuctions follows the concept of SQL Joins to merge the dataframes. ​

Syntax: left_join(x , y, by = ) ​

where: ​

  1. x = dataframe 1
  2. y = dataframe 2
  3. by = common field by which the merging takes place

Similarly, Right join is carried out by right_join; Inner Join is carried out by inner_join(); Full Join or union by full_join(). ​

1. Inner Join ​

inner = inner_join(x=personal_details,y=courses,by="Student_ID") print(inner)

Student_ID    Name Address           Course
1          2 Jessica  Mumbai        Chemistry
2          3  Nisarg    Pune          Physics
3          5   Fredo  Nashik Computer Science

2. Full Join ​

full = full_join(x=personal_details,y=courses,by="Student_ID") print(full)

Student_ID    Name Address           Course
1          1  Siddhi  Mumbai             
2          2 Jessica  Mumbai        Chemistry
3          3  Nisarg    Pune          Physics
4          4  Vishal Madgaon             
5          5   Fredo  Nashik Computer Science
6          8                  History

3. Left Join ​

left = left_join(x=personal_details,y=courses,by="Student_ID") print(left)

Student_ID    Name Address           Course
1          1  Siddhi  Mumbai             
2          2 Jessica  Mumbai        Chemistry
3          3  Nisarg    Pune          Physics
4          4  Vishal Madgaon             
5          5   Fredo  Nashik Computer Science

4. Right Join ​

right = right_join(x=personal_details,y=courses,by="Student_ID") print(right)

Student_ID    Name Address           Course
1          2 Jessica  Mumbai        Chemistry
2          3  Nisarg    Pune          Physics
3          5   Fredo  Nashik Computer Science
4          8                  History

What Users are saying..

profile image

Ray han

Tech Leader | Stanford / Yale University
linkedin profile url

I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell,Oracle, and Arthur Andersen(Accenture) in the US. I have taken Big Data and Hadoop,NoSQL, Spark, Hadoop... Read More

Relevant Projects

Langchain Project for Customer Support App in Python
In this LLM Project, you will learn how to enhance customer support interactions through Large Language Models (LLMs), enabling intelligent, context-aware responses. This Langchain project aims to seamlessly integrate LLM technology with databases, PDF knowledge bases, and audio processing agents to create a comprehensive customer support application.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Digit Recognition using CNN for MNIST Dataset in Python
In this deep learning project, you will build a convolutional neural network using MNIST dataset for handwritten digit recognition.

Hands-On Approach to Master PyTorch Tensors with Examples
In this deep learning project, you will learn how to perform various operations on the building block of PyTorch : Tensors.

Deploy Transformer-BART Model on Paperspace Cloud
In this MLOps Project you will learn how to deploy a Tranaformer BART Model for Abstractive Text Summarization on Paperspace Private Cloud

Build CI/CD Pipeline for Machine Learning Projects using Jenkins
In this project, you will learn how to create a CI/CD pipeline for a search engine application using Jenkins.

Build OCR from Scratch Python using YOLO and Tesseract
In this deep learning project, you will learn how to build your custom OCR (optical character recognition) from scratch by using Google Tesseract and YOLO to read the text from any images.

Expedia Hotel Recommendations Data Science Project
In this data science project, you will contextualize customer data and predict the likelihood a customer will stay at 100 different hotel groups.

PyTorch Project to Build a LSTM Text Classification Model
In this PyTorch Project you will learn how to build an LSTM Text Classification model for Classifying the Reviews of an App .

Recommender System Machine Learning Project for Beginners-4
Collaborative Filtering Recommender System Project - Comparison of different model based and memory based methods to build recommendation system using collaborative filtering.