What is the difference between merge() and dplyr joins, and which is faster?

This recipe explains the difference between merge() and the dplyr joins in R, and which is faster.

Recipe Objective

When a database is designed, the data is split across several tables to avoid duplication. To extract specific information from the database, we combine two or more tables using a common field. This combining follows the concept of joins. There are 5 types of joins:

  1. Inner join: Returns only matching records
  2. Outer join (full join): Returns all records from both dataframes, including those without a match in the other
  3. Left Join: Returns all records in left dataframe and only matching records from the other
  4. Right Join : Returns all records in right dataframe and only matching records from the other
  5. Cross join: Returns all the possible combination of records in both the dataframes
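
The cross join is the only type in this list that the recipe does not demonstrate later, so here is a minimal sketch of it. It uses base R's merge() with by = NULL, which returns the Cartesian product of the two dataframes. The `students` and `terms` dataframes are hypothetical and exist only for this illustration. Recent dplyr versions (1.1.0 and later) also provide cross_join() for the same purpose.

```r
# Hypothetical example data, not part of the recipe's university tables
students <- data.frame(Name = c("Jessica", "Nisarg"))
terms <- data.frame(Term = c("Fall", "Spring", "Summer"))

# With by = NULL, merge() pairs every row of x with every row of y
all_pairs <- merge(students, terms, by = NULL)

print(all_pairs)
print(nrow(all_pairs))  # number of rows = nrow(students) * nrow(terms)
```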

To merge 2 dataframes in R, we can use the merge() function as well as the dplyr joins. For large tables, the dplyr join functions are generally faster than merge(). The advantages of using the dplyr package for merging dataframes are:

  1. They are generally faster on large tables.
  2. They inform you about the keys you are merging by when you do not specify them.
  3. They are flexible and work with database tables.
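
The speed claim is easy to check on your own machine. The following is a rough sketch rather than a rigorous benchmark: it times merge() and inner_join() once each on synthetic data using system.time(). The table size `n` and the names `left_tbl` and `right_tbl` are assumptions made for this illustration, and the timings depend on your hardware, R version and dplyr version. The last call shows the second advantage: when `by` is omitted, dplyr joins on the shared column names and prints a message naming them.

```r
library(dplyr)

set.seed(42)
n <- 100000  # arbitrary size chosen for illustration

# Two synthetic tables sharing the key column `id`, in shuffled order
left_tbl <- data.frame(id = sample(n), x = rnorm(n))
right_tbl <- data.frame(id = sample(n), y = rnorm(n))

# Time one run of each approach (a single run is only a rough indication)
t_merge <- system.time(m <- merge(left_tbl, right_tbl, by = "id"))
t_dplyr <- system.time(d <- inner_join(left_tbl, right_tbl, by = "id"))

print(t_merge["elapsed"])
print(t_dplyr["elapsed"])

# Without `by`, dplyr uses the common columns and reports which ones it picked
d2 <- inner_join(left_tbl, right_tbl)
```

For more reliable numbers, repeat each call several times and compare the medians instead of relying on one run.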

In this recipe, we will learn how to merge two dataframes in R using the dplyr package.

Step 1: Loading the required library and Creating 2 DataFrames

Consider a university database in which one table holds the students' personal details and the other holds the courses they are enrolled in.

# Data manipulation
library(dplyr)

# Dataframe 1:
personal_details <- data.frame(Student_ID = c(1:5),
                               Name = c("Siddhi", "Jessica", "Nisarg", "Vishal", "Fredo"),
                               Address = c("Mumbai", "Mumbai", "Pune", "Madgaon", "Nashik"))

# Dataframe 2:
courses <- data.frame(Student_ID = c(2, 3, 5, 8),
                      Course = c("Chemistry", "Physics", "Computer Science", "History"))

print(personal_details)
print(courses)

  Student_ID    Name Address
1          1  Siddhi  Mumbai
2          2 Jessica  Mumbai
3          3  Nisarg    Pune
4          4  Vishal Madgaon
5          5   Fredo  Nashik
  Student_ID           Course
1          2        Chemistry
2          3          Physics
3          5 Computer Science
4          8          History

Step 2: Merging the two dataframes

We use the common field "Student_ID" to merge the data, using the left_join() function for a left join. The dplyr join functions follow the concept of SQL joins to merge the dataframes.

Syntax: left_join(x, y, by = )

where:

  1. x = dataframe 1
  2. y = dataframe 2
  3. by = common field by which the merging takes place
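
The `by` argument also covers the case where the key column has a different name in each dataframe. A minimal sketch, assuming a hypothetical `enrolment` table whose key is called `ID` instead of `Student_ID`: a named character vector maps the column in x to the column in y.

```r
library(dplyr)

personal_details <- data.frame(Student_ID = 1:3,
                               Name = c("Siddhi", "Jessica", "Nisarg"))

# Hypothetical table for illustration: same key values, different column name
enrolment <- data.frame(ID = c(2, 3, 8),
                        Course = c("Chemistry", "Physics", "History"))

# "Student_ID" in x is matched against "ID" in y;
# the result keeps the key column name from x
joined <- left_join(personal_details, enrolment, by = c("Student_ID" = "ID"))
print(joined)
```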

Similarly, a right join is carried out by right_join(), an inner join by inner_join(), and a full (outer) join by full_join().

1. Inner Join

inner <- inner_join(x = personal_details, y = courses, by = "Student_ID")
print(inner)

  Student_ID    Name Address           Course
1          2 Jessica  Mumbai        Chemistry
2          3  Nisarg    Pune          Physics
3          5   Fredo  Nashik Computer Science

2. Full Join

full <- full_join(x = personal_details, y = courses, by = "Student_ID")
print(full)

  Student_ID    Name Address           Course
1          1  Siddhi  Mumbai             <NA>
2          2 Jessica  Mumbai        Chemistry
3          3  Nisarg    Pune          Physics
4          4  Vishal Madgaon             <NA>
5          5   Fredo  Nashik Computer Science
6          8    <NA>    <NA>          History

3. Left Join

left <- left_join(x = personal_details, y = courses, by = "Student_ID")
print(left)

  Student_ID    Name Address           Course
1          1  Siddhi  Mumbai             <NA>
2          2 Jessica  Mumbai        Chemistry
3          3  Nisarg    Pune          Physics
4          4  Vishal Madgaon             <NA>
5          5   Fredo  Nashik Computer Science

4. Right Join

right <- right_join(x = personal_details, y = courses, by = "Student_ID")
print(right)

  Student_ID    Name Address           Course
1          2 Jessica  Mumbai        Chemistry
2          3  Nisarg    Pune          Physics
3          5   Fredo  Nashik Computer Science
4          8    <NA>    <NA>          History
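
For comparison, the same four joins can be written with base R's merge(), which selects the join type through its `all`, `all.x` and `all.y` arguments. This sketch recreates the two dataframes so that it runs on its own. One behavioural difference to keep in mind: merge() sorts the result on the key column by default (sort = TRUE), whereas the dplyr joins keep the row order of x.

```r
personal_details <- data.frame(Student_ID = c(1:5),
                               Name = c("Siddhi", "Jessica", "Nisarg", "Vishal", "Fredo"),
                               Address = c("Mumbai", "Mumbai", "Pune", "Madgaon", "Nashik"))
courses <- data.frame(Student_ID = c(2, 3, 5, 8),
                      Course = c("Chemistry", "Physics", "Computer Science", "History"))

# Inner join: the default, keeps only keys present in both dataframes
inner_m <- merge(personal_details, courses, by = "Student_ID")

# Full join: keep unmatched rows from both sides
full_m <- merge(personal_details, courses, by = "Student_ID", all = TRUE)

# Left join: keep every row of x (personal_details)
left_m <- merge(personal_details, courses, by = "Student_ID", all.x = TRUE)

# Right join: keep every row of y (courses)
right_m <- merge(personal_details, courses, by = "Student_ID", all.y = TRUE)

print(full_m)
```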
