How to merge 2 dataframes in R?

This recipe helps you merge 2 dataframes in R

Recipe Objective

In formation of a database, all the data cannot be stored in one table to avoid duplicacy. To extract specific information out of the database, we merge two or more tables using a common field. The merging takes place w.r.t the concept of joins. There are 5 types of joins: ​

  1. Inner join: Returns only matching records
  2. Outer join: Returns all records including no matches in both directions
  3. Left Join: Returns all records in left dataframe and only matching records from the other
  4. Right Join : Returns all records in right dataframe and only matching records from the other
  5. Cross join: Returns all the possible combination of records in both the dataframes

To merge 2 dataframes in R, we use merge() function to do so. In this recipe, we will learn how to merge two dataframe in R using an example.

Get Access to Plant Species Identification Project using Machine Learning

Step 1: Loading required library and creating 2 dataframes.

Taking an example of university database with 1 table to be of personal details and the other as courses enrolled

#Data manipulation library(tidyverse) # Dataframe 1: personal_details = data.frame(Student_ID = c(1:5), Name = c("Siddhi", "Jessica", "Nisarg", "Vishal", "Fredo"), Address = c("Mumbai", "Mumbai", "Pune", "Madgaon", "Nashik")) #Dataframe 2: courses = data.frame(Student_ID = c(2,3,5,8), Course = c("Chemistry", "Physics", "Computer Science", "History")) print(personal_details) print(courses)

  Student_ID    Name Address
1          1  Siddhi  Mumbai
2          2 Jessica  Mumbai
3          3  Nisarg    Pune
4          4  Vishal Madgaon
5          5   Fredo  Nashik
  Student_ID           Course
1          2        Chemistry
2          3          Physics
3          5 Computer Science
4          8          History

Merging the two dataframes

We use the common field "Student_ID" for merging the data using merge() function. ​

Syntax: merge(x , y, by = , all.x = , all.y = , all = ) ​

where: ​

  1. x = dataframe 1
  2. y = dataframe 2
  3. by = common field by which the merging takes place
  4. all, all.x, all.y = logical values which indicates the type of merge. The default value of all is FALSE(i.e. it returns only the matching rows)

1. Inner Join ​

inner = merge(x=personal_details,y=courses,by="Student_ID") inner

Student_ID    Name Address           Course
1          2 Jessica  Mumbai        Chemistry
2          3  Nisarg    Pune          Physics
3          5   Fredo  Nashik Computer Science

2. Outer Join: (condition: all = TRUE) ​

outer = merge(x=personal_details,y=courses,by="Student_ID", all = TRUE) outer

Student_ID    Name Address           Course
1          1  Siddhi  Mumbai             
2          2 Jessica  Mumbai        Chemistry
3          3  Nisarg    Pune          Physics
4          4  Vishal Madgaon             
5          5   Fredo  Nashik Computer Science
6          8                  History

3. Left Join: (Condition: all.x = TRUE) ​

left = merge(x=personal_details,y=courses,by="Student_ID", all.x = TRUE) left

Student_ID    Name Address           Course
1          1  Siddhi  Mumbai             
2          2 Jessica  Mumbai        Chemistry
3          3  Nisarg    Pune          Physics
4          4  Vishal Madgaon             
5          5   Fredo  Nashik Computer Science

4. Right Join: (Condition: all.y = TRUE) ​

right = merge(x=personal_details,y=courses,by="Student_ID", all.y = TRUE) right

Student_ID    Name Address           Course
1          2 Jessica  Mumbai        Chemistry
2          3  Nisarg    Pune          Physics
3          5   Fredo  Nashik Computer Science
4          8                  History

What Users are saying..

profile image

Ed Godalle

Director Data Analytics at EY / EY Tech
linkedin profile url

I am the Director of Data Analytics with over 10+ years of IT experience. I have a background in SQL, Python, and Big Data working with Accenture, IBM, and Infosys. I am looking to enhance my skills... Read More

Relevant Projects

LLM Project to Build and Fine Tune a Large Language Model
In this LLM project for beginners, you will learn to build a knowledge-grounded chatbot using LLM's and learn how to fine tune it.

Build a Hybrid Recommender System in Python using LightFM
In this Recommender System project, you will build a hybrid recommender system in Python using LightFM .

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Deep Learning Project for Beginners with Source Code Part 1
Learn to implement deep neural networks in Python .

Build a Text Generator Model using Amazon SageMaker
In this Deep Learning Project, you will train a Text Generator Model on Amazon Reviews Dataset using LSTM Algorithm in PyTorch and deploy it on Amazon SageMaker.

Skip Gram Model Python Implementation for Word Embeddings
Skip-Gram Model word2vec Example -Learn how to implement the skip gram algorithm in NLP for word embeddings on a set of documents.

GCP MLOps Project to Deploy ARIMA Model using uWSGI Flask
Build an end-to-end MLOps Pipeline to deploy a Time Series ARIMA Model on GCP using uWSGI and Flask

Hands-On Approach to Causal Inference in Machine Learning
In this Machine Learning Project, you will learn to implement various causal inference techniques in Python to determine, how effective the sprinkler is in making the grass wet.

Recommender System Machine Learning Project for Beginners-4
Collaborative Filtering Recommender System Project - Comparison of different model based and memory based methods to build recommendation system using collaborative filtering.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.