How to do group by in R using dplyr?

This recipe shows you how to perform a group by in R using dplyr.

Recipe Objective

Aggregation is one of the fundamental data manipulation techniques that every data scientist should know. In R, the dplyr add-on package is one of the most widely used tools for data manipulation, and for aggregation it provides the group_by() function.

The group_by() function groups the rows of a data frame by the values of one or more (typically categorical) columns. Combined with summarise(), it lets us compute statistics such as the mean, sum, count, minimum, or maximum of the specified variables for each group using built-in functions.
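As a minimal sketch of this idea, the example below computes several per-group statistics in a single summarise() call. The small scores data frame and its group and score columns are hypothetical, made up only for illustration; n() is dplyr's helper that counts the rows in each group.

```r
library(dplyr)

# Hypothetical toy data: two groups ("a", "b") with a numeric score
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  score = c(10, 20, 5, 15, 25)
)

# One output row per group, with several summary statistics
score_summary <- scores %>%
  group_by(group) %>%
  summarise(
    mean_score = mean(score),  # average score in the group
    total      = sum(score),   # sum of scores
    count      = n(),          # number of rows in the group
    lowest     = min(score),
    highest    = max(score)
  )

print(score_summary)
```

Each statistic becomes its own column, and the result has one row per distinct value of the grouping column.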

There are two equivalent ways to write a group_by() aggregation:

  1. Chaining the steps with the dplyr pipe operator (%>%)
  2. Nesting the group_by() call directly inside summarise_at()

In this recipe, we will learn how to use the group_by() function from the dplyr package in R.

Step 1: Loading the required library and creating a data frame

Create a STUDENT data frame holding each student's name and marks in two subjects across three trimester exams.

# data manipulation
library(dplyr)
library(tidyverse)  # optional: tidyverse also attaches dplyr

STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam",
           "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# compact overview of the columns and their values
glimpse(STUDENT)
Rows: 9
Columns: 4
$ Name           Ram, Ram, Ram, Shyam, Shyam, Shyam, Jessica, Jessica,...
$ Science_Marks  55, 60, 65, 80, 70, 75, 45, 65, 70
$ Math_Marks     70, 75, 73, 50, 53, 55, 65, 78, 75
$ Trimester      1, 2, 3, 1, 2, 3, 1, 2, 3

Step 2: Applying the group_by() function

Syntax: group_by(.data, ...)

where:

  1. .data = a data frame (or tibble)
  2. ... = the variables by which the rows should be grouped

# to check the various arguments of the function
?group_by
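To see what group_by() does on its own, the sketch below (assuming dplyr 1.0 or later, and rebuilding the STUDENT data frame from Step 1 so it runs by itself) inspects the grouping with group_vars() and n_groups(), groups by two variables at once, and averages the Science marks per trimester. The object names by_name, by_name_trimester and trimester_avg are illustrative only.

```r
library(dplyr)

# STUDENT data frame from Step 1, rebuilt so this sketch is self-contained
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam",
           "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# group_by() only attaches grouping information; all 9 rows are kept
by_name <- group_by(STUDENT, Name)
nrow(by_name)       # 9
n_groups(by_name)   # 3 (Jessica, Ram, Shyam)

# Several grouping variables can be passed through `...`
by_name_trimester <- group_by(STUDENT, Name, Trimester)
group_vars(by_name_trimester)  # "Name" "Trimester"
n_groups(by_name_trimester)    # 9, one group per student and trimester

# Grouping by a different column answers a different question:
# the average Science marks in each trimester
trimester_avg <- STUDENT %>%
  group_by(Trimester) %>%
  summarise(avg_science = mean(Science_Marks))
print(trimester_avg)  # avg_science: 60, 65, 70
```

The grouping itself does not aggregate anything; it only changes how later verbs such as summarise() treat the rows.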

Query 1: Find the average marks of each student over the year (Trimesters 1, 2 and 3)

Approach 1: Using the pipe operator (%>%)

# first group the rows by student name, then compute the mean of each marks column
# (mean is passed directly; funs() has been deprecated since dplyr 0.8.0)
STUDENT %>%
  group_by(Name) %>%
  summarise_at(vars(Science_Marks, Math_Marks), mean)
Name      Science_Marks  Math_Marks
Jessica              60    72.66667
Ram                  60    72.66667
Shyam                75    52.66667
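In dplyr 1.0.0 and later, summarise_at() is superseded by across() used inside summarise(). A sketch of the same query written that way, assuming a recent dplyr version (avg_marks is an illustrative name):

```r
library(dplyr)

# STUDENT data frame from Step 1
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam",
           "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# across() applies mean() to each selected column within each Name group
avg_marks <- STUDENT %>%
  group_by(Name) %>%
  summarise(across(c(Science_Marks, Math_Marks), mean))

print(avg_marks)
```

With a single unnamed function, across() keeps the original column names, so the result has the same Name, Science_Marks and Math_Marks columns as the summarise_at() version.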

Approach 2: Nesting group_by() inside summarise_at() (without the pipe)

# the grouped data frame is passed directly as the first argument of summarise_at()
summarise_at(group_by(STUDENT, Name), vars(Science_Marks, Math_Marks), mean)
Name      Science_Marks  Math_Marks
Jessica              60    72.66667
Ram                  60    72.66667
Shyam                75    52.66667
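Because %>% simply passes its left-hand side as the first argument of the next call, both approaches should produce exactly the same result. A quick check of that claim (the names piped and nested are illustrative):

```r
library(dplyr)

# STUDENT data frame from Step 1
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam",
           "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Approach 1: pipe the grouped data into summarise_at()
piped <- STUDENT %>%
  group_by(Name) %>%
  summarise_at(vars(Science_Marks, Math_Marks), mean)

# Approach 2: nest the group_by() call as the first argument
nested <- summarise_at(group_by(STUDENT, Name),
                       vars(Science_Marks, Math_Marks), mean)

identical(piped, nested)  # TRUE
```

The choice between the two is purely about readability; the pipe form usually reads better once more steps are chained.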
