How to aggregate multiple columns in a dataframe in R?

This recipe shows how to aggregate multiple columns of a dataframe in R.

Recipe Objective

The aggregate function is used in situations similar to those where the apply family of functions is used. It calculates summary statistics after collating raw data with respect to a grouping variable in a dataset.

It is a two-step process. First, it groups the raw data based on a categorical variable; then it performs the required calculation on each of the groups formed.

Three things are required to perform an aggregation: the data, a grouping variable, and the function/calculation to perform.
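The two steps can be sketched by hand with base R's split and sapply. This is only an illustration of the idea, not how aggregate is implemented internally; the scores data frame and its column names are hypothetical.

```r
# Hypothetical toy data, used only for this illustration
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 30, 50)
)

# Step 1: group the raw values by the categorical variable
pieces <- split(scores$value, scores$group)

# Step 2: apply the calculation to each group
group_means <- sapply(pieces, mean)
print(group_means)
```

Here the data is scores$value, the grouping variable is scores$group, and the calculation is mean.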

Syntax: aggregate(x, by = , FUN = )

Where:

  1. x = the dataframe (or the subset of its columns) to aggregate
  2. by = grouping variable(s)/column(s), supplied as a list
  3. FUN = built-in or user-defined function to apply to each column within each group
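As a minimal sketch of how the three arguments fit together, the call below sums two columns per group. The sales data frame and its column names are hypothetical and are not part of this recipe's dataset.

```r
# Hypothetical toy data, used only to illustrate the three arguments
sales <- data.frame(
  region  = c("east", "east", "west", "west"),
  units   = c(1, 3, 10, 20),
  revenue = c(100, 300, 500, 700)
)

result <- aggregate(
  x   = sales[, c("units", "revenue")],  # columns to summarise
  by  = list(region = sales$region),     # grouping variable, wrapped in a list
  FUN = sum                              # calculation applied per group and column
)
print(result)
```

Naming the list element (region = ...) gives the grouping column that name in the result; an unnamed list produces a default name instead.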

In this recipe, we will demonstrate how to aggregate multiple columns in a dataframe in R.

Step 1: Creating a DataFrame

We create a STUDENT dataframe containing each student's name and their marks in two subjects across three trimester exams.

# One row per student per trimester
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam", "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Display the dataframe
STUDENT

Name     Science_Marks  Math_Marks  Trimester
Ram      55             70          1
Ram      60             75          2
Ram      65             73          3
Shyam    80             50          1
Shyam    70             53          2
Shyam    75             55          3
Jessica  45             65          1
Jessica  65             78          2
Jessica  70             75          3

Step 2: Application of Aggregate Function

Query 1: Find each student's average marks over the year (Trimesters 1, 2 and 3).

We use the aggregate function with "Name" as the grouping variable and FUN = mean.

aggregate(STUDENT[ , 2:3], by = list(STUDENT$Name), FUN = mean)

Group.1  Science_Marks  Math_Marks
Jessica  60             72.66667
Ram      60             72.66667
Shyam    75             52.66667
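The grouping column gets a default label because the list passed to by is unnamed. A small variation, sketched below, names the list element so the result carries a "Name" column instead. The dataframe is rebuilt here so the snippet runs on its own, and by_student is just an illustrative variable name.

```r
# Same data as in Step 1, repeated so this snippet is self-contained
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam", "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Select the mark columns by name and label the grouping column "Name"
by_student <- aggregate(
  STUDENT[, c("Science_Marks", "Math_Marks")],
  by = list(Name = STUDENT$Name),
  FUN = mean
)
print(by_student)
```

Selecting columns by name rather than by position (2:3) also keeps the call working if the column order changes.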

Query 2: Find the average marks in each subject for each trimester.

We use the aggregate function with "Trimester" as the grouping variable and FUN = mean.

aggregate(STUDENT[ , 2:3], by = list(STUDENT$Trimester), FUN = mean)

Group.1  Science_Marks  Math_Marks
1        60             61.66667
2        65             68.66667
3        70             67.66667
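The same per-trimester query can also be written with aggregate's formula interface, where cbind() on the left-hand side lists the columns to summarise and the right-hand side names the grouping variable. This is an alternative sketch rather than the form used in this recipe; the dataframe is rebuilt so the snippet runs on its own, and by_trimester is an illustrative variable name.

```r
# Same data as in Step 1, repeated so this snippet is self-contained
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam", "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Formula interface: summarised columns ~ grouping variable
by_trimester <- aggregate(
  cbind(Science_Marks, Math_Marks) ~ Trimester,
  data = STUDENT,
  FUN = mean
)
print(by_trimester)
```

With this form the grouping column keeps its original name, Trimester, in the result.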
