How to aggregate multiple columns in a dataframe in R?

This recipe helps you aggregate multiple columns in a dataframe in R

Recipe Objective

The aggregate function is used in much the same situations as the apply family of functions (for example, tapply). It computes summary statistics after grouping the raw data by a grouping variable in the dataset.

It is a two-step process: first it groups the raw data by a categorical variable, and then it performs the required calculation on each group.
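The two steps can be sketched by hand with split() and sapply(), then compared with a single aggregate() call. This is a minimal illustration only; the `scores` dataframe and its column names are hypothetical and not part of the recipe's data.

```r
# Hypothetical toy data: four values in two groups
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 30, 50)
)

# Step 1: split the raw values into one vector per group
by_group <- split(scores$value, scores$group)

# Step 2: apply the calculation to each group
group_means <- sapply(by_group, mean)
print(group_means)

# aggregate() performs both steps in one call
agg <- aggregate(scores["value"], by = list(group = scores$group), FUN = mean)
print(agg)
```

Both approaches should report the same per-group means; aggregate() returns them as a dataframe rather than a named vector.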

Three things are required to perform an aggregation: the data, a grouping variable, and the function (calculation) to apply.

Syntax: aggregate(x, by = , FUN = )

Where:

  1. x = the dataframe (or the subset of its columns) to aggregate
  2. by = a list of one or more grouping variables/columns
  3. FUN = a built-in or user-defined function that is applied to each column within each group
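As a rough illustration of the three arguments, the sketch below passes a user-defined function as FUN. The `sales` dataframe and the `spread` helper are hypothetical names made up for this example.

```r
# Hypothetical data: units sold and revenue per region
sales <- data.frame(
  region  = c("east", "east", "west", "west", "west"),
  units   = c(3, 7, 2, 4, 9),
  revenue = c(30, 70, 25, 40, 95)
)

# User-defined function: difference between the largest and smallest value
spread <- function(v) max(v) - min(v)

result <- aggregate(
  x   = sales[, c("units", "revenue")],  # columns to summarise
  by  = list(region = sales$region),     # grouping variable, wrapped in a list
  FUN = spread                           # applied to each column within each group
)
print(result)
```

Naming the list element (region = ...) gives the grouping column a readable name in the result instead of the default Group.1.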

In this recipe, we will demonstrate how to aggregate multiple columns in a dataframe in R.

Step 1: Creating a DataFrame

Create a STUDENT dataframe holding each student's Name and marks in two subjects across three trimester exams.

STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam", "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)
# Print the dataframe
STUDENT

Name     Science_Marks  Math_Marks  Trimester
Ram      55             70          1
Ram      60             75          2
Ram      65             73          3
Shyam    80             50          1
Shyam    70             53          2
Shyam    75             55          3
Jessica  45             65          1
Jessica  65             78          2
Jessica  70             75          3

Step 2: Application of Aggregate Function

Query 1: Find the average marks of each student over the year (Trimesters 1, 2 and 3).

We use the aggregate function with "Name" as the grouping variable and FUN = mean.

aggregate(STUDENT[ , 2:3], by = list(STUDENT$Name), FUN = mean)

Group.1  Science_Marks  Math_Marks
Jessica  60             72.66667
Ram      60             72.66667
Shyam    75             52.66667
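The grouping column in this output gets a default name because the list passed to by is unnamed. A possible variation, sketched below, names the list element; the formula interface of aggregate() is shown as an alternative that should give the same averages. The variable names `named` and `via_formula` are only illustrative.

```r
# Recreate the recipe's STUDENT dataframe so this block runs on its own
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam", "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Name the list element so the grouping column is called "Name"
named <- aggregate(
  STUDENT[, c("Science_Marks", "Math_Marks")],
  by = list(Name = STUDENT$Name),
  FUN = mean
)
print(named)

# Formula interface: cbind() on the left lists the columns to summarise
via_formula <- aggregate(cbind(Science_Marks, Math_Marks) ~ Name,
                         data = STUDENT, FUN = mean)
print(via_formula)
```

Selecting columns by name rather than by position (2:3) also keeps the call working if the column order of the dataframe changes.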

Query 2: Find the average marks in each subject for each trimester.

We use the aggregate function with "Trimester" as the grouping variable and FUN = mean.

aggregate(STUDENT[ , 2:3], by = list(STUDENT$Trimester), FUN = mean)

Group.1  Science_Marks  Math_Marks
1        60             61.66667
2        65             68.66667
3        70             67.66667
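Real mark sheets often have gaps. The sketch below assumes, purely for illustration, that one Science mark is missing, and shows that extra arguments such as na.rm = TRUE are passed through aggregate() to FUN. The missing value is hypothetical and not part of the recipe's data.

```r
# Recreate the recipe's STUDENT dataframe so this block runs on its own
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam", "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Hypothetical change: pretend Jessica missed the Trimester 1 Science exam
STUDENT$Science_Marks[STUDENT$Name == "Jessica" & STUDENT$Trimester == 1] <- NA

# Without na.rm, the mean of a group containing NA is NA
with_na <- aggregate(STUDENT[, 2:3],
                     by = list(Trimester = STUDENT$Trimester), FUN = mean)
print(with_na)

# na.rm = TRUE is forwarded to mean(), which then ignores the missing mark
without_na <- aggregate(STUDENT[, 2:3],
                        by = list(Trimester = STUDENT$Trimester),
                        FUN = mean, na.rm = TRUE)
print(without_na)
```

Whether dropping missing marks is appropriate depends on the analysis; the point here is only that additional arguments after FUN are handed on to the summary function.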
