How to aggregate multiple columns in a dataframe in R?

This recipe shows how to aggregate multiple columns in a dataframe in R.
Last Updated: 19 Dec 2022


Recipe Objective

The aggregate function is used in much the same situations as the apply family of functions. It calculates summary statistics after collating raw data with respect to a grouping variable in a dataset.

It is a two-step process. First, it groups the raw data by a categorical variable; then it performs the required calculation on each of the groups formed.

Three things are required to perform an aggregation: the data, a grouping variable, and the function/calculation to perform.
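The two steps can be sketched by hand with base R's split and sapply. The `scores` data frame and its column names are hypothetical, made up only for this illustration; aggregate performs the equivalent work in a single call.

```r
# Hypothetical toy data (not from this recipe): one grouping column, one numeric column
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 30, 50)
)

# Step 1: group the raw values by the categorical variable
groups <- split(scores$value, scores$group)

# Step 2: perform the calculation on each group
group_means <- sapply(groups, mean)
print(group_means)
```

Here `groups` is a list with one numeric vector per group, and `group_means` holds one mean per group.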

Syntax: aggregate(x, by = , FUN = )

Where:

x = the dataframe (or the subset of its columns) to summarise
by = the grouping variable(s)/column(s), supplied as a list
FUN = a built-in or user-defined function that is applied to each column within each group
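As a minimal sketch of how the three arguments fit together, the example below passes a user-defined function as FUN. The `sales` data frame and the `spread` helper are assumptions introduced only for illustration and are not part of this recipe.

```r
# Hypothetical toy data: one grouping column and two numeric columns
sales <- data.frame(
  region = c("north", "north", "south", "south"),
  units  = c(5, 9, 2, 8),
  price  = c(10, 14, 20, 26)
)

# A user-defined function: difference between the largest and smallest value
spread <- function(v) max(v) - min(v)

# x   = the numeric columns to summarise
# by  = the grouping column, wrapped in a list
# FUN = the function applied to each column within each group
result <- aggregate(x = sales[, c("units", "price")],
                    by = list(sales$region),
                    FUN = spread)
print(result)
```

The result has one row per region and one column per summarised variable, plus the grouping column.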

In this recipe, we will demonstrate how to aggregate multiple columns in a dataframe in R.

Table of Contents
- Step 1: Creating a DataFrame
- Step 2: Application of Aggregate Function

Step 1: Creating a DataFrame

We create a STUDENT dataframe that holds each student's Name and their marks in two subjects across three trimester exams.

STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam", "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)
STUDENT

Name 	Science_Marks	Math_Marks	Trimester
Ram	55		70		1
Ram	60		75		2
Ram	65		73		3
Shyam	80		50		1
Shyam	70		53		2
Shyam	75		55		3
Jessica	45		65		1
Jessica	65		78		2
Jessica	70		75		3

Step 2: Application of Aggregate Function

Query 1: To find the average marks of each student over the year (Trimesters 1, 2 and 3)

We use the aggregate function with "Name" as the grouping variable and FUN = mean.

aggregate(STUDENT[ , 2:3], by = list(STUDENT$Name), FUN = mean)

Group.1		Science_Marks	Math_Marks
Jessica		60		72.66667
Ram		60		72.66667
Shyam		75		52.66667
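The same per-student averages can also be obtained with aggregate's formula interface, which some readers find easier to read. This is an alternative sketch rather than the form used in this recipe; the variable name `by_student` is introduced here only for illustration.

```r
# Recreate the STUDENT dataframe from Step 1 so this block runs on its own
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam", "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# cbind() on the left-hand side lists the columns to summarise;
# the right-hand side names the grouping variable
by_student <- aggregate(cbind(Science_Marks, Math_Marks) ~ Name,
                        data = STUDENT, FUN = mean)
print(by_student)
```

With the formula interface the grouping column keeps its own name (Name) in the result.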

Query 2: To find the average marks in each subject for each trimester

We use the aggregate function with "Trimester" as the grouping variable and FUN = mean.

aggregate(STUDENT[ , 2:3], by = list(STUDENT$Trimester), FUN = mean)

Group.1		Science_Marks	Math_Marks
1		60		61.66667
2		65		68.66667
3		70		67.66667
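When the list passed to by is unnamed, R labels the grouping column Group.1. Naming the list element gives the column a more descriptive header. The sketch below repeats Query 2 with a named list; the variable name `by_trimester` is an assumption used only for illustration.

```r
# Recreate the STUDENT dataframe from Step 1 so this block runs on its own
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam", "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Naming the list element sets the name of the grouping column in the result
by_trimester <- aggregate(STUDENT[, 2:3],
                          by = list(Trimester = STUDENT$Trimester),
                          FUN = mean)
print(by_trimester)
```

The averages are unchanged; only the header of the grouping column differs.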
