How to aggregate multiple columns in a dataframe in R?

This recipe helps you aggregate multiple columns in a dataframe in R

Recipe Objective

The aggregate() function is used in many of the same situations as tapply(). It computes summary statistics after grouping the raw data in a dataset by a grouping variable.

It is a two-step process. First, it groups the raw data by a categorical variable; then it performs the required calculation on each group formed.
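The two steps can be sketched by hand with base R's split() and sapply(). The small `marks` data frame here is a made-up example, not the recipe's data, and the comparison with aggregate() is only meant to show that both routes arrive at the same group means.

```r
# Hypothetical toy data: two groups, one numeric column
marks <- data.frame(
  Group = c("A", "A", "B", "B"),
  Score = c(10, 20, 30, 50)
)

# Step 1: split the raw values by the grouping variable
groups <- split(marks$Score, marks$Group)

# Step 2: apply the calculation to each group
manual <- sapply(groups, mean)

# aggregate() performs both steps in a single call
auto <- aggregate(marks["Score"], by = list(Group = marks$Group), FUN = mean)

print(manual)
print(auto)
```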

Three things are required to perform an aggregation: the data, a grouping variable, and a function (calculation) to apply.

Syntax: aggregate(x, by = , FUN = )

Where:

  1. x = dataframe (or the subset of its columns) to be aggregated
  2. by = grouping variable(s)/column(s), supplied as a list
  3. FUN = built-in or user-defined function that is applied to each of the selected columns within every group
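As a rough illustration of the three arguments, the sketch below passes a user-defined function as FUN. The `df` data frame and the `spread` helper are hypothetical names introduced only for this example.

```r
# Hypothetical toy data used only to illustrate the three arguments
df <- data.frame(
  Team   = c("X", "X", "Y", "Y"),
  Points = c(4, 10, 7, 9),
  Fouls  = c(1, 3, 2, 2)
)

# A user-defined summary: gap between the largest and smallest value
spread <- function(v) max(v) - min(v)

result <- aggregate(
  x   = df[, c("Points", "Fouls")],  # columns to summarise
  by  = list(Team = df$Team),        # grouping variable, wrapped in a list
  FUN = spread                       # applied to each column within each group
)
print(result)
```

Naming the list element (Team = ...) labels the grouping column in the result; an unnamed list falls back to a default label.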

In this recipe, we will demonstrate how to aggregate multiple columns in a dataframe in R.

Step 1: Creating a DataFrame

Creating a STUDENT dataframe with each student's name and marks in two subjects across three trimester exams.

STUDENT = data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam", "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)
STUDENT

Name 	Science_Marks	Math_Marks	Trimester
Ram	55		70		1
Ram	60		75		2
Ram	65		73		3
Shyam	80		50		1
Shyam	70		53		2
Shyam	75		55		3
Jessica	45		65		1
Jessica	65		78		2
Jessica	70		75		3

Step 2: Application of Aggregate Function

Query 1: To find each student's average marks in each subject over the year (Trimesters 1, 2 and 3)

We will use the aggregate function to carry out this task, with "Name" as the grouping variable and FUN = mean.

aggregate(STUDENT[ , 2:3], by = list(STUDENT$Name), FUN = mean)
Group.1		Science_Marks	Math_Marks
Jessica		60		72.66667
Ram		60		72.66667
Shyam		75		52.66667
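When the list passed to by is unnamed, base R labels the grouping column Group.1. A minimal variation on the call above, assuming a more descriptive header is wanted, names the list element instead. The `by_student` variable is introduced only for this sketch.

```r
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam",
           "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Naming the list element gives the grouping column the header "Name"
by_student <- aggregate(
  STUDENT[, c("Science_Marks", "Math_Marks")],  # select columns by name
  by  = list(Name = STUDENT$Name),
  FUN = mean
)
print(by_student)
```

Selecting the columns by name rather than by position (2:3) keeps the call valid if the column order of the dataframe changes.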

Query 2: To find the average marks in each subject for each trimester.

We will use the aggregate function to carry out this task, with "Trimester" as the grouping variable and FUN = mean.

aggregate(STUDENT[ , 2:3], by = list(STUDENT$Trimester), FUN = mean)
Group.1		Science_Marks	Math_Marks
1		60		61.66667
2		65		68.66667
3		70		67.66667
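The same per-trimester summary can also be written with aggregate()'s formula interface, which some readers find easier to scan. This is an alternative sketch rather than the recipe's own method; `by_trimester` is a name chosen for the example.

```r
STUDENT <- data.frame(
  Name = c("Ram", "Ram", "Ram", "Shyam", "Shyam", "Shyam",
           "Jessica", "Jessica", "Jessica"),
  Science_Marks = c(55, 60, 65, 80, 70, 75, 45, 65, 70),
  Math_Marks = c(70, 75, 73, 50, 53, 55, 65, 78, 75),
  Trimester = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Formula interface: columns to summarise on the left of ~,
# grouping variable on the right
by_trimester <- aggregate(
  cbind(Science_Marks, Math_Marks) ~ Trimester,
  data = STUDENT,
  FUN  = mean
)
print(by_trimester)
```

With the formula form, the grouping column keeps its own name (Trimester) in the result.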
