What Is Tapply In R?

This R code example introduces you to tapply in R and shows how to use it with a data frame as input.

Objective For ‘What is Tapply in R?’

This beginner-friendly R code example shows you how to use tapply() in R with a data frame as input, using a simple example.

What Is Tapply In R?

tapply() is a base R function that applies a function to each group of values in a vector, where the groups are defined by one or more factors. It is a member of the apply() family of functions, which also includes sapply() and lapply().

The syntax for tapply() is as follows-

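tapply(X, INDEX, FUN, ...)

where, 

  • X is the vector of values you want to summarize.

  • INDEX is the factor variable to group by, or a list of factors; each must have the same length as X, and non-factor values are converted with as.factor().

  • FUN is the function to apply to each group of values in X.

  • ... are any additional arguments passed on to FUN, such as na.rm = TRUE.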

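tapply() returns an array with one dimension per grouping factor, labelled by the factor levels; with a single factor this is a one-dimensional array that prints like a named vector. If FUN returns a single atomic value for each group (and simplify = TRUE, the default), the array is atomic, for example numeric. If FUN returns anything else, such as a vector of length two or a list, the result is an array of mode list in which each cell holds FUN's result for that group.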
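To make the two return shapes concrete, here is a minimal sketch with made-up values (x and g are hypothetical names used only for illustration). sum() returns one number per group, so the result is an atomic array; range() returns two numbers per group, so the result is an array of mode list.

```r
# Hypothetical toy data: values to summarize (X) and their groups (INDEX)
x <- c(10, 20, 30, 40, 50)
g <- c("a", "b", "a", "b", "a")

# sum() returns one value per group, giving a numeric 1-d array named by the levels
totals <- tapply(x, g, sum)
print(totals)
#  a  b 
# 90 60 

# range() returns two values per group, giving an array of mode "list"
ranges <- tapply(x, g, range)
print(is.list(ranges))   # TRUE
print(ranges[[1]])       # min and max of group "a": 10 50
```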

How Does the tapply Command in R Work When Values Are Missing?

tapply() does not remove missing values on its own. Every value of X, including NA, is passed to FUN, so a function such as mean() returns NA for any group that contains a missing value. To ignore missing values, set the na.rm argument to TRUE; tapply() forwards it to FUN through its ... argument. You can use the following code to calculate the mean height of people in each gender group, even if there are missing values in the height vector-

tapply(height, gender, mean, na.rm = TRUE)
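A minimal sketch of the difference, using hypothetical height and gender vectors in which one height is missing:

```r
# Hypothetical data: one height is missing (NA)
height <- c(160, 175, NA, 182, 168)
gender <- c("Female", "Male", "Female", "Male", "Female")

# Without na.rm, mean() sees the NA and returns NA for the Female group
no_rm <- tapply(height, gender, mean)
print(no_rm)

# na.rm = TRUE is forwarded through tapply()'s ... argument to mean()
with_rm <- tapply(height, gender, mean, na.rm = TRUE)
print(with_rm)
```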

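Example For tapply in R With Missing Values in INDEX

Missing values in the grouping variable are handled differently from missing values in X. Because as.factor() does not create a level for NA, rows where gender is NA are left out of every group. The following code calculates the mean height of people in each gender group and ignores rows with a missing gender; missing heights would still give NA unless you add na.rm = TRUE-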

tapply(height, gender, mean)
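The sketch below uses hypothetical vectors to show that an NA in the grouping variable simply drops that observation. If you would rather keep those rows as a group of their own, base R's addNA() turns NA into an explicit factor level.

```r
# Hypothetical data: the third person's gender is unknown
height <- c(160, 175, 171, 182, 168)
gender <- c("Female", "Male", NA, "Male", "Female")

# NA is not a factor level, so the 171 observation is left out of every group
res <- tapply(height, gender, mean)
print(res)

# addNA() adds NA as an explicit level, giving the missing gender its own group
res_na <- tapply(height, addNA(gender), mean)
print(res_na)
```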

How To Get Proportions With tapply in R?

To get proportions from a tapply() result, you can use the proportions() function (available in base R since version 4.0.0; prop.table() is the older equivalent). proportions() takes a table or array, which is what tapply() returns, and returns an object with the same dimensions in which each value is expressed as a proportion. With the default margin = NULL, each value is divided by the grand total; for a two-way result, margin = 1 divides by the row sums and margin = 2 by the column sums. To use proportions() with tapply, simply pass the output of tapply() to proportions().
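A minimal sketch, assuming R 4.0.0 or later for proportions() and using hypothetical sales and region vectors: tapply() sums the sales per region, and proportions() turns those totals into shares of the grand total.

```r
# Hypothetical sales amounts and the region each sale came from
sales  <- c(100, 250, 150, 300, 200)
region <- c("North", "South", "North", "South", "East")

# Total per region (a 1-d array named East, North, South)
totals <- tapply(sales, region, sum)

# Each regional total as a share of the grand total (margin = NULL)
shares <- proportions(totals)
print(round(shares, 3))
```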

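How To Use tapply to Sample With Replacement in R?

tapply() can sample with replacement within groups if you pass the sample() function as FUN. sample() takes a vector and the number of items to draw (its size argument); when its replace argument is set to TRUE, it samples with replacement. Keep in mind that INDEX must have the same length as X, so a single constant such as 1 cannot serve as the grouping variable.

For example, you can use the code below to sample 100 values from the height vector with replacement, either directly with sample() or through tapply() with a constant grouping vector-

# Sample 100 values from the `height` vector with replacement

sample(height, 100, replace = TRUE)

# The same with tapply(): a constant INDEX puts every value into one group

tapply(height, rep(1, length(height)), sample, size = 100, replace = TRUE)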
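To sample within groups rather than from the whole vector, a minimal sketch with hypothetical data could look like this. It uses an anonymous function that indexes with sample.int() instead of calling sample() on the values directly, because sample() treats a single positive number x as 1:x, which would give wrong draws for a group that contains only one value.

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical heights and their groups
height <- c(160, 175, 171, 182, 168, 190)
gender <- c("Female", "Male", "Female", "Male", "Female", "Male")

# Draw 4 values with replacement inside each gender group.
# Indexing with sample.int() always samples positions, never 1:x.
draws <- tapply(height, gender,
                function(v) v[sample.int(length(v), 4, replace = TRUE)])

# A list-mode array: one numeric vector of 4 draws per group
print(draws)
```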


Steps Showing How To Use tapply In R

The following steps will show you how to use the tapply() function in R by taking a data frame as input in an easy-to-understand example.

Step 1 - Import Libraries And Load Dataset

For our code example, we will work with a dataset about the customers of a shopping mall. The variables that we are interested in are Annual Income (in 1000s) and Gender.

# Data manipulation package

library(tidyverse)

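# reading the dataset; stringsAsFactors = TRUE makes Gender a factor, as in the output below (the default is FALSE since R 4.0.0)

customer_seg = read.csv('R_75_Mall_Customers.csv', stringsAsFactors = TRUE)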

glimpse(customer_seg)

The output of the above code- 

Rows: 200

Columns: 5

$ CustomerID             <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...

$ Gender                 <fct> Male, Male, Female, Female, Female, Female, ...

$ Age                    <int> 19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...

$ Annual.Income..k..     <int> 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...

$ Spending.Score..1.100. <int> 39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

Step 2 - Using tapply Function in R

In this step, we call tapply() with X set to the Annual.Income..k.. column, INDEX set to the Gender column, and FUN set to mean, so that we get the mean annual income for each gender-

# applying the mean function to annual income for each level of the Gender factor

result = tapply(customer_seg$Annual.Income..k.., customer_seg$Gender, FUN = mean)

# Mean annual income 

result

The output of the above code is-

  Female     Male 
59.25000 62.22727 
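If you do not have the CSV file at hand, you can try the same call on a small stand-in data frame. The values below are made up and only the column names match the Mall Customers data; toy_seg is a hypothetical name, and with() is used so the columns can be named without repeating the data frame.

```r
# Hypothetical stand-in for customer_seg: same column names, made-up values
toy_seg <- data.frame(
  Gender = factor(c("Male", "Male", "Female", "Female", "Female")),
  Annual.Income..k.. = c(15, 18, 16, 16, 17)
)

# Same call as in step 2, written with with() so the columns can be named directly
toy_result <- with(toy_seg, tapply(Annual.Income..k.., Gender, mean))

# Round for reporting and pick out one group by its level name
print(round(toy_result, 2))
print(toy_result["Male"])
```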

How To Fix Tapply in R Error: 'First Argument Must Be A Vector'?

tapply() expects its first argument, X, to be the vector of values to summarize; the grouping variable goes in the second argument, INDEX. An error such as "first argument must be a vector" therefore usually means that X is not a plain vector, most often because a whole data frame or a one-column data frame subset (for example customer_seg["Annual.Income..k.."]) was passed instead of a single column. Depending on the R version, the same mistake may instead be reported as "arguments must have same length".

To fix this error, pass X as a vector by extracting the column with $ or [[ ]], and make sure that INDEX has the same length as X. Missing values are not the cause: NAs in X are passed on to FUN, and rows whose INDEX value is NA are dropped. You can check what you are passing with is.data.frame() or str().

Below is an example of how you can fix the error in tapply-

# Problem: single brackets return a one-column data frame, not a vector, so tapply() stops with an error

tapply(customer_seg["Annual.Income..k.."], customer_seg$Gender, mean)

# Fix: extract the column as a vector with $ (or customer_seg[["Annual.Income..k.."]])

tapply(customer_seg$Annual.Income..k.., customer_seg$Gender, mean)
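One way to catch this mistake before tapply() runs is a small wrapper that refuses data frames with a readable message. safe_tapply() below is a hypothetical helper, not part of base R, and the toy data frame is made up for illustration.

```r
# Hypothetical wrapper (not part of base R): reject data frames before calling tapply()
safe_tapply <- function(X, INDEX, FUN, ...) {
  if (is.data.frame(X)) {
    stop("X is a data frame; pass a single column such as df$col or df[[\"col\"]]")
  }
  tapply(X, INDEX, FUN, ...)
}

# Hypothetical toy data
df <- data.frame(height = c(160, 175, 168),
                 gender = c("Female", "Male", "Female"))

# Single brackets keep a one-column data frame, so the wrapper stops with a clear message
msg <- tryCatch(safe_tapply(df["height"], df$gender, mean),
                error = conditionMessage)
print(msg)

# $ extracts a plain numeric vector, so this works
print(safe_tapply(df$height, df$gender, mean))
```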

Learn Real-World Uses Of Tapply in R With ProjectPro

This step-by-step R code example has explored the versatile tapply() function in R, covering its syntax, its use with a data frame as input, and how it handles missing values and proportions. Furthermore, if you want to expand your data science skillset and expertise, we recommend you explore the ProjectPro platform. By engaging with over 270 end-to-end solved projects in the ProjectPro repository, you can gain the skills and expertise needed to excel in data science and machine learning.

FAQs on Tapply in R

1. Under what conditions does tapply in R give an integer result?

Tapply in R gives an integer result when-

  • The function specified in the FUN argument returns a single integer value for every group, for example length(), or sum() applied to an integer vector.

  • The simplify argument is left at its default of TRUE, so the per-group values are combined into an atomic array.

The type of the grouping factor and the na.rm argument do not affect the result type. Functions such as mean() always return a double, even for integer input.
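A quick way to check the result type is typeof(). In this sketch with hypothetical integer data, length() and sum() keep the result integer, while mean() produces a double.

```r
# Hypothetical integer values and their groups
x <- c(3L, 5L, 2L, 8L)
g <- c("a", "b", "a", "b")

counts <- tapply(x, g, length)  # length() always returns an integer
sums   <- tapply(x, g, sum)     # sum() of integers stays integer
means  <- tapply(x, g, mean)    # mean() always returns a double

print(typeof(counts))  # "integer"
print(typeof(sums))    # "integer"
print(typeof(means))   # "double"
```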

2. When using the tapply command in R, should the lengths be the same?

Yes. Every grouping factor passed in INDEX must have the same length as X; otherwise tapply() stops with the error "arguments must have same length". The output, however, does not have the length of X: it has one element per factor level (or one cell per combination of levels when INDEX is a list of factors), holding the result of FUN for that group.
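The sketch below, with hypothetical vectors, shows the length check in action; tryCatch() captures the error message instead of stopping the script.

```r
x <- c(1, 2, 3, 4)
g_short <- c("a", "b", "a")   # one element shorter than x

# tapply() requires every grouping factor to have length(x) elements
msg <- tryCatch(tapply(x, g_short, sum), error = function(e) conditionMessage(e))
print(msg)   # "arguments must have same length"

# With matching lengths, the result has one element per level, not length(x)
ok <- tapply(x, c("a", "b", "a", "b"), sum)
print(ok)
```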

3. Can tapply include more than one category in R?

Yes, tapply in R can group by more than one categorical variable. Pass a list of factor variables to the INDEX argument, and tapply will group the data by every combination of their levels and apply the function specified in the FUN argument to each group. The result is a matrix (or a higher-dimensional array) with one dimension per factor, and combinations with no observations are filled with NA.
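A minimal sketch with hypothetical sales, region, and quarter vectors: passing list(region, quarter) as INDEX produces a matrix with regions as rows and quarters as columns, and the East/Q2 cell is NA because no sale falls into that combination.

```r
# Hypothetical sales with two grouping variables
sales   <- c(100, 250, 150, 300, 200, 50)
region  <- c("North", "South", "North", "South", "North", "East")
quarter <- c("Q1", "Q1", "Q2", "Q2", "Q1", "Q1")

# One dimension per factor: rows are regions, columns are quarters
tab <- tapply(sales, list(region, quarter), sum)
print(tab)
```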
