What Is Tapply In R?

This R code example introduces you to tapply in R and shows how to use it with a data frame as input.

Objective For ‘What is Tapply in R?’

This beginner-friendly R code will show you how to use tapply() in R, taking a data frame as input with the help of a simple example.

What Is Tapply In R?

tapply() is a function in R that applies a function to each group of values in a vector, where the groups are defined by one or more factor variables. It is a member of the apply family of functions, which also includes apply(), sapply(), and lapply().

The syntax for tapply() is as follows-

tapply(X, INDEX, FUN)

where, 

  • X is the vector to which you want to apply the function.

  • INDEX is the factor variable (or a list of factor variables) to group by; it must have the same length as X.

  • FUN is the function to apply to each group of values in X.

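The value returned by tapply depends on what FUN returns. If FUN returns a single atomic value for each group (and the simplify argument is left at its default of TRUE), tapply returns an array of those values with one dimension per grouping factor; with a single factor this is a named one-dimensional array that prints like a named vector. If FUN returns more than one value for any group, tapply returns an array of mode list, with one element per group.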
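As a quick check of what tapply returns, the sketch below uses made-up `height` and `gender` values (illustrative data, not from the dataset used later) and compares a FUN that returns one value per group with one that returns two.

```r
# Illustrative data (made-up values)
height <- c(170, 165, 180, 158, 175, 162)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))

# mean() returns one number per group: the result is a named 1-d numeric array
group_means <- tapply(height, gender, mean)

# range() returns two numbers per group: the result is an array of mode list
group_ranges <- tapply(height, gender, range)

print(group_means)
print(is.array(group_means))
print(is.list(group_ranges))
```

Inspecting the result with str() is a simple way to see which of the two shapes a given FUN produces.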

How Does tapply Command in R Work When Values Are Missing?

The tapply command in R does not remove missing values from X on its own. Each group is passed to FUN as it is, so a function such as mean() returns NA for any group that contains a missing value. Extra arguments given to tapply are forwarded to FUN, so setting na.rm = TRUE makes mean() ignore the missing values. You can use the following code to calculate the mean height of people in each gender group, even if there are missing values in the height vector-

tapply(height, gender, mean, na.rm = TRUE)
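The effect of na.rm can be seen on a small example. The `height` and `gender` values below are made up for illustration; one height in the Female group is missing.

```r
# Illustrative data: the second height is missing
height <- c(170, NA, 180, 158, 175, 162)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))

# Without na.rm, the Female group mean is NA
with_na <- tapply(height, gender, mean)

# na.rm = TRUE is forwarded to mean(), which then skips the missing value
without_na <- tapply(height, gender, mean, na.rm = TRUE)

print(with_na)
print(without_na)
```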

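Example For tapply in R With Missing Values in INDEX

The following code will calculate the mean height of people in each gender group. Observations whose gender is missing (NA) are dropped, because NA is not a factor level by default. Missing values in height are not removed, so a group that contains one gets a mean of NA-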

tapply(height, gender, mean)
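A minimal sketch of how a missing grouping value is handled, using made-up data. The addNA() call is one base R way to keep NA as its own group; it is shown here as an option, not as something the original example does.

```r
# Illustrative data: the gender of the second person is unknown
height <- c(170, 165, 180, 158)
gender <- c("Male", NA, "Male", "Female")

# The observation with NA gender is left out of every group
dropped <- tapply(height, gender, mean)

# addNA() turns NA into an explicit factor level, so it forms a third group
kept <- tapply(height, addNA(factor(gender)), mean)

print(dropped)
print(kept)
```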

How To Get Proportions in Tapply R?

To get proportions with tapply in R, you can use the proportions() function (added in the R 4.0 series; prop.table() is the older equivalent). The proportions() function takes a table or array as input and returns one with the same dimensions, with each value expressed as a proportion of a marginal sum. By default, the values are divided by the grand total; with margin = 1 or margin = 2 they are divided by the row or column sums instead. To use proportions() with tapply, simply pass the output of tapply to it, for example proportions(tapply(X, INDEX, sum)) to get each group's share of the total.
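A short sketch of the idea with made-up `income` and `gender` values: tapply computes a total per group, and proportions() converts those totals into shares of the grand total.

```r
# Illustrative data (made-up values)
income <- c(15, 16, 17, 18, 19, 20)
gender <- factor(c("Male", "Female", "Female", "Male", "Female", "Male"))

# Total income per gender group
group_totals <- tapply(income, gender, sum)

# Each group's share of the overall total; the shares add up to 1
group_share <- proportions(group_totals)

print(group_totals)
print(group_share)
```

On R versions older than the 4.0 series, prop.table(group_totals) should give the same result.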

How To Use tapply Sample With Replacement in R?

tapply can be used to sample with replacement within each group by passing the sample() function as the FUN argument. The sample() function takes a vector and the number of values to draw (the size argument). When the replace argument is set to TRUE, the sample() function will sample with replacement.

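For example, you can use the code below to sample 100 values with replacement from the heights in each gender group. INDEX must have the same length as X, so a single value such as 1 cannot be used as the grouping variable-

# Sample 100 values with replacement from the `height` values of each gender group

tapply(height, gender, sample, size = 100, replace = TRUE)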
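A runnable per-group version of this idea, using made-up `height` and `gender` values and a fixed seed so the draw is repeatable. Because sample() returns 100 values per group, the result is a list with one element per gender.

```r
set.seed(42)  # fixed seed so the example is repeatable

# Illustrative data (made-up values)
height <- c(170, 165, 180, 158, 175, 162)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))

# size and replace are forwarded to sample() for each group
resampled <- tapply(height, gender, sample, size = 100, replace = TRUE)

# Number of values drawn per group
print(lengths(resampled))
```

One caveat: when a group holds a single number n, sample() treats it as the range 1:n rather than as one value, so groups of size one need separate handling.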

Steps Showing How To Use tapply In R

The following steps will show you how to use the tapply() function in R by taking a data frame as input in an easy-to-understand example.

Step 1 - Import Libraries And Load Dataset

For our code example, we will work with a dataset about customers visiting a supermarket mall. The variables that we are interested in are Annual Income (in 1000s) and Gender.

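# Data manipulation package (provides glimpse())

library(tidyverse)

# Read the dataset; stringsAsFactors = TRUE keeps Gender as a factor, matching the output below

customer_seg <- read.csv('R_75_Mall_Customers.csv', stringsAsFactors = TRUE)

# Inspect the columns and their types

glimpse(customer_seg)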

The output of the above code- 

Rows: 200

Columns: 5

$ CustomerID             <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...

$ Gender                 <fct> Male, Male, Female, Female, Female, Female, ...

$ Age                    <int> 19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...

$ Annual.Income..k..     <int> 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...

$ Spending.Score..1.100. <int> 39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

Step 2 - Using tapply Function in R

In this step, we will use the tapply function with the following syntax-

tapply(X, INDEX, FUN)

where, 

  • X is the vector to which you want to apply the function.

  • INDEX is the factor variable to group by.

  • FUN is the function to apply to each group of values in X.

# Apply the mean function to annual income within each Gender group

result <- tapply(customer_seg$Annual.Income..k.., customer_seg$Gender, FUN = mean)

# Mean annual income (in 1000s) per gender

result

The output of the above code is-

Female      59.25

Male          62.2272727272727
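If the CSV file is not at hand, the same call can be tried on a small stand-in data frame. The `customers` object and its five rows below are hypothetical; only the column names follow the dataset shown above.

```r
# Hypothetical stand-in for the mall customers dataset (made-up rows)
customers <- data.frame(
  Gender = factor(c("Male", "Male", "Female", "Female", "Female")),
  Annual.Income..k.. = c(15, 15, 16, 16, 17)
)

# Mean annual income per gender, selecting each column with $
mean_income <- tapply(customers$Annual.Income..k.., customers$Gender, mean)

# The same computation with with(), which avoids repeating the data frame name
mean_income_with <- with(customers, tapply(Annual.Income..k.., Gender, mean))

print(mean_income)
```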

How To Fix Tapply in R Error: 'First Argument Must Be A Vector'?

tapply() in R splits its first argument, X (the data to be summarised), into groups, and this requires X to be a vector. The error message "first argument must be a vector" means that X, not the grouping variable, is something other than a vector. One way this can happen is when X is NULL, which is what a misspelled column name such as df$Height returns.

To fix this error message, check what you are actually passing as the first argument. The str() function shows what X is, and is.null() and length() confirm that it exists and matches the grouping variable. Missing values in X do not cause this error; they only affect the result of FUN.

Below is an example of how you can diagnose the error message "first argument must be a vector" in tapply-

# This call fails with "first argument must be a vector" if height is not a vector

tapply(height, gender, mean)

# Inspect the first argument; NULL usually points to a misspelled variable or column name

str(height)

# Proceed only once the first argument exists and matches the grouping variable in length

stopifnot(!is.null(height), length(height) == length(gender))

tapply(height, gender, mean)
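A minimal sketch of one way this error can arise, assuming a hypothetical `people` data frame whose column names are lower-case. Referring to the columns with capitalised names yields NULL for both arguments, so tapply has no vector to split.

```r
# Hypothetical data frame; note the lower-case column names
people <- data.frame(
  height = c(170, 165, 180, 158),
  gender = c("Male", "Female", "Male", "Female")
)

# Misspelled (capitalised) names: people$Height and people$Gender are both NULL
err <- tryCatch(
  tapply(people$Height, people$Gender, mean),
  error = function(e) e
)
print(conditionMessage(err))

# With the correct column names the group means are computed as expected
fixed <- tapply(people$height, people$gender, mean)
print(fixed)
```

Printing str(people) first is usually enough to spot the naming mismatch.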

Learn Real-World Uses Of Tapply in R With ProjectPro

This step-by-step R code example has explored the versatile 'tapply' function in R, understanding its usage and how it can be applied to various scenarios, including handling missing values and obtaining proportions. Furthermore, if you want to expand your data science skillset and expertise, we recommend you explore the ProjectPro platform. By engaging with over 270 end-to-end solved projects in the ProjectPro repository, you can gain the skills and expertise needed to excel in data science and machine learning.

FAQs on Tapply in R

1. Under what conditions does tapply in R give an integer result?

tapply in R gives an integer result when both of the following hold-

  • The function specified in the FUN argument returns a single integer value for every group, for example length(), or sum() applied to an integer vector.

  • The simplify argument is left at its default value of TRUE, so the per-group results are combined into one array.

The type of the grouping variable and the na.rm argument do not affect the type of the result.
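The type of a tapply result can be checked directly with typeof(). The sketch below uses made-up integer `income` values and compares three choices of FUN.

```r
# Illustrative data: integer incomes (note the L suffix)
income <- c(15L, 16L, 17L, 18L)
gender <- factor(c("Male", "Female", "Female", "Male"))

counts <- tapply(income, gender, length)  # length() always returns an integer
totals <- tapply(income, gender, sum)     # sum() of integers is an integer
means  <- tapply(income, gender, mean)    # mean() returns a double

print(typeof(counts))
print(typeof(totals))
print(typeof(means))
```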

2. When using tapply command in R, should the lengths be the same?

Yes. X and every grouping variable in INDEX must have the same length, because each element of INDEX assigns the corresponding element of X to a group. If the lengths differ, tapply stops with the error "arguments must have same length". The length of the output, in contrast, equals the number of levels in the grouping factor, not the length of X.
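The behaviour with mismatched lengths can be checked with a small example. The values are made up, and tryCatch() is used only to capture the error message instead of stopping the script.

```r
# Illustrative data: three heights but only two gender labels
height <- c(170, 165, 180)
gender <- c("Male", "Female")

# Capture the error raised by tapply instead of halting
msg <- tryCatch(
  tapply(height, gender, mean),
  error = function(e) conditionMessage(e)
)
print(msg)
```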

3. Can tapply include more than one category in R?

Yes, tapply in R can include more than one category. You can pass a list of factor variables to the INDEX argument. Tapply will then group the data by all the factor variables in the list and apply the function specified in the FUN argument to each group.
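A brief sketch of grouping by two variables at once, using made-up `income`, `gender`, and `age_group` values. With two grouping variables the result is a matrix with one row per gender and one column per age group.

```r
# Illustrative data (made-up values)
income    <- c(15, 16, 17, 18, 19, 20)
gender    <- c("Male", "Female", "Female", "Male", "Female", "Male")
age_group <- c("Young", "Young", "Old", "Old", "Young", "Old")

# Pass the grouping variables as a list to INDEX
by_both <- tapply(income, list(gender, age_group), mean)

print(by_both)
```

Combinations that never occur in the data are filled with NA in the resulting matrix.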
