What Is Tapply In R?

This R code example introduces you to tapply in R and shows how to use it with a data frame as input.
Last Updated: 06 Nov 2023


Objective For ‘What is Tapply in R?’

This beginner-friendly R code will show you how to use tapply() in R, taking a data frame as input with the help of a simple example.

What Is Tapply In R?

Tapply is a function in R that applies a function to each group of values in a vector, grouped by a factor variable. It is a member of the apply() family of functions, which also includes sapply() and lapply().

The syntax for tapply() is as follows-

tapply(X, INDEX, FUN)

where,

X is the vector to which you want to apply the function.
INDEX is the factor variable to group by.
FUN is the function to apply to each group of values in X.

The tapply function returns an array whose contents depend on what FUN returns. If FUN returns a single atomic value for each group and simplify is TRUE (the default), tapply returns an array of those values with one dimension per grouping factor; with a single factor, this behaves like a named vector. Otherwise, tapply returns an array of mode list in which each element holds the result for one group.
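
As a minimal sketch of the two return shapes, the example below uses small made-up height and gender vectors (they are illustrative assumptions, not data from this article). It first applies mean, which returns one value per group, and then range, which returns two values per group.

```r
# Hypothetical toy data: heights in cm and a grouping factor
height <- c(160, 172, 181, 158, 175, 169)
gender <- factor(c("F", "M", "M", "F", "M", "F"))

# mean() returns one value per group, so the result is a named 1-d array
means <- tapply(height, gender, mean)
print(means)

# range() returns two values per group, so the result is a list-mode array
ranges <- tapply(height, gender, range)
print(ranges[["F"]])
```

Elements of either result can be pulled out by group name, for example means[["M"]] or ranges[["F"]].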

How Does tapply Command in R Work When Values Are Missing?

The tapply command in R does not remove missing values by default. It passes each group of values to FUN unchanged, so a function such as mean returns NA for any group that contains a missing value. To ignore missing values, pass na.rm = TRUE; tapply forwards this extra argument to FUN. You can use the following code to calculate the mean height of people in each gender group, even if there are missing values in the height vector-

tapply(height, gender, mean, na.rm = TRUE)
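
The effect of na.rm can be checked on a small made-up example. The vectors below are assumptions for illustration only; one height in the "M" group is missing.

```r
# Hypothetical toy data with one missing height in group "M"
height <- c(160, NA, 181, 158, 175, 169)
gender <- factor(c("F", "M", "M", "F", "M", "F"))

# Without na.rm, mean() returns NA for any group containing an NA
with_na <- tapply(height, gender, mean)
print(with_na)

# na.rm = TRUE is forwarded to mean(), which then skips the NA
without_na <- tapply(height, gender, mean, na.rm = TRUE)
print(without_na)
```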

Example For tapply in R With Missing Values in INDEX

The following code calculates the mean height of people in each gender group. Observations whose gender value is missing (NA) are dropped, because NA is not treated as a group level by default-

tapply(height, gender, mean)
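
The sketch below illustrates this on made-up vectors (an assumption for illustration). It also shows one possible way to keep the NA group, by building the factor with exclude = NULL so that NA becomes a level of its own.

```r
# Hypothetical toy data: the second person has no recorded gender
height <- c(160, 172, 181, 158)
gender <- c("F", NA, "M", "F")

# Default behaviour: the observation with a missing gender is dropped
dropped <- tapply(height, gender, mean)
print(dropped)

# Keeping NA as its own group by making it an explicit factor level
gender_with_na <- factor(gender, exclude = NULL)
kept <- tapply(height, gender_with_na, mean)
print(kept)
```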

How To Get Proportions in Tapply R?

To get proportions in tapply R, you can use the proportions() function. The proportions() function takes a table as input and returns a table with the exact dimensions but with the values expressed as proportions of the marginal sums. The marginal sums are the sums of the values in each row or column of the table. To use the proportions() function with tapply, you can simply pass the output of tapply to the proportions() function. The proportions() function can also be used to calculate proportions of marginal sums greater than a certain value.

How To Use tapply Sample With Replacement in R?

Tapply can be used to sample with replacement within each group by passing the sample() function as the FUN argument. The sample() function takes a vector and the number of values to draw (the size argument). When the replace argument is set to TRUE, the sample() function will sample with replacement.

For example, you can use the code below to sample 100 values with replacement from the heights in each gender group-

# Sample 100 values with replacement from `height` within each gender group

tapply(height, gender, sample, size = 100, replace = TRUE)
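
The following sketch runs the same idea on made-up vectors with a smaller sample size, so the output stays readable. The data and the seed are assumptions for illustration. Because each group yields several values, the result is a list-mode array with one sampled vector per group.

```r
# Fix the random seed so the draws are reproducible
set.seed(42)

# Hypothetical toy data
height <- c(160, 172, 181, 158, 175, 169)
gender <- factor(c("F", "M", "M", "F", "M", "F"))

# Draw 5 heights with replacement inside each gender group;
# size and replace are forwarded to sample()
draws <- tapply(height, gender, sample, size = 5, replace = TRUE)
print(draws)
```

One caveat: when a group contains a single number x, sample() treats it as the range 1:x, so groups of length one may need special handling.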


Steps Showing How To Use tapply In R

The following steps will show you how to use the tapply() function in R by taking a data frame as input in an easy-to-understand example.

Step 1 - Import Libraries And Load Dataset

For our code example, we will work with a dataset about customers visiting a supermarket mall. The variables that we are interested in are Annual Income (in 1000s) and Gender.

# Data manipulation package

library(tidyverse)

# reading a dataset

customer_seg = read.csv('R_75_Mall_Customers.csv', stringsAsFactors = TRUE)

glimpse(customer_seg)

The output of the above code-

Rows: 200

Columns: 5

$ CustomerID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...

$ Gender <fct> Male, Male, Female, Female, Female, Female, ...

$ Age <int> 19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...

$ Annual.Income..k.. <int> 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...

$ Spending.Score..1.100. <int> 39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

Step 2 - Using tapply Function in R

In this step, we will use the tapply function with the following syntax-

tapply(X, INDEX, FUN)

where,

X is the vector to which you want to apply the function.
INDEX is the factor variable to group by.
FUN is the function to apply to each group of values in X.

# applying mean function on "Annual income" with respect to Gender factor

result = tapply(customer_seg$Annual.Income..k.., customer_seg$Gender, FUN = mean)

# Mean annual income

result

The output of the above code is-

Female 59.25

Male 62.2272727272727
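
If the CSV file is not at hand, the same call can be tried on a small in-memory data frame. The sketch below builds a hypothetical five-row data frame that reuses the column names from the dataset; the values are made up for illustration and are not the real data.

```r
# Hypothetical stand-in for the mall customers data (made-up rows)
customers <- data.frame(
  Gender = factor(c("Male", "Male", "Female", "Female", "Female")),
  Annual.Income..k.. = c(15, 15, 16, 16, 17)
)

# Mean annual income for each gender
result_small <- tapply(customers$Annual.Income..k.., customers$Gender, FUN = mean)
print(result_small)
```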

How To Fix Tapply in R Error: 'First Argument Must Be A Vector'?

Tapply() in R requires the first argument, X, to be a vector holding the values that will be split into groups and passed to FUN. The error message "first argument must be a vector" in tapply() means that X, the data to be summarized, is not a vector. It does not refer to the grouping variable, and it is not caused by missing values.

To fix this error message, you must ensure that the first argument to tapply is a vector, for example a single column extracted from a data frame with $ or [[ ]]. You can use the is.vector() function to check whether the first argument is a vector before calling tapply.

Below is an example of how you can fix the error message "first argument must be a vector" in tapply-

# The following code produces the error message "first argument must be a vector" when height is not a vector

tapply(height, gender, mean)

# The following code converts height to a plain numeric vector first (this assumes height holds numeric data)

x <- height

if (!is.vector(x)) {

x <- as.numeric(unlist(x))

}

tapply(x, gender, mean)
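
A common way to end up with a non-vector first argument is to select a data frame column with single brackets, which returns a one-column data frame rather than a vector. The sketch below, on a made-up data frame, shows the is.vector() check and the safer [[ ]] extraction. The exact error text for a data frame input may differ between R versions, so the failing call is deliberately not run here.

```r
# Hypothetical toy data frame
df <- data.frame(
  height = c(160, 172, 181, 158),
  gender = c("F", "M", "M", "F")
)

# Single brackets keep the data frame structure: not a vector
x_bad <- df["height"]
print(is.vector(x_bad))

# Double brackets (or $) extract the column as a plain vector
x_good <- df[["height"]]
print(is.vector(x_good))

# With a proper vector as X, tapply works as expected
res <- tapply(x_good, df$gender, mean)
print(res)
```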

Learn Real-World Uses Of Tapply in R With ProjectPro

This step-by-step R code example has explored the versatile 'tapply' function in R, covering its usage and how it applies to various scenarios, including handling missing values and obtaining proportions. If you want to expand your data science skillset further, we recommend you explore the ProjectPro platform. By engaging with over 270 end-to-end solved projects in the ProjectPro repository, you can gain the skills and expertise needed to excel in data science and machine learning.

FAQs on Tapply in R

1. Under what conditions does tapply in R give an integer result?

Tapply in R gives an integer result when the function specified in the FUN argument returns a single integer value for every group. For example-

length() always returns an integer count.
sum(), min(), and max() return an integer when X is an integer vector.
mean() returns a double, so its result is not of integer type even for integer input.

The levels of the grouping factor and the na.rm argument do not affect the type of the result.
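
The result type can be inspected with typeof(). The sketch below uses a made-up integer vector (note the L suffix, which marks integer literals in R) and compares three choices of FUN.

```r
# Hypothetical toy data: integer scores and a grouping factor
score <- c(3L, 7L, 5L, 9L)
group <- factor(c("a", "b", "a", "b"))

# length() returns integers, so the result is an integer array
counts <- tapply(score, group, length)
print(typeof(counts))

# sum() of integers is still an integer
sums <- tapply(score, group, sum)
print(typeof(sums))

# mean() always returns a double
means <- tapply(score, group, mean)
print(typeof(means))
```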

2. When using tapply command in R, should the lengths be the same?

Yes. X and every grouping variable passed to INDEX must have the same length, because each element of INDEX identifies the group of the corresponding element of X. If the lengths differ, tapply stops with the error "arguments must have same length". The output, by contrast, has one element per group: one for each level of the factor, or one for each combination of levels when several factors are used.
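
The length requirement can be demonstrated safely by wrapping the call in tryCatch(), so the error is captured as a message instead of stopping the script. The vectors below are made up for illustration.

```r
# Hypothetical toy data: x has 4 values but g has only 3 labels
x <- c(1, 2, 3, 4)
g <- c("a", "b", "a")

# Capture the error message rather than halting
msg <- tryCatch(
  tapply(x, g, sum),
  error = function(e) conditionMessage(e)
)
print(msg)

# With matching lengths the call succeeds
g_ok <- c("a", "b", "a", "b")
ok <- tapply(x, g_ok, sum)
print(ok)
```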

3. Can tapply include more than one category in R?

Yes, tapply in R can include more than one category. You can pass a list of factor variables to the INDEX argument. Tapply will then group the data by all the factor variables in the list and apply the function specified in the FUN argument to each group.
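
As a minimal sketch of grouping by two factors, the example below uses made-up income, gender, and region vectors (region is a hypothetical second category, not a column from this article's dataset). The result is a two-dimensional array with one row per gender and one column per region.

```r
# Hypothetical toy data with two grouping factors
income <- c(20, 30, 50, 40, 60, 10)
gender <- factor(c("F", "M", "F", "M", "F", "M"))
region <- factor(c("N", "N", "S", "S", "N", "S"))

# Pass both factors as a list to INDEX
tab <- tapply(income, list(gender, region), mean)
print(tab)

# Individual cells can be read by level names
print(tab["F", "N"])
```

Any combination of levels that has no observations is filled with NA by default.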
