What is tapply in R?

Recipe Objective

Problem: Iterating through a long list or vector with an explicit for loop can be slow and needs a lot of boilerplate code.

The apply family of functions in R addresses this problem. Each function in the family takes another function, built-in or user-defined, and applies it across a collection of objects such as a list, vector, or data frame.

The main functions in the apply family are:

  1. apply()
  2. lapply()
  3. sapply()
  4. tapply()
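As a quick orientation, the sketch below applies the first three functions to small made-up inputs and repeats one of them with a for loop for comparison. The objects m, sq_list, sq_vec and sq_loop are illustrative only and are not part of the recipe's dataset.

```r
# Illustrative toy inputs; not taken from the recipe's dataset
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column

# apply(): run a function over the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix
row_sums <- apply(m, 1, sum)   # 9 12
col_sums <- apply(m, 2, sum)   # 3 7 11

# lapply(): apply a function to every element of a list; the result is always a list
sq_list <- lapply(list(a = 1, b = 2, c = 3), function(x) x^2)

# sapply(): like lapply(), but simplifies the result to a vector where possible
sq_vec <- sapply(c(a = 1, b = 2, c = 3), function(x) x^2)
print(sq_vec)

# The same squares computed with an explicit for loop, for comparison
sq_loop <- numeric(3)
for (i in 1:3) sq_loop[i] <- i^2
```

The sapply() call expresses the loop in a single line and keeps the element names, while the for loop has to preallocate and index its output by hand.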

tapply() applies a function to subsets of a vector, where the subsets are defined by another vector of the same length, usually a factor.
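To make the grouping concrete, here is a minimal sketch on a made-up vector (scores and team are hypothetical names). The second half rebuilds the same result with split() followed by sapply(), which spells out in two steps what tapply() does in one.

```r
# Hypothetical values and the group each value belongs to
scores <- c(10, 20, 30, 60, 50)
team   <- factor(c("a", "b", "a", "b", "a"))

# tapply(): mean of scores within each level of team
by_team <- tapply(scores, team, mean)   # a = (10 + 30 + 50) / 3 = 30, b = (20 + 60) / 2 = 40

# The same computation in two explicit steps:
# split() breaks scores into a list with one vector per level of team,
# and sapply() then applies mean() to each piece
pieces <- split(scores, team)
by_team_manual <- sapply(pieces, mean)
```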

This recipe demonstrates how to use tapply() on a column of a data frame.

Step 1: Importing libraries and loading the dataset

Dataset description: basic data about customers visiting a supermarket mall. The variables we are interested in are Annual.Income..k.. (annual income in thousands) and Gender.

# Data manipulation package (provides glimpse())
library(tidyverse)

# Reading the dataset
customer_seg <- read.csv('R_75_Mall_Customers.csv')

# Quick overview of the columns
glimpse(customer_seg)
Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

Step 2: Using tapply()

tapply() uses the following syntax:

tapply(X, INDEX, FUN)

where:

  1. X = the vector to be split into groups, typically a single column of a data frame;
  2. INDEX = a factor (or a list of factors) of the same length as X that defines the groups;
  3. FUN = the function applied to each group (subset) of X.
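Two details of the syntax are easy to miss: extra arguments written after FUN are passed on to FUN, and INDEX may be a list of factors, in which case the result has one cell per combination of levels. The sketch below uses a small hypothetical data frame (shop, with made-up columns income, gender and visited) purely for illustration; it is not the mall dataset.

```r
# Hypothetical toy data; column names chosen only for this example
shop <- data.frame(
  income  = c(15, 16, NA, 20, 30, 40),
  gender  = c("Male", "Female", "Female", "Male", "Female", "Male"),
  visited = c("yes", "yes", "no", "yes", "no", "no")
)

# Extra arguments after FUN are passed to FUN: here na.rm = TRUE goes to mean()
mean_by_gender <- tapply(shop$income, shop$gender, mean, na.rm = TRUE)   # Female 23, Male 25

# A list of factors in INDEX gives a matrix with one cell per gender/visited combination
mean_by_two <- tapply(shop$income, list(shop$gender, shop$visited), mean, na.rm = TRUE)
print(mean_by_two)
```

Without na.rm = TRUE, the Female group would return NA because of the missing income value.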
# Applying mean() to annual income, grouped by the Gender factor
result <- tapply(customer_seg$Annual.Income..k.., customer_seg$Gender, FUN = mean)

# Mean annual income per gender
result
  Female     Male
59.25000 62.22727
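The CSV file is not bundled with this recipe, so the sketch below types in only the first six rows shown in the glimpse() output as a stand-in data frame (customer_head, head_result and head_df are hypothetical names). It shows that tapply() returns a named one-dimensional array, so a single group's mean can be read by name, and how the result can be turned into an ordinary data frame.

```r
# Hypothetical stand-in: first six rows from the glimpse() output, typed in by hand
customer_head <- data.frame(
  Gender             = c("Male", "Male", "Female", "Female", "Female", "Female"),
  Annual.Income..k.. = c(15, 15, 16, 16, 17, 17)
)

# Same call as in the recipe, applied to the six-row stand-in
head_result <- tapply(customer_head$Annual.Income..k.., customer_head$Gender, FUN = mean)

# tapply() returns a one-dimensional array whose names are the factor levels
female_mean <- head_result[["Female"]]   # (16 + 16 + 17 + 17) / 4 = 16.5

# Convert to a two-column data frame, one row per group
head_df <- data.frame(Gender = names(head_result), MeanIncome = as.vector(head_result))
print(head_df)
```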
