What is the use of spread function in tidyr package?

This recipe explains the use of the spread() function in the tidyr package.
Last Updated: 17 Aug 2022


Recipe Objective

In data science, we need to organise, clean and explore the data before we build any machine learning model, so that the model can reach its best accuracy. Data can be organised in many ways, and a good organisation makes it easier to analyse. Hadley Wickham introduced the concept of tidy data in his 2014 paper "Tidy Data": in a tidy data frame, every row is an observation, every column is a variable and every cell contains a single value.


The tidyr package is used to produce tidy data and contains the following important functions:

gather()
spread()
separate()
unite()
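A minimal sketch of the tidy-data idea and of gather()/spread() as inverse operations, using a hypothetical toy sales table (the names sales_wide, store, year and sales are illustrative only and not part of the recipe's dataset). The year columns hold values of a Year variable, so the wide table is not tidy until it is gathered.

```r
library(tidyr)

# Hypothetical toy table (not the recipe's dataset) in a wide layout:
# the column names 2021 and 2022 are values of a "year" variable,
# so this table is not tidy.
sales_wide <- data.frame(
  store  = c("A", "B"),
  `2021` = c(10, 20),
  `2022` = c(15, 25),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# gather(): collapse the year columns into key-value pairs (tidy, long form)
sales_long <- gather(sales_wide, key = "year", value = "sales", `2021`, `2022`)
sales_long
#   store year sales
# 1     A 2021    10
# 2     B 2021    20
# 3     A 2022    15
# 4     B 2022    25

# spread(): the inverse, widening the key-value pairs back into columns
sales_wide_again <- spread(sales_long, key = year, value = sales)
sales_wide_again
```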

In this recipe, we demonstrate the use of the spread() function from the tidyr package.

Step 1: Loading required package and a dataset

Dataset description: the data contains basic information about customers visiting a supermarket mall. The variables we are interested in are Annual.Income (in 1000s), Spending Score and Age.

# Data manipulation package
library(tidyr)
# dplyr provides glimpse(), which tidyr does not export
library(dplyr)
# ggplot for data visualisation
library(ggplot2)
# reading a dataset
customer_seg <- read.csv('R_215_Mall_Customers.csv')
glimpse(customer_seg)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...
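If R_215_Mall_Customers.csv is not at hand, a small stand-in can be typed in from the first six rows of the glimpse() output above. This is only an assumed substitute for trying the code, not the full 200-row dataset; the column names match the ones read.csv() produced above.

```r
library(dplyr)  # provides glimpse()

# Stand-in for R_215_Mall_Customers.csv, built only from the first six rows
# visible in the glimpse() output above (assumption: enough for a demo).
customer_seg <- data.frame(
  CustomerID             = 1:6,
  Gender                 = c("Male", "Male", "Female", "Female", "Female", "Female"),
  Age                    = c(19L, 21L, 20L, 23L, 31L, 22L),
  Annual.Income..k..     = c(15L, 15L, 16L, 16L, 17L, 17L),
  Spending.Score..1.100. = c(39L, 81L, 6L, 77L, 40L, 76L),
  stringsAsFactors = FALSE
)

glimpse(customer_seg)
```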

Step 2: Using spread() function

The spread() function widens a dataset by spreading a key-value pair across multiple columns. In this example, we spread the unique values of the "Gender" variable into separate columns and fill each cell with the corresponding value of "Age".

Syntax: spread(data, key, value)

where:

data = the data frame
key = the column whose unique values become the names of the new columns
value = the column whose values fill the newly formed columns
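To see how key and value map onto the result, here is a sketch on the first four customers from the glimpse() output above (an assumed stand-in; the names customers and wide are illustrative). Every column other than key and value (here only CustomerID) identifies the rows of the result.

```r
library(tidyr)

# Stand-in: first four customers from the glimpse() output above
customers <- data.frame(
  CustomerID = 1:4,
  Gender     = c("Male", "Male", "Female", "Female"),
  Age        = c(19L, 21L, 20L, 23L),
  stringsAsFactors = FALSE
)

# key = Gender: its unique values ("Female", "Male") become new columns.
# value = Age: fills those columns; a cell with no matching key-value pair
# (e.g. customer 1 under Female) becomes NA.
wide <- spread(customers, key = Gender, value = Age)
wide
```

The result has the columns CustomerID, Female and Male, still one row per customer. spread() keeps one row per unique combination of the remaining columns; because CustomerID is unique in the full dataset, the full result keeps all 200 rows.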

new_dataframe <- spread(data = customer_seg, key = Gender, value = Age)
# glimpse() (from dplyr) prints the compact summary shown below
glimpse(new_dataframe)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...
$ Female                  NA, NA, 20, 23, 31, 22, 35, 23, NA, 30, NA, ...
$ Male                    19, 21, NA, NA, NA, NA, NA, NA, 64, NA, 67, ...
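Two follow-ups, sketched on the same assumed four-customer stand-in (the object names are illustrative): gather() with na.rm = TRUE reverses the widening by dropping the NA cells that spread() created, and pivot_wider(), which superseded spread() in tidyr 1.0.0, produces the same wide layout.

```r
library(tidyr)

# Stand-in: first four customers from the glimpse() output above
customers <- data.frame(
  CustomerID = 1:4,
  Gender     = c("Male", "Male", "Female", "Female"),
  Age        = c(19L, 21L, 20L, 23L),
  stringsAsFactors = FALSE
)

wide <- spread(customers, key = Gender, value = Age)

# gather() undoes spread(); na.rm = TRUE drops the NA cells spread() created
long_again <- gather(wide, key = "Gender", value = "Age", Female, Male, na.rm = TRUE)
# restore the original row order
long_again <- long_again[order(long_again$CustomerID), ]

# pivot_wider() is the newer replacement for spread()
wide_pw <- pivot_wider(customers, names_from = Gender, values_from = Age)
wide_pw
```

One difference worth noting: spread() orders the new columns alphabetically (Female, Male), while pivot_wider() by default keeps the order in which the key values first appear (Male, Female); passing names_sort = TRUE sorts them.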
