What is the use of spread function in tidyr package?

This recipe explains what is the use of spread function in tidyr package

Recipe Objective

In Data Science, we need to organise, clean, explore the data before we can create any machine learning models for best accuracy. Organising of data can be done in many ways which helps us to analyse the data well. There is a concept of Tidy Data introduced by Hadley Wickman in his paper in 2014 where every row is an observation, every column is a variable and every cell contains values in a dataframe. ​

Explore the BERT Variants - ALBERT vs DistilBERT

tidyr package is used to achieve tidy data and contains the following important functions:

  1. Gather
  2. Spread
  3. Separate
  4. Unite

In this recipe, we will demonstrate what is the use of spread() function in tidyr package. ​

Step 1: Loading required package and a dataset

Dataset description: It is the basic data about the customers going to the supermarket mall. The variable that we interested in is Annual.Income which is in 1000s , Spending Score and Age ​

# Data manipulation package library(tidyr) # ggplot for data visualisation library(ggplot2) # reading a dataset customer_seg = read.csv('R_215_Mall_Customers.csv') glimpse(customer_seg)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

Step 2: Using spread() function

Spread function is used to widen the dataset by spreading a key-value pair across multiple columns. In this example, we will spread the unique values in "Gender" variable in multiple columns with giving values of "Age" in each cell. ​

Syntax: spread(data = , key = , value =) ​

where: ​

  1. data = dataframe
  2. key = column name which needs to be expanded based on it's unique values
  3. value = column name whose value is inserted into each newly formed columns

new_dataframe = spread(data = customer_seg, key = Gender, value = Age) new_dataframe

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...
$ Female                  NA, NA, 20, 23, 31, 22, 35, 23, NA, 30, NA, ...
$ Male                    19, 21, NA, NA, NA, NA, NA, NA, 64, NA, 67, ...

What Users are saying..

profile image

Abhinav Agarwal

Graduate Student at Northwestern University
linkedin profile url

I come from Northwestern University, which is ranked 9th in the US. Although the high-quality academics at school taught me all the basics I needed, obtaining practical experience was a challenge.... Read More

Relevant Projects

Locality Sensitive Hashing Python Code for Look-Alike Modelling
In this deep learning project, you will find similar images (lookalikes) using deep learning and locality sensitive hashing to find customers who are most likely to click on an ad.

Langchain Project for Customer Support App in Python
In this LLM Project, you will learn how to enhance customer support interactions through Large Language Models (LLMs), enabling intelligent, context-aware responses. This Langchain project aims to seamlessly integrate LLM technology with databases, PDF knowledge bases, and audio processing agents to create a comprehensive customer support application.

AWS MLOps Project for Gaussian Process Time Series Modeling
MLOps Project to Build and Deploy a Gaussian Process Time Series Model in Python on AWS

Detectron2 Object Detection and Segmentation Example Python
Object Detection using Detectron2 - Build a Dectectron2 model to detect the zones and inhibitions in antibiogram images.

Ola Bike Rides Request Demand Forecast
Given big data at taxi service (ride-hailing) i.e. OLA, you will learn multi-step time series forecasting and clustering with Mini-Batch K-means Algorithm on geospatial data to predict future ride requests for a particular region at a given time.

Build a Review Classification Model using Gated Recurrent Unit
In this Machine Learning project, you will build a classification model in python to classify the reviews of an app on a scale of 1 to 5 using Gated Recurrent Unit.

Learn to Build a Siamese Neural Network for Image Similarity
In this Deep Learning Project, you will learn how to build a siamese neural network with Keras and Tensorflow for Image Similarity.

Build a Customer Churn Prediction Model using Decision Trees
Develop a customer churn prediction model using decision tree machine learning algorithms and data science on streaming service data.

MLOps Project to Deploy Resume Parser Model on Paperspace
In this MLOps project, you will learn how to deploy a Resume Parser Streamlit Application on Paperspace Private Cloud.

Census Income Data Set Project-Predict Adult Census Income
Use the Adult Income dataset to predict whether income exceeds 50K yr based oncensus data.