What is the use of spread function in tidyr package?

This recipe explains the use of the spread() function in the tidyr package.
Last Updated: 17 Aug 2022

Recipe Objective

In Data Science, we need to organise, clean and explore the data before we can build accurate machine learning models. Data can be organised in many ways, and a good layout makes analysis easier. Hadley Wickham introduced the concept of Tidy Data in his 2014 paper: in a tidy dataframe, every row is an observation, every column is a variable and every cell contains a single value.

The tidyr package is used to achieve tidy data and contains the following important functions:

gather()
spread()
separate()
unite()
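As a rough orientation, the four functions can be sketched on a tiny made-up data frame. The `scores` data and its column names are hypothetical and are not part of the recipe's dataset; the example only illustrates the direction in which each function reshapes data.

```r
library(tidyr)

# Hypothetical wide data: one row per student, one column per subject
scores <- data.frame(
  student = c("A", "B"),
  maths   = c(90, 75),
  physics = c(85, 80)
)

# gather(): wide -> long (the subject columns become key-value pairs)
long <- gather(scores, key = "subject", value = "score", maths, physics)

# spread(): long -> wide (undoes the gather() above)
wide <- spread(long, key = subject, value = score)

# unite(): paste two columns into a single column
united <- unite(long, col = "student_subject", student, subject, sep = "_")

# separate(): split that single column back into two
separated <- separate(united, col = student_subject,
                      into = c("student", "subject"), sep = "_")
```

Here gather() and spread() act as inverses of each other, as do unite() and separate().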

In this recipe, we will demonstrate how to use the spread() function in the tidyr package.

Step 1: Loading required package and a dataset

Dataset description: This is basic data about customers visiting a supermarket mall. The variables that we are interested in are Annual.Income (in thousands), Spending Score and Age.

# Data manipulation package
library(tidyr)
# dplyr provides glimpse(), which tidyr does not export
library(dplyr)

# reading a dataset
customer_seg = read.csv('R_215_Mall_Customers.csv')
glimpse(customer_seg)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...
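If the CSV file is not at hand, a small stand-in can be built from the first five values of each column shown in the output above. This is only a sketch: `customer_sample` is a hypothetical name, and the original CSV header used in the `make.names()` call is an assumption made to illustrate why the column names contain runs of dots (by default, `read.csv()` converts headers into syntactically valid R names).

```r
# Stand-in for the first five rows of the dataset, copied from the glimpse() output
customer_sample <- data.frame(
  CustomerID             = 1:5,
  Gender                 = c("Male", "Male", "Female", "Female", "Female"),
  Age                    = c(19, 21, 20, 23, 31),
  Annual.Income..k..     = c(15, 15, 16, 16, 17),
  Spending.Score..1.100. = c(39, 81, 6, 77, 40)
)
str(customer_sample)

# Assumed original header (not confirmed by the recipe): spaces, brackets
# and "$" are each replaced by a dot when the name is made syntactically valid
income_name <- make.names("Annual Income (k$)")
```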

Step 2: Using spread() function

The spread() function widens a dataset by spreading a key-value pair across multiple columns. In this example, we will spread the unique values of the "Gender" variable into separate columns and fill each cell with the corresponding value of "Age".

Syntax: spread(data = , key = , value =)

where:

data = dataframe
key = name of the column whose unique values become the new column names
value = name of the column whose values fill the cells of the newly formed columns
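The roles of key and value can be sketched on a small long-format data frame. The `weather` data below is made up for illustration. The example also uses the optional `fill` argument of spread(), which replaces the NA that otherwise appears when a key-value combination is missing.

```r
library(tidyr)

# Hypothetical long data: one row per (city, metric) pair
weather <- data.frame(
  city   = c("Pune", "Pune", "Delhi"),
  metric = c("temp", "rain", "temp"),
  value  = c(31, 12, 38)
)

# Each unique value of `metric` becomes a column holding the matching `value`
wide_na <- spread(weather, key = metric, value = value)

# Delhi has no "rain" row, so that cell is NA by default;
# `fill` supplies a replacement for such missing combinations
wide_filled <- spread(weather, key = metric, value = value, fill = 0)
```

Leaving `fill` at its default keeps missing combinations visible as NA, which is usually safer than silently substituting a number.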

new_dataframe = spread(data = customer_seg, key = Gender, value = Age)
glimpse(new_dataframe)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...
$ Female                  NA, NA, 20, 23, 31, 22, 35, 23, NA, 30, NA, ...
$ Male                    19, 21, NA, NA, NA, NA, NA, NA, 64, NA, 67, ...
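Each customer has only one gender, so every row has an Age in exactly one of the two new columns and NA in the other. In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The sketch below runs both on a hypothetical five-row sample (`customer_sample`, built from the first values shown above) to show that they produce the same cell values; it is an illustration, not a replacement for the recipe's code.

```r
library(tidyr)

# Hypothetical sample built from the first five customers
customer_sample <- data.frame(
  CustomerID = 1:5,
  Gender     = c("Male", "Male", "Female", "Female", "Female"),
  Age        = c(19, 21, 20, 23, 31)
)

# spread(): the Gender and Age columns are replaced by Female and Male columns
wide_spread <- spread(customer_sample, key = Gender, value = Age)

# pivot_wider(): names_from plays the role of key, values_from the role of value
wide_pivot <- pivot_wider(customer_sample,
                          names_from = Gender, values_from = Age)
```

The new columns may appear in a different order in the two results, so it is safer to refer to them by name than by position.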
