What is the use of the separate() function in the tidyr package?

This recipe explains the use of the separate() function in the tidyr package.
Last Updated: 19 Aug 2022

Recipe Objective

In data science, we need to organise, clean and explore data before building machine learning models, so that the models can be as accurate as possible. Data can be organised in many ways that make it easier to analyse. One such concept is tidy data, introduced by Hadley Wickham in his 2014 paper "Tidy Data": in a tidy data frame, every column is a variable, every row is an observation and every cell holds a single value.

Sentiment Analysis Project on eCommerce Product Reviews with Source Code

The tidyr package is used to produce tidy data and contains the following important functions:

gather()
spread()
separate()
unite()
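To see how the four functions relate, here is a minimal sketch on a tiny table. The scores data frame is hypothetical (it is not part of this recipe's dataset), and the block assumes tidyr is installed. Note that current tidyr documentation marks gather() and spread() as superseded by pivot_longer() and pivot_wider(), although they still work.

```r
library(tidyr)

# Hypothetical wide table: one row per student, one column per test
scores <- data.frame(
  student = c("A", "B"),
  test1 = c(80, 90),
  test2 = c(70, 85)
)

# gather(): wide -> long, stacking test1/test2 into a key column and a value column
long <- gather(scores, key = "test", value = "score", test1, test2)

# spread(): long -> wide, the inverse of gather()
wide <- spread(long, key = test, value = score)

# unite(): paste two columns into one, joined by sep
united <- unite(long, col = "id", student, test, sep = "_")

# separate(): split the combined column back into two columns
split_back <- separate(united, col = id, into = c("student", "test"), sep = "_")

print(long)
print(split_back)
```

gather()/spread() and unite()/separate() form two inverse pairs: the first pair reshapes rows and columns, while the second pair combines or splits the values inside columns.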

In this recipe, we demonstrate how to use the separate() function from the tidyr package.

Step 1: Loading required package and a dataset

Dataset description: The dataset contains basic data about customers visiting a supermarket mall. The variables we are interested in are Annual.Income (in 1000s), Spending Score and Age.

# Data manipulation packages
library(tidyr)
library(dplyr) # provides glimpse()
# reading a dataset
customer_seg = read.csv('R_216_Mall_Customers.csv')
# getting the first 4 rows
customer_seg = customer_seg[1:4, ]
# creating a column with data that needs to be separated
customer_seg$Year = c("2010-01", "2010-03", "2012-05", "2018-08")
glimpse(customer_seg)

Rows: 4
Columns: 6
$ CustomerID              1, 2, 3, 4
$ Gender                  Male, Male, Female, Female
$ Age                     19, 21, 20, 23
$ Annual.Income..k..      15, 15, 16, 16
$ Spending.Score..1.100.  39, 81, 6, 77
$ Year                    "2010-01", "2010-03", "2012-05", "2018-08"
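If R_216_Mall_Customers.csv is not at hand, the four rows used in this recipe can be rebuilt by hand. This is a minimal sketch: the values are copied from the glimpse() output above, while storing the numeric columns as integers is an assumption. It uses base R's str() instead of glimpse(), so it needs no extra packages.

```r
# Hand-built stand-in for the first four rows of R_216_Mall_Customers.csv
# (values taken from the glimpse() output; integer types are an assumption)
customer_seg <- data.frame(
  CustomerID = 1:4,
  Gender = c("Male", "Male", "Female", "Female"),
  Age = c(19L, 21L, 20L, 23L),
  Annual.Income..k.. = c(15L, 15L, 16L, 16L),
  Spending.Score..1.100. = c(39L, 81L, 6L, 77L),
  Year = c("2010-01", "2010-03", "2012-05", "2018-08"),
  stringsAsFactors = FALSE
)

# Base R structure summary, similar in spirit to glimpse()
str(customer_seg)
```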

Step 2: Using separate() function

The separate() function widens a dataset by splitting the data in a single column into multiple columns. In this example, we split the Year column into Year and Month columns.

Syntax: separate(data = , col = , into = , sep =)

where:

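data = the data frame
col = the name of the column whose data needs to be split into multiple columns
into = the names of the new columns to create for the split data
sep = the separator between values: either a regular expression (such as "-") or a numeric vector of character positions to split at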
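The sep argument is more flexible than a single character. The following minimal sketch uses a hypothetical one-column data frame holding the same Year strings. It contrasts splitting on a regular expression with splitting at fixed character positions, and it also shows remove = FALSE, which keeps the original column. Putting NA in into drops that piece from the output.

```r
library(tidyr)

# Hypothetical one-column data frame with the same "YYYY-MM" strings
df <- data.frame(Year = c("2010-01", "2010-03", "2012-05", "2018-08"),
                 stringsAsFactors = FALSE)

# sep as a regular expression: split on the hyphen
by_regex <- separate(df, col = Year, into = c("Year", "Month"), sep = "-")

# sep as positions: split after character 4 and after character 5,
# giving "2010", "-", "01"; the NA in into discards the middle "-" piece
by_position <- separate(df, col = Year, into = c("Year", NA, "Month"), sep = c(4, 5))

# remove = FALSE keeps the original combined column next to the new ones
kept <- separate(df, col = Year, into = c("Yr", "Mo"), sep = "-", remove = FALSE)

print(by_position)
print(kept)
```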

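# splitting "YYYY-MM" strings in Year into separate Year and Month columns
new_dataframe = separate(data = customer_seg, col = Year, into = c("Year", "Month"), sep = "-")
glimpse(new_dataframe)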

Rows: 4
Columns: 7
$ CustomerID              1, 2, 3, 4
$ Gender                  Male, Male, Female, Female
$ Age                     19, 21, 20, 23
$ Annual.Income..k..      15, 15, 16, 16
$ Spending.Score..1.100.  39, 81, 6, 77
$ Year                    "2010", "2010", "2012", "2018"
$ Month                   "01", "03", "05", "08"
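As the output shows, the new Year and Month columns are character vectors by default. Here is a minimal sketch, again on a hypothetical one-column data frame. Setting convert = TRUE makes separate() run type.convert() on the pieces, which turns them into integers and drops the leading zero of the month. The unite() function reverses the split.

```r
library(tidyr)

# Hypothetical one-column data frame with the same "YYYY-MM" strings
df <- data.frame(Year = c("2010-01", "2010-03", "2012-05", "2018-08"),
                 stringsAsFactors = FALSE)

# convert = TRUE: "2010" becomes 2010L and "01" becomes 1L
converted <- separate(df, col = Year, into = c("Year", "Month"), sep = "-", convert = TRUE)
str(converted)

# unite() is the inverse of separate(): paste the pieces back with sep
plain <- separate(df, col = Year, into = c("Year", "Month"), sep = "-")
rejoined <- unite(plain, col = "Year", Year, Month, sep = "-")
print(rejoined)
```

The unite() round trip works on the character version. It would not recover "2010-01" from the converted integers, because the leading zero is already gone.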
