What is the use of the separate() function in the tidyr package?

This recipe explains what the separate() function in the tidyr package is used for.

Recipe Objective

In data science, we need to organise, clean and explore the data before building machine learning models if we want the best accuracy. Data can be organised in many ways, and a good organisation makes it easier to analyse. One such approach is the concept of tidy data, introduced by Hadley Wickham in a 2014 paper: in a tidy data frame, every column is a variable, every row is an observation and every cell holds a single value.


The tidyr package is used to produce tidy data. Its most important functions are:

  1. gather()
  2. spread()
  3. separate()
  4. unite()
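separate() and unite() are inverses of each other: one splits a column into several, the other pastes several columns back into one. The minimal sketch below uses a hypothetical one-column data frame (the values are the Year strings used later in this recipe) and the column names Yr and Mon, which are chosen only for illustration.

```r
library(tidyr)

# Hypothetical toy data: one "year-month" column
df <- data.frame(Year = c("2010-01", "2010-03", "2012-05", "2018-08"))

# separate(): split the Year column into two columns at the "-"
split_df <- separate(df, col = Year, into = c("Yr", "Mon"), sep = "-")

# unite(): paste Yr and Mon back into a single Year column with the same separator
joined_df <- unite(split_df, col = "Year", Yr, Mon, sep = "-")

print(split_df)
print(joined_df)
```

Because the same separator is used in both directions, the round trip gives back the original strings.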

In this recipe, we demonstrate what the separate() function in the tidyr package is used for.

Step 1: Loading required package and a dataset

Dataset description: basic data about customers visiting a supermarket mall. The variables we are interested in are annual income (column Annual.Income..k.., in thousands), spending score and age.

# Data manipulation packages
library(tidyr)
library(dplyr)  # provides glimpse()

# reading a dataset
customer_seg = read.csv('R_216_Mall_Customers.csv')

# getting the first 4 rows
customer_seg = customer_seg[1:4, ]

# creating a column with data that needs to be separated
customer_seg$Year = c("2010-01", "2010-03", "2012-05", "2018-08")
glimpse(customer_seg)

Rows: 4
Columns: 6
$ CustomerID              1, 2, 3, 4
$ Gender                  Male, Male, Female, Female
$ Age                     19, 21, 20, 23
$ Annual.Income..k..      15, 15, 16, 16
$ Spending.Score..1.100.  39, 81, 6, 77
$ Year                    "2010-01", "2010-03", "2012-05", "2018-08"
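If R_216_Mall_Customers.csv is not at hand, the four rows can be typed in by hand from the glimpse() output above. This is a sketch under two assumptions: the values shown are the complete contents of those rows, and Gender is stored as character (the default for read.csv() since R 4.0) rather than as a factor.

```r
library(dplyr)  # provides glimpse()

# Stand-in for the first 4 rows of R_216_Mall_Customers.csv,
# copied from the glimpse() output (assumption: the CSV is unavailable)
customer_seg <- data.frame(
  CustomerID = 1:4,
  Gender = c("Male", "Male", "Female", "Female"),
  Age = c(19L, 21L, 20L, 23L),
  Annual.Income..k.. = c(15L, 15L, 16L, 16L),
  Spending.Score..1.100. = c(39L, 81L, 6L, 77L),
  Year = c("2010-01", "2010-03", "2012-05", "2018-08")
)

glimpse(customer_seg)
```

The rest of the recipe can then be run on this data frame unchanged.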

Step 2: Using separate() function

The separate() function widens a dataset by splitting the values of a single column into multiple columns. In this example, we split the values of the Year column into Year and Month columns.

Syntax: separate(data = , col = , into = , sep = )

where:

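  1. data = the data frame
  2. col = the column whose values need to be split into multiple columns
  3. into = a character vector with the names of the new columns created from the split data
  4. sep = the separator between the pieces: either a regular expression (the default, "[^[:alnum:]]+", matches any run of non-alphanumeric characters) or a vector of character positions to split at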
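A minimal sketch of two other ways to call separate(), on a hypothetical one-column data frame. Leaving sep out relies on the default regular expression, which already splits on "-"; convert = TRUE then runs type.convert() on the pieces so they become integers. A numeric sep splits at a character position instead of at a delimiter. The column names Yr, Mon and Rest are illustrative only.

```r
library(tidyr)

df <- data.frame(Year = c("2010-01", "2010-03", "2012-05", "2018-08"))

# Default sep splits on any non-alphanumeric run, so "-" needs no mention.
# convert = TRUE turns "2010" into 2010L and "01" into 1L.
parts <- separate(df, col = Year, into = c("Yr", "Mon"), convert = TRUE)

# Numeric sep: cut after the 4th character. The "-" stays at the
# start of the second piece because no delimiter is removed.
by_pos <- separate(df, col = Year, into = c("Yr", "Rest"), sep = 4)

str(parts)
str(by_pos)
```

Splitting by position is mainly useful for fixed-width codes that have no delimiter at all.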

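# split the "YYYY-MM" strings in Year into Year and Month columns
new_dataframe = separate(data = customer_seg, col = Year, into = c("Year", "Month"), sep = "-")
glimpse(new_dataframe)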

Rows: 4
Columns: 7
$ CustomerID              1, 2, 3, 4
$ Gender                  Male, Male, Female, Female
$ Age                     19, 21, 20, 23
$ Annual.Income..k..      15, 15, 16, 16
$ Spending.Score..1.100.  39, 81, 6, 77
$ Year                    "2010", "2010", "2012", "2018"
$ Month                   "01", "03", "05", "08"
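Real data is rarely as regular as the Year column above. With the defaults (extra = "warn", fill = "warn"), separate() warns when a value has too many or too few pieces. The sketch below uses hypothetical messy values to show how extra = "merge" keeps surplus pieces in the last column and fill = "right" pads missing pieces with NA.

```r
library(tidyr)

# Hypothetical messy values: one has an extra day part, one has no month
df <- data.frame(Year = c("2010-01", "2010-03-15", "2012"))

# extra = "merge": split at most once, leaving "03-15" together
# fill  = "right": missing pieces become NA on the right
tidy <- separate(df, col = Year, into = c("Yr", "Mon"), sep = "-",
                 extra = "merge", fill = "right")
print(tidy)
```

In tidyr 1.3.0 and later, separate() is marked as superseded in favour of separate_wider_delim() and related functions, but it continues to work as shown in this recipe.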
