What is the use of the separate() function in the tidyr package?

This recipe explains the use of the separate() function in the tidyr package.

Recipe Objective

In Data Science, we need to organise, clean and explore the data before we can build machine learning models with good accuracy. Data can be organised in many ways that help us analyse it well. The concept of Tidy Data was introduced by Hadley Wickham in his 2014 paper: in a tidy dataframe, every row is an observation, every column is a variable and every cell contains a single value.

The tidyr package is used to achieve tidy data and contains the following important functions:

  1. gather()
  2. spread()
  3. separate()
  4. unite()

In this recipe, we will demonstrate the use of the separate() function in the tidyr package.

Step 1: Loading required package and a dataset

Dataset description: The dataset holds basic data about customers visiting a supermarket mall. The variables we are interested in are Annual.Income (in thousands), Spending Score and Age.

# Data manipulation packages
library(tidyr)
library(dplyr)  # provides glimpse()

# reading the dataset
customer_seg = read.csv('R_216_Mall_Customers.csv')

# keeping the first 4 rows
customer_seg = customer_seg[1:4, ]

# creating a column with data that needs to be separated
customer_seg$Year = c("2010-01", "2010-03", "2012-05", "2018-08")

glimpse(customer_seg)

Rows: 4
Columns: 6
$ CustomerID              1, 2, 3, 4
$ Gender                  Male, Male, Female, Female
$ Age                     19, 21, 20, 23
$ Annual.Income..k..      15, 15, 16, 16
$ Spending.Score..1.100.  39, 81, 6, 77
$ Year                    "2010-01", "2010-03", "2012-05", "2018-08"

Step 2: Using separate() function

The separate() function widens a dataset by splitting the values in a single column into multiple columns. In this example, we will split the Year column, which holds values such as "2010-01", into a Year column and a Month column.

Syntax: separate(data = , col = , into = , sep = )

where:

  1. data = the dataframe
  2. col = name of the column whose data needs to be separated into multiple columns
  3. into = names of the new columns to be created for the split data
  4. sep = the character (or regular expression) that separates the data
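The syntax above can also be tried without the CSV file. The following is a minimal sketch, assuming tidyr is installed; the data frame toy, its column names and its values are made up for illustration and are not part of the mall dataset.

```r
library(tidyr)

# Hypothetical data frame, made up for illustration (not the mall dataset)
toy <- data.frame(
  id = 1:3,
  period = c("2010-01", "2012-05", "2018-08"),
  stringsAsFactors = FALSE
)

# Split `period` at the hyphen into two new character columns.
# By default the original `period` column is dropped from the result.
toy_split <- separate(data = toy, col = period, into = c("year", "month"), sep = "-")

print(toy_split)
```

The new columns are character vectors, so the leading zero in values such as "01" is preserved.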

new_dataframe = separate(data = customer_seg, col = Year, into = c("Year", "Month"), sep = "-")
glimpse(new_dataframe)

Rows: 4
Columns: 7
$ CustomerID              1, 2, 3, 4
$ Gender                  Male, Male, Female, Female
$ Age                     19, 21, 20, 23
$ Annual.Income..k..      15, 15, 16, 16
$ Spending.Score..1.100.  39, 81, 6, 77
$ Year                    "2010", "2010", "2012", "2018"
$ Month                   "01", "03", "05", "08"
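In the output above, Year and Month are character columns. As a rough sketch of two related options, the example below uses the convert argument of separate() to get numeric columns and unite() to reverse the split. The data frame dates and the column name YearMonth are hypothetical, chosen only for this illustration.

```r
library(tidyr)

# Hypothetical data frame holding the same kind of "year-month" strings
dates <- data.frame(Year = c("2010-01", "2018-08"), stringsAsFactors = FALSE)

# Default behaviour: the new columns stay as character vectors
split_chr <- separate(dates, col = Year, into = c("Year", "Month"), sep = "-")

# convert = TRUE asks separate() to convert the new columns to a suitable
# type, so "01" becomes the integer 1 and the leading zero is lost
split_int <- separate(dates, col = Year, into = c("Year", "Month"),
                      sep = "-", convert = TRUE)
str(split_int)

# unite() is the inverse operation: paste the columns back together
restored <- unite(split_chr, col = "YearMonth", Year, Month, sep = "-")
print(restored)
```

Uniting the character version reproduces the original strings, whereas uniting the converted version would not, because the leading zeros are gone. In tidyr 1.3.0 and later, separate() is marked as superseded in favour of separate_wider_delim() and related functions; it still works, and this sketch keeps to separate() to match the recipe.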
