What is the use of the separate() function in the tidyr package?

This recipe explains the use of the separate() function in the tidyr package.


Recipe Objective

In data science, we need to organise, clean and explore the data before we can build machine learning models that give the best accuracy. Data can be organised in many ways, and a good layout makes the analysis easier. Hadley Wickham introduced the concept of tidy data in his 2014 paper: in a tidy dataframe, every row is an observation, every column is a variable and every cell contains a single value.

The tidyr package is used to achieve tidy data and contains the following important functions:

  1. gather()
  2. spread()
  3. separate()
  4. unite()

In this recipe, we will demonstrate the use of the separate() function in the tidyr package.

Step 1: Loading required package and a dataset

Dataset description: It contains basic data about customers visiting a supermarket mall. The variables that we are interested in are Annual.Income (in 1000s), Spending Score and Age.

# Data manipulation packages
library(tidyr)
library(dplyr)  # provides glimpse()

# reading the dataset
customer_seg = read.csv('R_216_Mall_Customers.csv')

# getting the first 4 rows
customer_seg = customer_seg[1:4, ]

# creating a column with data that needs to be separated
customer_seg$Year = c("2010-01", "2010-03", "2012-05", "2018-08")

glimpse(customer_seg)
Rows: 4
Columns: 6
$ CustomerID              1, 2, 3, 4
$ Gender                  Male, Male, Female, Female
$ Age                     19, 21, 20, 23
$ Annual.Income..k..      15, 15, 16, 16
$ Spending.Score..1.100.  39, 81, 6, 77
$ Year                    "2010-01", "2010-03", "2012-05", "2018-08"
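
If the CSV file is not at hand, the four rows printed above can be rebuilt by hand so that the rest of the recipe still runs. The sketch below is only a stand-in for read.csv(): the values are copied from the glimpse() output, and storing Gender as character (rather than factor) is an assumption.

```r
# Stand-in for the CSV: values copied from the glimpse() output above.
# Assumption: Gender is kept as character, not factor.
customer_seg = data.frame(
  CustomerID = 1:4,
  Gender = c("Male", "Male", "Female", "Female"),
  Age = c(19, 21, 20, 23),
  Annual.Income..k.. = c(15, 15, 16, 16),
  Spending.Score..1.100. = c(39, 81, 6, 77),
  Year = c("2010-01", "2010-03", "2012-05", "2018-08"),
  stringsAsFactors = FALSE
)

# Base R alternative to glimpse() for a quick look at the structure.
str(customer_seg)
```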

Step 2: Using separate() function

The separate() function widens a dataset by splitting the data in a single column into multiple columns. In this example, we will split the data in the Year column into Year and Month columns.

Syntax: separate(data = , col = , into = , sep = )

where:

  1. data = the dataframe
  2. col = name of the column whose data needs to be separated into multiple columns
  3. into = names of the columns that need to be created for the split data
  4. sep = the character which separates the data
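
One detail worth knowing about sep: when it is a character value, separate() treats it as a regular expression, so a separator such as a dot has to be escaped. A minimal sketch, using a made-up data frame (dates and its Period column are hypothetical and not part of the recipe's dataset):

```r
library(tidyr)

# Hypothetical data: year and month joined by a dot instead of a hyphen.
dates = data.frame(Period = c("2010.01", "2012.05"), stringsAsFactors = FALSE)

# An unescaped "." would match any character, so the dot is escaped.
split_dates = separate(data = dates, col = Period,
                       into = c("Year", "Month"), sep = "\\.")
print(split_dates)
```

sep also accepts a number, in which case it is taken as a character position to split at rather than a pattern.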
# separating the Year column into Year and Month
new_dataframe = separate(data = customer_seg, col = Year, into = c("Year", "Month"), sep = "-")
glimpse(new_dataframe)
Rows: 4
Columns: 7
$ CustomerID              1, 2, 3, 4
$ Gender                  Male, Male, Female, Female
$ Age                     19, 21, 20, 23
$ Annual.Income..k..      15, 15, 16, 16
$ Spending.Score..1.100.  39, 81, 6, 77
$ Year                    "2010", "2010", "2012", "2018"
$ Month                   "01", "03", "05", "08"
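
The new Year and Month columns are still character strings. As a hedged follow-up, the sketch below uses the convert argument of separate() to turn them into integers, and unite() to paste the pieces back together. The data frame df and the column name YearMonth are illustrative assumptions, not part of the original recipe.

```r
library(tidyr)

# Small illustrative data frame with the same Year values as the recipe.
df = data.frame(Year = c("2010-01", "2010-03", "2012-05", "2018-08"),
                stringsAsFactors = FALSE)

# convert = TRUE runs type conversion on the newly created columns.
converted = separate(data = df, col = Year, into = c("Year", "Month"),
                     sep = "-", convert = TRUE)
str(converted)

# unite() is the reverse operation: it pastes several columns into one.
parts = separate(data = df, col = Year, into = c("Year", "Month"), sep = "-")
restored = unite(data = parts, col = "YearMonth", Year, Month, sep = "-")
print(restored)
```

Note that converting to integer drops the leading zero of the month ("01" becomes 1), so keep the columns as character if the original text has to be reconstructed exactly.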
