How to handle dummy variables in R?

This recipe helps you handle dummy variables in R
Last Updated: 14 Jun 2022

Recipe Objective

In data science, when we build machine learning models, the algorithms generally require all variables to be numeric. If the data is non-numeric, we need to process it before creating any model.

In this recipe, we will learn how to handle a string categorical variable by converting it into dummy variables.

A categorical variable is a variable whose values are distinct strings or categories to which observations are assigned. These values carry no mathematical meaning on their own, so a model cannot use them directly. Hence, we convert them into dummy variables, which is similar to the OneHotEncoding technique in Python. Dummy encoding represents a categorical variable with n unique categories as (n-1) columns of 0s and 1s, where "1" indicates that the observation belongs to that category. The omitted category acts as the baseline.
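As a minimal sketch of the idea, base R's model.matrix() can expand a factor into indicator columns. The toy data frame and its size column are hypothetical and are not part of the recipe's dataset.

```r
# Hypothetical toy data: one categorical variable with three categories
toy <- data.frame(size = factor(c("small", "medium", "large", "medium")))

# model.matrix() expands the factor using treatment contrasts:
# an intercept plus (n - 1) indicator columns, with the first
# factor level ("large", alphabetically) as the baseline
dummies <- model.matrix(~ size, data = toy)

# Drop the intercept so only the (n - 1) dummy columns remain
dummies <- dummies[, -1, drop = FALSE]
print(dummies)
```

Three categories yield two columns here; a row of all zeros corresponds to the baseline category.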

Table of Contents
- Step 1: Loading the required library and dataset
- Step 2: Creating dummy variable

Step 1: Loading the required library and dataset

We require the fastDummies and knitr packages. The code below also loads tidyverse, which provides glimpse().

# installing the required packages (tidyverse is loaded below for glimpse())
install.packages(c("fastDummies", "knitr", "tidyverse"))
library(fastDummies)
library(knitr)

# data manipulation package
library(tidyverse)

# reading the dataset
customer_seg = read.csv('R_223_Mall_Customers.csv')
glimpse(customer_seg)

Observations: 200
Variables: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

Step 2: Creating dummy variable

We create dummy variables for the "Gender" variable using the dummy_cols() function of the fastDummies package.

Syntax: fastDummies::dummy_cols(x, select_columns = )

where:

x = dataframe
select_columns = name of the column (categorical variable) that you want to create dummy variables from.
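To make the syntax concrete, here is a small sketch on a hypothetical toy data frame (the id column and the values are invented for illustration). It suggests that, with default arguments, dummy_cols() keeps the original column and appends one indicator column per category, so reducing to (n-1) columns is a separate step.

```r
library(fastDummies)

# Hypothetical toy data frame (not the recipe's dataset)
toy <- data.frame(
  id = 1:4,
  Gender = factor(c("Male", "Female", "Female", "Male"))
)

# With default arguments the original column is kept and one
# indicator column is appended per category
toy_dummies <- dummy_cols(toy, select_columns = "Gender")
print(names(toy_dummies))
```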

# creating dummy variables: adds the Gender_Female and Gender_Male columns
df_dummies = fastDummies::dummy_cols(customer_seg, select_columns = "Gender")

# dropping the original Gender column (2) and the Gender_Female column (6)
# to keep (n-1) dummy columns, similar to OneHotEncoding
new_customer_seg = df_dummies[c(-2,-6)]
glimpse(new_customer_seg)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...
$ Gender_Male             1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1,...

Note: In the dummy variable created (Gender_Male), 1 = Male and 0 = Female.
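Dropping columns by position, as with c(-2,-6), depends on the column order staying the same. The sketch below shows two possible alternatives on a hypothetical toy data frame: dropping by name, and using the remove_first_dummy and remove_selected_columns arguments of dummy_cols(). It assumes a reasonably recent fastDummies release, since remove_selected_columns is not available in older versions.

```r
library(fastDummies)

# Hypothetical toy data frame (not the recipe's dataset)
toy <- data.frame(
  id = 1:4,
  Gender = factor(c("Male", "Female", "Female", "Male"))
)

# Option 1: create all dummies, then drop columns by name
all_dummies <- dummy_cols(toy, select_columns = "Gender")
by_name <- all_dummies[, !(names(all_dummies) %in% c("Gender", "Gender_Female"))]

# Option 2: let dummy_cols() drop the first dummy and the source column
# (assumes a fastDummies version that supports remove_selected_columns)
built_in <- dummy_cols(
  toy,
  select_columns = "Gender",
  remove_first_dummy = TRUE,
  remove_selected_columns = TRUE
)

print(names(by_name))
print(names(built_in))
```

Either way a single Gender indicator column should remain. Dropping by name makes the choice of baseline category explicit, while the built-in arguments are shorter when several columns are encoded at once.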
