How to handle dummy variables in R?

This recipe helps you handle dummy variables in R

Recipe Objective

In data science, most machine learning algorithms expect every input variable to be numeric. If the data we have is non-numeric, we need to process it before creating any model.

In this recipe, we will learn how to handle a string categorical variable by converting it into dummy variables.

A categorical variable takes a fixed set of distinct string values (categories), and each observation is assigned to one of them. The strings themselves carry no numeric meaning that a model can use. Hence, we convert them into dummy variables, which is similar to the OneHotEncoding technique in Python. For a categorical variable with n unique categories, we keep (n-1) columns of 0s and 1s, where "1" indicates that the observation belongs to that category. The omitted category is implied when all the dummy columns are 0.
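To make the (n-1) idea concrete, here is a minimal sketch in base R. The toy data frame and its colour column are hypothetical and are not part of the recipe's dataset. It uses model.matrix(), which treats the first factor level as the baseline.

```r
# Toy data (hypothetical values, used only for illustration)
toy <- data.frame(colour = factor(c("red", "green", "blue", "green")))

# model.matrix() expands the factor into 0/1 indicator columns.
# With an intercept in the formula, the first level ("blue") is the
# baseline, so only n - 1 = 2 indicator columns are produced.
# The [, -1] step drops the intercept column itself.
dummies <- model.matrix(~ colour, data = toy)[, -1, drop = FALSE]

print(dummies)
```

The "blue" row has 0 in both columns, which is how the dropped category is represented.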

Step 1: Loading the required library and dataset

We require the fastDummies and knitr packages, along with tidyverse for data manipulation.

# installing required packages (only needed once)
install.packages(c("fastDummies", "knitr"))
library(fastDummies)
library(knitr)

# Data manipulation package
library(tidyverse)

# reading a dataset
customer_seg = read.csv('R_223_Mall_Customers.csv')
glimpse(customer_seg)

Observations: 200
Variables: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

Step 2: Creating dummy variable

We create dummy variables for the "Gender" variable using the dummy_cols() function of the fastDummies package.

Syntax: fastDummies::dummy_cols(x, select_columns = ) ​

where: ​

  1. x = dataframe
  2. select_columns = Column (categorical variable) that you want to create dummy variables for.
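As a quick illustration of this syntax, the sketch below calls dummy_cols() on a small hypothetical data frame (toy, with made-up id and size columns). By default the function keeps the original column and appends one dummy column per category, which is why the recipe drops columns afterwards.

```r
library(fastDummies)

# Hypothetical toy data frame, not the recipe's dataset
toy <- data.frame(id = 1:3, size = c("small", "large", "small"))

# One dummy column is added per category of `size`;
# the original `size` column is kept by default.
out <- dummy_cols(toy, select_columns = "size")

print(names(out))
```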

# creating dummy variables
df_dummies = fastDummies::dummy_cols(customer_seg, select_columns = "Gender")

# dropping the original Gender column (2) along with the Gender_Female column (6)
# to get (n-1) columns, similar to OneHotEncoding
new_customer_seg = df_dummies[c(-2, -6)]
glimpse(new_customer_seg)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...
$ Gender_Male             1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1,...

Note: In the dummy variable (Gender_Male) created: 1 = Male and 0 = Female
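Dropping columns by position works here, but it depends on the column order. As an alternative sketch, dummy_cols() also accepts the remove_first_dummy and remove_selected_columns arguments, which do both drops in one call. This assumes a fastDummies version that supports remove_selected_columns; toy_seg is a hypothetical stand-in for the first rows of customer_seg. Which category counts as "first" follows the factor's level order, so the levels are set explicitly.

```r
library(fastDummies)

# Hypothetical stand-in for a few rows of customer_seg
toy_seg <- data.frame(
  CustomerID = 1:4,
  Gender = factor(c("Male", "Male", "Female", "Female"),
                  levels = c("Female", "Male")),
  Age = c(19, 21, 20, 23)
)

new_toy_seg <- dummy_cols(
  toy_seg,
  select_columns = "Gender",
  remove_first_dummy = TRUE,      # keep n - 1 dummy columns
  remove_selected_columns = TRUE  # drop the original Gender column
)

print(names(new_toy_seg))
```

Because no column positions are hard-coded, this version keeps working if columns are added to or reordered in the input data.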

