How to convert categorical variables into numerical in Python?

This recipe explains how to convert categorical variables into numerical variables in Python.

Recipe Objective: How to convert categorical variables into numerical in Python? 

Machine learning models cannot work with categorical variables stored as strings, so we need to convert them to numerical form. We could assign a number to each category, but that can mislead the model when the categories have no measurable order or distance between them. Instead, we can create one new feature per category that holds an indicator value (1 or True when the row belongs to that category, 0 or False otherwise). These new features are called dummy variables.
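To make the difference between the two approaches concrete, here is a minimal sketch. The 'color' column and its values are hypothetical and are not part of this recipe's dataset. It compares plain integer codes with dummy variables.

```python
import pandas as pd

# Hypothetical example column, not taken from the recipe's dataset.
colors = pd.Series(['red', 'green', 'blue', 'green'], name='color')

# Option 1: integer codes. Categories are sorted alphabetically, so
# blue -> 0, green -> 1, red -> 2, which implies an order that does not exist.
codes = colors.astype('category').cat.codes

# Option 2: dummy variables. One indicator column per category, no implied order.
dummies = pd.get_dummies(colors, dtype=int)

print(codes.tolist())
print(dummies)
```

With integer codes a model may treat 'red' as "greater than" 'blue'. With dummy variables, each row simply has a 1 in the column for its own category.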

This recipe will show you how we can transform categorical data to numerical data in Python.

How To Transform Categorical Variables To Numeric In Python?

To convert categorical data to numerical data, the Python source code in this recipe does the following:

1. Creates dictionary and converts it into a dataframe

2. Uses "get_dummies" function for the encoding

3. Concatenates the encoded columns onto the original dataframe

4. Drops categorical variable column
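The four steps above can be sketched end to end as follows. This is only an illustration under assumptions: the 'city' and 'visits' columns and the variable names are hypothetical, and passing dtype=int is a choice made here so the dummy columns hold 0/1.

```python
import pandas as pd

# 1. Create a dictionary and convert it into a dataframe (hypothetical data).
data = {'city': ['Paris', 'Tokyo', 'Paris'], 'visits': [3, 5, 2]}
df = pd.DataFrame(data)

# 2. Encode the categorical column with get_dummies.
city_dummies = pd.get_dummies(df['city'], dtype=int)

# 3. Concatenate the encoded columns onto the dataframe.
df_encoded = pd.concat([df, city_dummies], axis=1)

# 4. Drop the original categorical column.
df_encoded = df_encoded.drop(columns=['city'])

print(df_encoded)
```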


Steps To Convert Categorical Variables To Numeric In Python

The following steps will show you how to transform categorical variables in Python.

Step 1 - Import the library

import pandas as pd

We have imported only pandas, which is required to build and manipulate the dataset.

Step 2 - Setting up the Data

We have created a dictionary and passed it to pd.DataFrame to create a dataframe with the columns 'name', 'episodes', and 'gender'.

data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
        'episodes': [42, 24, 31, 29, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=['name', 'episodes', 'gender'])
print(df)


Step 3 - Making Dummy Variables and Printing the final Dataset

The 'gender' column has two categories, male and female, so we create one dummy column for each. We pass that column to pd.get_dummies and store the result in df_gender. Finally, we concatenate those new columns onto our original dataset.

df_gender = pd.get_dummies(df['gender']) 

df_new = pd.concat([df, df_gender], axis=1)

print(df_new) 

The output of the two print calls is:

      name  episodes  gender
0  Sheldon        42    male
1    Penny        24  female
2      Amy        31  female
3    Penny        29  female
4      Raj        37    male
5  Sheldon        40    male

      name  episodes  gender  female  male
0  Sheldon        42    male       0     1
1    Penny        24  female       1     0
2      Amy        31  female       1     0
3    Penny        29  female       1     0
4      Raj        37    male       0     1
5  Sheldon        40    male       0     1
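Two details are worth noting about the result above. First, the original 'gender' column is still present in the final dataframe. Second, in pandas 2.0 and later, get_dummies returns boolean columns by default, so the dummy columns may print as True/False rather than 0/1. The sketch below is one possible way to handle both, not necessarily what the recipe intended. It passes the whole dataframe with columns=['gender'], which encodes that column and removes the original in one call, and it passes dtype=int to get 0/1 values. The name df_final is introduced here for illustration.

```python
import pandas as pd

data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
        'episodes': [42, 24, 31, 29, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=['name', 'episodes', 'gender'])

# columns=['gender'] encodes only that column and removes the original;
# dtype=int requests 0/1 values instead of True/False.
df_final = pd.get_dummies(df, columns=['gender'], dtype=int)

print(df_final)
```

With this form the new columns are prefixed with the source column name, giving 'gender_female' and 'gender_male' in place of 'female' and 'male'.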

 
