How to convert categorical variables into numerical in Python?

This recipe explains how to convert categorical variables into numerical variables in Python.
Last Updated: 06 Sep 2023


Recipe Objective: How to convert categorical variables into numerical in Python?

Most machine learning models cannot work with categorical variables stored as strings, so we need to convert them to numerical form. We could assign a number to each category, but that is not effective when the categories have no measurable order or distance between them, because the model would treat the numbers as meaningful magnitudes. The alternative is to create a new feature for each category that holds a boolean (0/1) value. These new features are called dummy variables, and we will use them here.
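To make the difference concrete, here is a minimal sketch that encodes the same column both ways. The 'color' column and its values are hypothetical and are not part of this recipe's dataset.

```python
import pandas as pd

# Hypothetical example column; the values are illustrative only.
colors = pd.Series(['red', 'green', 'blue', 'green'], name='color')

# Option 1: integer codes. Categories are sorted alphabetically, so
# blue=0, green=1, red=2, which implies an ordering that does not exist.
codes = colors.astype('category').cat.codes

# Option 2: dummy variables. One 0/1 column per category, no implied order.
dummies = pd.get_dummies(colors, dtype=int)

print(codes.tolist())
print(dummies)
```

With integer codes, a model could read 'red' as "greater than" 'blue'. With dummy variables, each row simply has a 1 in the column for its own category.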

This recipe will show you how we can transform categorical data to numerical data in Python.

How To Transform Categorical Variables To Numeric in Python?

To convert categorical data to numerical data, the Python source code in this recipe does the following:

1. Creates a dictionary and converts it into a dataframe

2. Uses the "get_dummies" function for the encoding

3. Concatenates the encoded columns onto the original dataframe

4. Drops the original categorical column once it has been encoded (the listing in Step 3 keeps the column so that the dummy values can be compared against it)


Steps To Convert Categorical Variables To Numeric in Python

The following steps will show you how to transform categorical variables in Python.

Step 1 - Import the library

import pandas as pd

We have imported only pandas, which is required to build and encode the dataset.

Step 2 - Setting up the Data

We have created a dictionary and passed it to pd.DataFrame to create a dataframe with the columns 'name', 'episodes', and 'gender'.

data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
        'episodes': [42, 24, 31, 29, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}

df = pd.DataFrame(data, columns=['name', 'episodes', 'gender'])
print(df)
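Before encoding, it can help to check which distinct values a column actually holds. The following is a small sketch of one way to do that on the same data; the variable names gender_categories and gender_counts are introduced here for illustration only.

```python
import pandas as pd

data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
        'episodes': [42, 24, 31, 29, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=['name', 'episodes', 'gender'])

# Distinct values in the column we plan to encode.
gender_categories = sorted(df['gender'].unique())

# Number of rows that fall into each category.
gender_counts = df['gender'].value_counts()

print(gender_categories)
print(gender_counts)
```

Each distinct value found here becomes one dummy column, so a column with many distinct values will produce many new features.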


Step 3 - Making Dummy Variables and Printing the final Dataset

The column 'gender' contains two categories, male and female, so we need one dummy column for each category. We pass that column to pd.get_dummies and store the result in df_gender. Finally, we add those columns to our original dataset with pd.concat.

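# One 0/1 column per category; dtype=int avoids the True/False default of pandas 2.0+
df_gender = pd.get_dummies(df['gender'], dtype=int)

# Append the dummy columns to the original dataframe, side by side
df_new = pd.concat([df, df_gender], axis=1)

print(df_new)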

The output is shown below. First, the original dataframe printed in Step 2:

      name  episodes  gender
0  Sheldon        42    male
1    Penny        24  female
2      Amy        31  female
3    Penny        29  female
4      Raj        37    male
5  Sheldon        40    male

Then the final dataframe with the dummy columns added:

      name  episodes  gender  female  male
0  Sheldon        42    male       0     1
1    Penny        24  female       1     0
2      Amy        31  female       1     0
3    Penny        29  female       1     0
4      Raj        37    male       0     1
5  Sheldon        40    male       0     1
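The printed result still contains the original 'gender' column. The sketch below shows one way to drop it after encoding, along with two variations of get_dummies that may be convenient. The names df_final, df_onecall, and df_compact are introduced here for illustration and are not part of the original recipe.

```python
import pandas as pd

data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
        'episodes': [42, 24, 31, 29, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=['name', 'episodes', 'gender'])

# Same encoding as Step 3, then drop the original string column.
df_gender = pd.get_dummies(df['gender'], dtype=int)
df_final = pd.concat([df, df_gender], axis=1).drop(columns=['gender'])

# Alternative: encode, join, and drop in one call. The new columns are
# prefixed with the source column name (gender_female, gender_male).
df_onecall = pd.get_dummies(df, columns=['gender'], dtype=int)

# drop_first=True omits the first category's column. With two categories,
# the remaining column (gender_male) already carries all the information.
df_compact = pd.get_dummies(df, columns=['gender'], drop_first=True, dtype=int)

print(df_final)
print(df_onecall)
print(df_compact)
```

Whether to keep every dummy column or use drop_first depends on the model. Linear models are often given one column fewer to avoid perfectly redundant features, while tree-based models usually work with either form.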
