How to convert categorical variables into numerical in Python?

This recipe explains how to convert categorical variables into numerical variables in Python.
Last Updated: 06 Sep 2023


Recipe Objective: How to convert categorical variables into numerical in Python?

Machine learning models cannot work with categorical variables stored as strings, so we need to convert them into numerical form. We could assign a number to each category, but that can be ineffective when the difference between categories cannot be measured, because the numbers imply an order or distance that does not exist. An alternative is to create a new feature for each category that holds a boolean value indicating whether a row belongs to that category. These new features are called dummy variables, and we will use them in this recipe.
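To make the contrast concrete, here is a minimal sketch comparing the two approaches. The 'color' column and its values are hypothetical and are not part of this recipe's dataset; dtype=int is passed only so that the dummy columns display as 0/1.

```python
import pandas as pd

# Hypothetical example column; the category names are illustrative only.
colors = pd.Series(['red', 'green', 'blue', 'green'], name='color')

# Option 1: integer codes. Simple, but the numbers imply an order
# (blue < green < red) that has no real meaning for colors.
int_codes = colors.astype('category').cat.codes

# Option 2: dummy variables. One indicator column per category,
# with no implied order between the categories.
dummies = pd.get_dummies(colors, dtype=int)

print(int_codes.tolist())
print(dummies)
```

With dummy variables, every row has exactly one indicator set to 1, so no category is treated as "larger" than another.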

This recipe will show you how we can transform categorical data to numerical data in Python.

How To Transform Categorical Variables To Numeric In Python?

To convert categorical data to numerical data, the Python source code in this recipe does the following:

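1. Creates a dictionary and converts it into a dataframe

2. Uses the "get_dummies" function for the encoding

3. Concatenates the encoded columns onto the original dataframe

4. Optionally drops the original categorical column (the code shown in this recipe stops short of this step, so 'gender' still appears in the final output)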
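The four steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the 'city' and 'visits' columns are toy data invented for the sketch, and dtype=int is an assumption made here so that the indicators print as 0/1.

```python
import pandas as pd

# 1. Create a dictionary and convert it into a dataframe
#    (toy data, invented for this sketch)
data = {'city': ['Paris', 'Rome', 'Paris'], 'visits': [3, 1, 2]}
df = pd.DataFrame(data)

# 2. Encode the categorical column with get_dummies
#    (dtype=int is an assumption to get 0/1 instead of True/False)
city_dummies = pd.get_dummies(df['city'], dtype=int)

# 3. Concatenate the encoded columns onto the dataframe
df_encoded = pd.concat([df, city_dummies], axis=1)

# 4. Drop the original categorical column
df_encoded = df_encoded.drop(columns=['city'])

print(df_encoded)
```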

Steps To Convert Categorical Variables To Numeric In Python

The following steps will show you how to transform categorical variables in Python.

Step 1 - Import the library

import pandas as pd

We have imported only pandas, which is required to create and manipulate the dataset.

Step 2 - Setting up the Data

We have created a dictionary and passed it to pd.DataFrame to create a dataframe with the columns 'name', 'episodes', and 'gender'.

data = {
    'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
    'episodes': [42, 24, 31, 29, 37, 40],
    'gender': ['male', 'female', 'female', 'female', 'male', 'male'],
}
df = pd.DataFrame(data, columns=['name', 'episodes', 'gender'])
print(df)
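Before encoding, it can help to check which categories a column actually contains. The following sketch rebuilds the same dataframe so that it runs on its own, then inspects the 'gender' and 'name' columns; this inspection step is an addition for illustration and is not part of the original recipe code.

```python
import pandas as pd

# Same data as in Step 2, repeated so this sketch is self-contained
df = pd.DataFrame({
    'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
    'episodes': [42, 24, 31, 29, 37, 40],
    'gender': ['male', 'female', 'female', 'female', 'male', 'male'],
})

# Distinct categories in the column to be encoded
gender_categories = sorted(df['gender'].unique())
print(gender_categories)

# How many rows fall into each category
gender_counts = df['gender'].value_counts()
print(gender_counts)

# Number of distinct values in 'name'; each one would become
# its own dummy column if 'name' were encoded as well
name_cardinality = df['name'].nunique()
print(name_cardinality)
```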


Step 3 - Making Dummy Variables and Printing the final Dataset

The column 'gender' contains two categories, male and female, so we need one dummy column for each of them. We pass that column to pd.get_dummies and store the result in df_gender. Finally, we use pd.concat with axis=1 to add the new columns to our original dataset.

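# dtype=int gives 0/1 indicators; newer pandas releases return True/False by default
df_gender = pd.get_dummies(df['gender'], dtype=int)
# axis=1 places the dummy columns next to the original columns
df_new = pd.concat([df, df_gender], axis=1)
print(df_new)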

The output is as follows. The original dataframe:

      name  episodes  gender
0  Sheldon        42    male
1    Penny        24  female
2      Amy        31  female
3    Penny        29  female
4      Raj        37    male
5  Sheldon        40    male

The dataframe with the dummy columns added:

      name  episodes  gender  female  male
0  Sheldon        42    male       0     1
1    Penny        24  female       1     0
2      Amy        31  female       1     0
3    Penny        29  female       1     0
4      Raj        37    male       0     1
5  Sheldon        40    male       0     1
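The final output still contains the original 'gender' column. One possible way to encode and drop it in a single call is to pass the whole dataframe to pd.get_dummies with the columns parameter, as sketched below. This is an alternative to the code in this recipe, not the recipe's own method; the names df_encoded and df_compact are introduced here for illustration only.

```python
import pandas as pd

# Same data as in Step 2, repeated so this sketch is self-contained
df = pd.DataFrame({
    'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
    'episodes': [42, 24, 31, 29, 37, 40],
    'gender': ['male', 'female', 'female', 'female', 'male', 'male'],
})

# With columns=, get_dummies replaces 'gender' with prefixed indicator
# columns (gender_female, gender_male) and leaves other columns untouched.
df_encoded = pd.get_dummies(df, columns=['gender'], dtype=int)
print(df_encoded)

# drop_first=True omits the first category's indicator; with two categories,
# a single column (gender_male) carries the same information.
df_compact = pd.get_dummies(df, columns=['gender'], drop_first=True, dtype=int)
print(df_compact)
```

Whether to keep every indicator or drop the first one depends on the model; linear models are a common case where the redundant column is removed.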
