How to convert categorical variables into numerical in Python?

This recipe explains how to convert categorical variables into numerical variables in Python.

Recipe Objective: How to convert categorical variables into numerical in Python? 

Machine learning models generally cannot work directly with categorical variables stored as strings, so we need to convert them into numerical form. We could assign an integer to each category, but that is often ineffective when the categories have no measurable order or distance between them, because a model may read meaning into the size of the numbers. The alternative is to create one new feature per category, holding a boolean (0/1) value that marks whether a row belongs to that category. These new features are called dummy variables.
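To make the difference between the two approaches concrete, here is a minimal sketch. The 'color' series is a made-up example and is not part of this recipe's dataset.

```python
import pandas as pd

# Hypothetical example data, used only to contrast the two encodings
colors = pd.Series(['red', 'green', 'blue', 'green'], name='color')

# Integer codes: categories are sorted alphabetically, so blue=0, green=1, red=2.
# The numbers suggest an order (blue < green < red) that the data does not have.
codes = colors.astype('category').cat.codes
print(codes.tolist())

# Dummy variables: one 0/1 column per category, with no implied order
dummies = pd.get_dummies(colors, dtype=int)
print(dummies)
```

Each row of the dummy table has exactly one 1, in the column that matches its category.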

This recipe will show you how we can transform categorical data to numerical data in Python.

How To Transform Categorical Variables To Numeric in Python?

To convert categorical data to numerical data, the Python source code in this recipe does the following:

1. Creates a dictionary and converts it into a dataframe

2. Uses the "get_dummies" function for the encoding

3. Concatenates the encoded columns onto the original dataframe

4. Optionally drops the original categorical column (the walkthrough code keeps it so the original and encoded values can be compared side by side)


Steps To Convert Categorical Variable To Numeric in Python

The following steps will show you how to transform categorical variables in Python.

Step 1 - Import the library

import pandas as pd

We have imported only pandas, which is required to build and manipulate the dataset.

Step 2 - Setting up the Data

We have created a dictionary and passed it to pd.DataFrame to create a dataframe with the columns 'name', 'episodes', and 'gender'.

data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
        'episodes': [42, 24, 31, 29, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=['name', 'episodes', 'gender'])
print(df)
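Before encoding, it can help to check how many distinct values each candidate column holds, since get_dummies creates one new column per category. The check below is a small sketch added for illustration and is not part of the original recipe code.

```python
import pandas as pd

data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
        'episodes': [42, 24, 31, 29, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=['name', 'episodes', 'gender'])

# Number of distinct categories in each string column
print(df['gender'].nunique())
print(df['name'].nunique())

# How many rows fall into each 'gender' category
print(df['gender'].value_counts())
```

A column with few distinct values, such as 'gender', yields only a handful of dummy columns; a column with many distinct values would widen the dataframe considerably.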


Step 3 - Making Dummy Variables and Printing the final Dataset

The 'gender' column contains two categories, male and female, so we need one dummy column for each. We pass that column to pd.get_dummies and store the result in df_gender. Finally, we concatenate those dummy columns onto the original dataset along the column axis (axis=1).

# dtype=int keeps the dummy columns as 0/1 (pandas 2.0+ returns True/False by default)
df_gender = pd.get_dummies(df['gender'], dtype=int)

df_new = pd.concat([df, df_gender], axis=1)

print(df_new) 

The output shows the original dataframe first, followed by the dataframe with the dummy columns added:

      name  episodes  gender
0  Sheldon        42    male
1    Penny        24  female
2      Amy        31  female
3    Penny        29  female
4      Raj        37    male
5  Sheldon        40    male

      name  episodes  gender  female  male
0  Sheldon        42    male       0     1
1    Penny        24  female       1     0
2      Amy        31  female       1     0
3    Penny        29  female       1     0
4      Raj        37    male       0     1
5  Sheldon        40    male       0     1
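The final dataframe above still contains the original 'gender' column. If only numerical features should remain, that column can be dropped after concatenation. The following is a minimal sketch of that last step; the names df_final and df_alt are introduced here for illustration. It also shows a one-call variant that, as far as we can tell, gives the same encoding with prefixed column names.

```python
import pandas as pd

data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
        'episodes': [42, 24, 31, 29, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=['name', 'episodes', 'gender'])

# Encode 'gender', attach the dummy columns, then drop the original column
df_gender = pd.get_dummies(df['gender'], dtype=int)
df_final = pd.concat([df, df_gender], axis=1).drop(columns=['gender'])
print(df_final)

# One-call variant: passing columns= encodes and replaces 'gender' in place,
# naming the new columns 'gender_female' and 'gender_male'
df_alt = pd.get_dummies(df, columns=['gender'], dtype=int)
print(df_alt)
```

The 'name' column is left as text in both results; it would need its own encoding before being passed to a model.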

 
