How to convert categorical variables into numerical in Python?

This recipe explains how to convert categorical variables into numerical variables in Python.

Recipe Objective: How to convert categorical variables into numerical in Python? 

Machine learning models cannot work with categorical variables stored as strings, so we need to convert them to numerical form. We could assign a number to each category, but that can be misleading when the categories have no measurable order or distance between them. A better option in that case is to create a new feature for each category that holds a boolean (0/1) value. These new features are called dummy variables, and this recipe uses them.
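To make the difference concrete, here is a minimal sketch comparing the two approaches. The `colors` series and its values are hypothetical and not part of this recipe's dataset.

```python
import pandas as pd

# Hypothetical example column; the values are illustrative only.
colors = pd.Series(["red", "green", "blue", "green"], name="color")

# Integer codes: compact, but they imply an order (blue < green < red)
# that the categories do not actually have.
codes = colors.astype("category").cat.codes

# Dummy variables: one 0/1 indicator column per category, no implied order.
dummies = pd.get_dummies(colors, dtype=int)

print(codes.tolist())
print(dummies)
```

Each row of `dummies` contains exactly one 1, marking the category that row belongs to.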

This recipe will show you how we can transform categorical data to numerical data in Python.

How To Transform Categorical Variables To Numeric in Python?

To convert categorical data to numerical data, the Python source code in this recipe does the following:

1. Creates a dictionary and converts it into a dataframe

2. Uses the "get_dummies" function for the encoding

3. Concatenates the encoded columns onto the original dataframe

4. Drops the original categorical column (optional; the code shown in this recipe keeps it)
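The four steps above can be sketched in a few lines. This is only an illustration: `encode_column` is a hypothetical helper, and the `city`/`visits` data is made up for the example.

```python
import pandas as pd

def encode_column(df, column):
    """Hypothetical helper: replace one categorical column with dummy columns."""
    dummies = pd.get_dummies(df[column], dtype=int)  # step 2: encode
    combined = pd.concat([df, dummies], axis=1)      # step 3: concatenate
    return combined.drop(columns=[column])           # step 4: drop the original

# Step 1: build a dataframe from a dictionary (illustrative values)
df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune"], "visits": [3, 5, 2]})

encoded = encode_column(df, "city")
print(encoded)
```

The helper returns a new dataframe and leaves the input unchanged, so the original values remain available for comparison.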


Steps To Convert a Categorical Variable To Numeric in Python

The following steps will show you how to transform categorical variables in Python.

Step 1 - Import the library

import pandas as pd

We import only pandas, which is required to create and manipulate the dataset.

Step 2 - Setting up the Data

We create a dictionary and pass it to pd.DataFrame to build a dataframe with the columns 'name', 'episodes', and 'gender'.

data = {
    'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
    'episodes': [42, 24, 31, 29, 37, 40],
    'gender': ['male', 'female', 'female', 'female', 'male', 'male'],
}
df = pd.DataFrame(data, columns=['name', 'episodes', 'gender'])
print(df)


Step 3 - Making Dummy Variables and Printing the final Dataset

The 'gender' column has two categories, male and female, so we create one dummy column for each category. We pass that column to pd.get_dummies and store the result in df_gender. Finally, we concatenate those columns onto the original dataframe.

# dtype=int yields 0/1 columns; recent pandas versions return True/False by default
df_gender = pd.get_dummies(df['gender'], dtype=int)
df_new = pd.concat([df, df_gender], axis=1)
print(df_new)

The two print calls produce the following output: first the original dataframe, then the dataframe with the dummy columns added.

      name  episodes  gender
0  Sheldon        42    male
1    Penny        24  female
2      Amy        31  female
3    Penny        29  female
4      Raj        37    male
5  Sheldon        40    male

      name  episodes  gender  female  male
0  Sheldon        42    male       0     1
1    Penny        24  female       1     0
2      Amy        31  female       1     0
3    Penny        29  female       1     0
4      Raj        37    male       0     1
5  Sheldon        40    male       0     1
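The output above still contains the original 'gender' column. One possible way to encode and drop it in a single call is to pass the whole dataframe to pd.get_dummies with the `columns` argument. The sketch below is a variation on the recipe, not the recipe's own code; the names `df_encoded` and `df_compact` are introduced here for illustration.

```python
import pandas as pd

data = {
    'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'],
    'episodes': [42, 24, 31, 29, 37, 40],
    'gender': ['male', 'female', 'female', 'female', 'male', 'male'],
}
df = pd.DataFrame(data)

# Encode 'gender' directly: the original column is replaced by
# prefixed dummy columns (gender_female, gender_male).
df_encoded = pd.get_dummies(df, columns=['gender'], dtype=int)

# drop_first=True omits the first category (female), which is implied
# whenever gender_male is 0.
df_compact = pd.get_dummies(df, columns=['gender'], drop_first=True, dtype=int)

print(df_encoded)
print(df_compact)
```

Dropping one dummy column removes redundant information, which can matter for linear models, where a full set of dummies is perfectly collinear with the intercept.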

 
