How to Transform Categorical Features to Numerical Features?

This Python code example will help you understand the process of transforming categorical features to numerical features.
Last Updated: 12 Apr 2024


Many machine learning algorithms, such as linear regression, logistic regression, and support vector machines, require numerical input features. Transforming categorical features into numerical ones allows these algorithms to process the data at all. How the categories are encoded also matters: a suitable numeric representation lets a model learn patterns and relationships in the data, which can improve predictive accuracy, while a poor one can mislead it. This guide covers the essential techniques for transforming categorical features into numeric form so you can approach this data preprocessing step with confidence.

How to Convert Categorical Data to Numerical Data in Python?
Write a Python Program to Make Categorical Values in Numeric Format for a given dataset?
How to Convert Numerical Data to Categorical Data in Python?
Learn to Transform Categorical Features into Numerical Features with ProjectPro!

How to Convert Categorical Data to Numerical Data in Python?

Converting categorical data to numerical form is a crucial preprocessing step in many machine learning tasks, because it lets algorithms work with non-numeric information. The steps below walk through the complete process with an example:

Step 1 - Import the libraries

import pandas as pd
from sklearn import preprocessing

We import pandas to build the dataset and the preprocessing module from scikit-learn, which provides LabelEncoder.

Step 2 - Setting up the Data

We create a dictionary and pass it to pd.DataFrame to build a DataFrame with the columns "name", "episodes", and "gender".

data = {"name": ["Sheldon", "Penny", "Amy", "Penny", "Raj", "Sheldon"],
        "episodes": [42, 24, 31, 29, 37, 40],
        "gender": ["male", "female", "female", "female", "male", "male"]}
df = pd.DataFrame(data, columns=["name", "episodes", "gender"])
print(df)

Step 3 - Converting the Values

In the "gender" column there are two categories, male and female, so we could assign a number to each by hand, for example 1 to male and 2 to female. LabelEncoder automates this: it sorts the unique labels and numbers them starting from 0, so "female" becomes 0 and "male" becomes 1. We first fit the encoder on the column and then transform it.

le = preprocessing.LabelEncoder()
le.fit(df["gender"])                # learn the sorted list of categories
print(list(le.classes_))            # the categories, in code order
print(le.transform(df["gender"]))   # the encoded "gender" column

So the output comes as follows. The DataFrame prints as:

      name  episodes  gender
0  Sheldon        42    male
1    Penny        24  female
2      Amy        31  female
3    Penny        29  female
4      Raj        37    male
5  Sheldon        40    male

and the encoder reports the learned classes and the encoded column:

['female', 'male']
[1 0 0 0 1 1]
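Step 3 mentions assigning numbers by hand, such as 1 to male and 2 to female, while LabelEncoder chooses its own codes. The minimal sketch below, assuming the same "gender" values as above, contrasts a hand-written mapping with pandas' Series.map against LabelEncoder's fit_transform, and uses inverse_transform to recover the original labels from the codes.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"gender": ["male", "female", "female", "female", "male", "male"]})

# Hand-written mapping: we pick the codes ourselves (1 = male, 2 = female)
manual_codes = df["gender"].map({"male": 1, "female": 2})

# LabelEncoder picks the codes itself: classes are sorted, then numbered from 0
le = LabelEncoder()
auto_codes = le.fit_transform(df["gender"])

print(manual_codes.tolist())  # [1, 2, 2, 2, 1, 1]
print(auto_codes.tolist())    # [1, 0, 0, 0, 1, 1]

# inverse_transform maps integer codes back to the original labels
print(le.inverse_transform([0, 1]).tolist())  # ['female', 'male']
```

A hand-written map gives full control over the numbers; LabelEncoder saves typing and remembers the mapping so it can be inverted later.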

Write a Python Program to Make Categorical Values in Numeric Format for a given dataset?

Let’s write a Python program that uses LabelEncoder from the scikit-learn library to convert the categorical values in a dataset into numeric format:

[Figure: Python program to make categorical values in numeric format, and the program output]

This program builds a sample dataset, converts its categorical columns ('Name', 'Gender', and 'City') into numeric format with LabelEncoder, and then displays the modified DataFrame. You can replace the sample dataset with your own by reading it from a file or any other source.
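A minimal sketch of such a program follows. The sample rows are hypothetical, and a numeric "Age" column is included only to show that columns outside the categorical list are left untouched.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample dataset; read your own data (e.g. with pd.read_csv) instead
data = {
    "Name": ["Alice", "Bob", "Charlie", "Alice"],
    "Age": [25, 30, 35, 28],
    "Gender": ["Female", "Male", "Male", "Female"],
    "City": ["New York", "London", "Paris", "London"],
}
df = pd.DataFrame(data)

categorical_columns = ["Name", "Gender", "City"]

# Fit one encoder per column and keep it, so each column can be decoded later
encoders = {}
for column in categorical_columns:
    encoders[column] = LabelEncoder()
    df[column] = encoders[column].fit_transform(df[column])

print(df)
```

Keeping the fitted encoders in a dictionary lets you call inverse_transform on any column later. Note that scikit-learn documents LabelEncoder as intended for target labels; for input features, OrdinalEncoder encodes several columns in a single call.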

How to Convert Numerical Data to Categorical Data in Python?

Numerical data is converted to categorical data mainly by binning, or with custom functions that map values to labels. The resulting categories, like any categorical data, can then be turned back into numbers for modeling with one-hot encoding or label encoding. Each method is explained below:

Binning

Binning divides numerical data into bins (intervals) and assigns a label to each bin. It is the usual way to convert continuous data into categorical data.
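A minimal binning sketch with pandas' pd.cut, assuming a hypothetical Series of ages and hand-picked bin edges and labels. It also shows the custom-function route, using a hypothetical size_label function applied with Series.apply.

```python
import pandas as pd

# Hypothetical ages used only for illustration
ages = pd.Series([5, 17, 25, 42, 67, 80])

# pd.cut uses right-closed intervals by default: (0, 18], (18, 40], (40, 65], (65, 120]
age_groups = pd.cut(
    ages,
    bins=[0, 18, 40, 65, 120],
    labels=["child", "young adult", "adult", "senior"],
)
print(age_groups.tolist())
# ['child', 'child', 'young adult', 'adult', 'senior', 'senior']


def size_label(value):
    """Hypothetical rule: split the values at 30."""
    return "small" if value < 30 else "large"


# A custom function can encode rules that are not simple interval edges
print(ages.apply(size_label).tolist())
# ['small', 'small', 'small', 'large', 'large', 'large']
```

pd.cut returns a pandas Categorical, so the bin labels keep their order (child < young adult < adult < senior).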


One-hot encoding

One-hot encoding creates one binary (0/1) column per category. It is commonly used when the categorical data is nominal (has no inherent order).
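A minimal one-hot encoding sketch using pandas' get_dummies on a hypothetical "color" column. Passing dtype=int produces 0/1 integers instead of booleans; scikit-learn's OneHotEncoder is an alternative when the same encoding must be reapplied to new data.

```python
import pandas as pd

# Hypothetical nominal feature: colors have no natural order
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One 0/1 column per category, named "<column>_<category>" and sorted by category
one_hot = pd.get_dummies(df, columns=["color"], dtype=int)
print(one_hot)
```

Each row has exactly one 1 across the new columns, so no category is treated as "larger" than another; the cost is one extra column per distinct value.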


Label Encoding

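Label encoding assigns a unique integer to each category. It is most suitable when the categories have a natural order, but note that scikit-learn's LabelEncoder numbers categories alphabetically, so the integers only follow the real order if you specify that order explicitly.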
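Because LabelEncoder sorts categories alphabetically, it can scramble a genuine order. The sketch below, assuming a hypothetical "size" column, compares it with scikit-learn's OrdinalEncoder given an explicit categories order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical ordinal feature: small < medium < large
sizes = pd.DataFrame({"size": ["medium", "small", "large", "small"]})

# LabelEncoder sorts alphabetically: large=0, medium=1, small=2 (real order lost)
alphabetical = LabelEncoder().fit_transform(sizes["size"])

# OrdinalEncoder with an explicit order: small=0, medium=1, large=2
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordered = encoder.fit_transform(sizes[["size"]]).ravel()  # 2-D in, flatten to 1-D

print(alphabetical.tolist())  # [1, 2, 0, 2]
print(ordered.tolist())       # [1.0, 0.0, 2.0, 0.0]
```

OrdinalEncoder returns floats by default and expects a 2-D input (hence the double brackets), which is why it suits feature matrices with several columns.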


Learn to Transform Categorical Features into Numerical Features with ProjectPro!

These practical examples show why the transformation method should be chosen based on the nature of the data. One-hot encoding preserves every category but can lead to high dimensionality, while label encoding is simpler but can introduce unintended ordinal relationships. Target encoding, which replaces each category with a statistic of the target variable (typically its mean), can capture the relationship between a categorical variable and the target, though care is needed to prevent overfitting and data leakage. ProjectPro facilitates hands-on learning experiences crucial for mastering such concepts. Through ProjectPro, enthusiasts delve into real-world scenarios, honing their skills by tackling diverse datasets and employing various techniques.
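Target encoding is only mentioned in passing here, so below is a minimal sketch of one common variant: each category is replaced by a smoothed mean of the target, fitted on training data only. The data, the fit_target_encoding helper, and the smoothing formula (a count-weighted blend with the global mean) are illustrative assumptions, not a library API.

```python
import pandas as pd

# Hypothetical training data: a categorical feature and a binary target
train = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B", "C"],
    "target": [1, 0, 1, 1, 0, 1],
})
test = pd.DataFrame({"city": ["A", "B", "C", "D"]})


def fit_target_encoding(df, column, target, smoothing=2.0):
    """Hypothetical helper: smoothed category -> mean(target) mapping."""
    global_mean = df[target].mean()
    stats = df.groupby(column)[target].agg(["mean", "count"])
    # Blend each category mean with the global mean; rare categories are
    # pulled towards the global mean, which limits overfitting
    weight = stats["count"] / (stats["count"] + smoothing)
    mapping = weight * stats["mean"] + (1 - weight) * global_mean
    return mapping, global_mean


# Fit on training rows only, so no test target values leak into the encoding
mapping, global_mean = fit_target_encoding(train, "city", "target")
test["city_encoded"] = test["city"].map(mapping).fillna(global_mean)
print(test)
```

Unseen categories (such as "D") fall back to the global mean. Encoding the training rows with this same mapping still lets each row see its own target, so in practice the training column is usually computed out-of-fold; scikit-learn 1.3 and later also provide a TargetEncoder that performs this cross-fitting.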

