How to Transform Categorical Features to Numerical Features?

This Python code example will help you understand the process of transforming categorical features to numerical features.

Many machine learning algorithms, such as linear regression, logistic regression, and support vector machines, require numerical input features, so categorical features must be transformed into numbers before these algorithms can process them. A suitable numeric encoding also lets a model use the information carried by a categorical column, which can improve predictive accuracy. This guide walks through the essential techniques for transforming categorical features into numeric form so that you can handle this data preprocessing step with confidence and precision.

How to Convert Categorical Data to Numerical Data in Python? 

Converting categorical data to numerical form is a crucial preprocessing step in many machine learning tasks because it lets algorithms process non-numeric information. The steps below walk through the complete process with an example.

Step 1 - Import the library

 

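    import pandas as pd
    from sklearn import preprocessing

We have imported pandas, which is required for the dataset, and the preprocessing module from scikit-learn, which provides the LabelEncoder used in Step 3.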

Step 2 - Setting up the Data

We have created a dictionary and passed it to pd.DataFrame to create a dataframe with the columns "name", "episodes", and "gender".

    data = {"name": ["Sheldon", "Penny", "Amy", "Penny", "Raj", "Sheldon"],
            "episodes": [42, 24, 31, 29, 37, 40],
            "gender": ["male", "female", "female", "female", "male", "male"]}

    df = pd.DataFrame(data, columns=["name", "episodes", "gender"])
    print(df)

Step 3 - Converting the Values

The "gender" column contains two categories, male and female, so we can assign a number to each category. Here we use LabelEncoder, which sorts the categories alphabetically and numbers them from 0, so female becomes 0 and male becomes 1. We first fit the encoder on the feature and then transform it.

    le = preprocessing.LabelEncoder()
    # Learn the distinct categories in the column
    le.fit(df["gender"])
    print()
    print(list(le.classes_))
    # Replace each category with its integer code
    print()
    print(le.transform(df["gender"]))

So the output comes as:

          name  episodes  gender
    0  Sheldon        42    male
    1    Penny        24  female
    2      Amy        31  female
    3    Penny        29  female
    4      Raj        37    male
    5  Sheldon        40    male

    ['female', 'male']

    [1 0 0 0 1 1]
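
In practice the encoded values are usually stored back in the dataframe, and the fitted encoder is kept so the codes can be decoded later. The following is a minimal sketch of that idea using the same data; the column name `gender_code` and the `mapping` dictionary are illustrative choices, not part of the original recipe.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({
    "name": ["Sheldon", "Penny", "Amy", "Penny", "Raj", "Sheldon"],
    "episodes": [42, 24, 31, 29, 37, 40],
    "gender": ["male", "female", "female", "female", "male", "male"],
})

le = preprocessing.LabelEncoder()
# fit_transform learns the categories and encodes the column in one call
df["gender_code"] = le.fit_transform(df["gender"])  # hypothetical column name

# classes_ is sorted, so each label's code is its position in classes_
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform turns the integer codes back into the original labels
restored = le.inverse_transform(df["gender_code"])
print(list(restored))
```

Keeping the fitted encoder around matters because the same label-to-code mapping has to be applied to any new data seen later.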

 

Write a Python Program to Make Categorical Values in Numeric Format for a given dataset?

Let’s write a Python program that uses the LabelEncoder from the scikit-learn library to convert categorical values in a dataset into numeric format:

 

Python program to make categorical values in Numeric format



Python program output

 

This program inputs a sample dataset, converts categorical columns ('Name', 'Gender', and 'City') into numeric format using the LabelEncoder, and then displays the modified DataFrame. You can replace the sample dataset with your own by reading it from a file or any other source.
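
The original program is not reproduced in the text, so here is a hedged sketch of what such a program might look like. The sample rows, the extra `Age` column, and the `encoders` dictionary are assumptions made for illustration; only the column names 'Name', 'Gender', and 'City' and the use of LabelEncoder come from the description above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data; replace it with pd.read_csv(...) or another source
data = {
    "Name": ["Alice", "Bob", "Charlie", "Alice"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "City": ["Paris", "London", "Paris", "Tokyo"],
    "Age": [29, 35, 41, 29],
}
df = pd.DataFrame(data)

categorical_columns = ["Name", "Gender", "City"]
encoders = {}  # one fitted encoder per column, kept for later decoding

for column in categorical_columns:
    encoder = LabelEncoder()
    # Each column gets its own encoder because the categories differ
    df[column] = encoder.fit_transform(df[column])
    encoders[column] = encoder

print(df)
```

Using a separate encoder per column keeps the mappings independent, so `encoders["City"].inverse_transform(...)` can recover the city names without affecting the other columns.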

How to Convert Numerical Data to Categorical Data in Python? 

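Converting numerical data to categorical data in Python is usually done by binning, either with built-in helpers or with custom functions. One-hot encoding and label encoding work in the opposite direction: they turn categories, including the bins produced by binning, into numbers that a model can use. Here's a detailed explanation of each method: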

 

  1. Binning

Binning involves dividing numerical data into bins or intervals and assigning a label to each bin. This method is appropriate when converting continuous data into categorical data.

 

Binning Example
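
The original example is not available in the text, so the following is a minimal sketch of binning with `pandas.cut`, applied to the "episodes" values from the earlier dataframe. The bin edges and the labels "low", "medium", and "high" are assumptions chosen for illustration.

```python
import pandas as pd

episodes = pd.Series([42, 24, 31, 29, 37, 40], name="episodes")

# Hypothetical bin edges and labels; choose them to suit your own data
bins = [20, 30, 40, 50]
labels = ["low", "medium", "high"]

# right=False makes each interval closed on the left: [20, 30), [30, 40), [40, 50)
episode_group = pd.cut(episodes, bins=bins, labels=labels, right=False)

print(episode_group.tolist())
```

The result is an ordered categorical Series, so the bins keep their natural order for sorting and comparison.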

 

  2. One-hot encoding

One-hot encoding creates binary columns for each category. It is commonly used when the categorical data is nominal (no inherent order).

 

One-hot Encoding Example
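
The original example is not available in the text, so the following is a minimal sketch of one-hot encoding with `pandas.get_dummies`, applied to the "name" column from the earlier dataframe. Passing `dtype=int` is a presentation choice so that the new columns contain 0 and 1 rather than booleans.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sheldon", "Penny", "Amy", "Penny", "Raj", "Sheldon"],
    "episodes": [42, 24, 31, 29, 37, 40],
})

# Each distinct name becomes its own 0/1 column, prefixed with "name_"
one_hot = pd.get_dummies(df, columns=["name"], dtype=int)

print(one_hot)
```

Every row has exactly one 1 across the new columns, which is why the number of columns grows with the number of distinct categories.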

 

  3. Label Encoding

Label encoding assigns a unique integer to each category. It is suitable when the categorical data has an ordinal relationship, provided the integers are assigned in that order.

 

Label Encoding Example
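
The original example is not available in the text, so the following is a minimal sketch with a hypothetical "size" feature. It contrasts scikit-learn's LabelEncoder, which numbers categories in sorted order, with an ordered pandas Categorical, which is one possible way to make the codes follow a meaningful order. The data and the variable names are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical ordinal feature
sizes = pd.Series(["small", "large", "medium", "small"])

# LabelEncoder sorts the classes: large=0, medium=1, small=2
alphabetical_codes = LabelEncoder().fit_transform(sizes)
print(list(alphabetical_codes))

# Stating the order explicitly makes the codes reflect small < medium < large
order = ["small", "medium", "large"]
ordered_codes = pd.Categorical(sizes, categories=order, ordered=True).codes
print(list(ordered_codes))
```

When the order of the categories carries meaning, the second form avoids encoding "large" as the smallest value.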

Learn to Transform Categorical Features into Numerical Features with ProjectPro! 

These practical examples show the importance of selecting the most suitable transformation method based on the nature of your data. One-hot encoding preserves all categories but can lead to high dimensionality, while label encoding is simpler but can introduce unintended ordinal relationships. Target encoding, by contrast, can capture the relationship between a categorical variable and the target variable, though caution is warranted to prevent overfitting and data leakage. ProjectPro offers hands-on projects in which you can practice these techniques on diverse, real-world datasets.
