How to convert string categorical variables into numerical variables using Label Encoder in Python

This recipe helps you convert string categorical variables into numerical variables using Label Encoder in Python

Recipe Objective

Many times while working on a dataset, we come across features that do not have numerical values, for example columns that contain text labels. These features make the data more understandable and readable for us, but most Machine Learning algorithms (including most scikit-learn estimators) cannot work directly on categorical text data.

To train a model and make predictions with these algorithms, we first have to convert the categorical data into numerical data. One easy way to do this is Label Encoding, which replaces each distinct label with an integer.
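To make the idea concrete, here is a minimal sketch of label encoding using NumPy instead of scikit-learn, on a made-up list of values (not the recipe's dataset). np.unique sorts the distinct labels and, with return_inverse=True, also returns each value's position in that sorted list. We assume this matches the sorted label order LabelEncoder uses, which is consistent with the mapping shown in Step 4.

```python
import numpy as np

# Made-up example values, not the recipe's dataset
temperatures = np.array(["low", "medium", "medium", "high", "low"])

# classes: the distinct labels in sorted order
# codes: for each original value, the index of its label in `classes`
classes, codes = np.unique(temperatures, return_inverse=True)

print(classes)  # ['high' 'low' 'medium']
print(codes)    # [1 2 2 0 1]
```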

This data science Python source code does the following:
1. Converts categorical features into numerical ones.
2. Implements a Label Encoding function.

So this recipe is a short example of how to convert categorical variables into numerical variables using Label Encoding. Let's get started.


Step 1 - Import the library - LabelEncoder

import pandas as pd
from sklearn.preprocessing import LabelEncoder

Here we import pandas, which we use to build the DataFrame, and LabelEncoder from scikit-learn, which converts the categorical variables into numerical variables.
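Before applying it to a DataFrame, here is a quick sketch of the LabelEncoder methods this recipe relies on, using a small made-up list: fit learns the distinct labels, transform replaces each value with its integer code, and fit_transform (used in Step 3) does both in one call. A label that was not seen during fit raises a ValueError.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

# fit learns the distinct labels ('n' and 'y' here)
le.fit(["y", "y", "n", "y", "n"])

# transform replaces each value with the integer code of its label
print(le.transform(["n", "y", "y"]))  # [0 1 1]

# fit_transform is fit followed by transform on the same data
print(LabelEncoder().fit_transform(["y", "n", "y"]))  # [1 0 1]

# A label that was not seen during fit cannot be encoded
try:
    le.transform(["maybe"])
except ValueError as err:
    print("ValueError:", err)
```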

Step 2 - Set up the Data

city_data = {'city_level': [1, 3, 1, 2, 2, 3, 1, 1, 2, 3],
             'city_pool': ['y', 'y', 'n', 'y', 'n', 'n', 'y', 'n', 'n', 'y'],
             'Rating': [1, 5, 3, 4, 1, 2, 3, 5, 3, 4],
             'City_port': [0, 1, 0, 1, 0, 0, 1, 1, 0, 1],
             'city_temperature': ['low', 'medium', 'medium', 'high', 'low',
                                  'low', 'medium', 'medium', 'high', 'low']}
df = pd.DataFrame(city_data, columns=['city_level', 'city_pool', 'Rating', 'City_port', 'city_temperature'])

Let us create a simple dataset and convert it to a DataFrame. It describes cities using five features: city_level, city_pool, Rating, City_port and city_temperature. Each feature becomes a column of the DataFrame.

We can see that the features city_pool and city_temperature have non-numerical (string) values, so these two are the categorical features.

Step 3 - Create a function for LabelEncoder

We create a function named 'Encoder' that selects the columns containing categorical values and applies Label Encoding to each of them.

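def Encoder(df):
    # Names of the columns whose dtype is 'category' or 'object'
    columnsToEncode = list(df.select_dtypes(include=['category', 'object']))
    le = LabelEncoder()
    for feature in columnsToEncode:
        try:
            # Replace the column's labels with integer codes
            df[feature] = le.fit_transform(df[feature])
        except Exception:
            # Report the failing column and move on to the next one
            print('Error encoding ' + feature)
    return df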

Now let us go through each statement of the function.
First, the function builds 'columnsToEncode', a list of the columns that hold categorical values, i.e. the columns whose data type is 'category' or 'object'. Wrapping the selected DataFrame in list() yields its column names.
columnsToEncode = list(df.select_dtypes(include=['category','object']))
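The same selection can also be phrased as "every column that is not numeric". The sketch below is an alternative to select_dtypes, not what the recipe's function uses: it rebuilds the recipe's city_data and keeps the columns for which pandas' is_numeric_dtype returns False. We assume that for this dataset both approaches pick out the same two columns.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Same data as the recipe's Step 2
city_data = {'city_level': [1, 3, 1, 2, 2, 3, 1, 1, 2, 3],
             'city_pool': ['y', 'y', 'n', 'y', 'n', 'n', 'y', 'n', 'n', 'y'],
             'Rating': [1, 5, 3, 4, 1, 2, 3, 5, 3, 4],
             'City_port': [0, 1, 0, 1, 0, 0, 1, 1, 0, 1],
             'city_temperature': ['low', 'medium', 'medium', 'high', 'low',
                                  'low', 'medium', 'medium', 'high', 'low']}
df = pd.DataFrame(city_data)

# Keep every column whose dtype is not numeric (here: the string columns)
columnsToEncode = [col for col in df.columns if not is_numeric_dtype(df[col])]
print(columnsToEncode)  # ['city_pool', 'city_temperature']
```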

Next, we need a LabelEncoder object, so let us look at its parameters and attributes.
LabelEncoder takes no parameters and, once fitted, exposes one attribute:

  • classes_ : the array of distinct labels (categorical values) seen during fitting, in sorted order.

So we create a LabelEncoder object without any arguments:
le = LabelEncoder()
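To see what classes_ holds, here is a sketch that fits a LabelEncoder on the city_temperature values from Step 2, builds a label-to-code lookup from classes_, and uses inverse_transform to turn codes back into labels. The name mapping is only for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# city_temperature values from the recipe's dataset
temperatures = ['low', 'medium', 'medium', 'high', 'low',
                'low', 'medium', 'medium', 'high', 'low']

le = LabelEncoder()
codes = le.fit_transform(temperatures)

# classes_ holds the distinct labels in sorted order; a label's code is its index
print(le.classes_)  # ['high' 'low' 'medium']
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'high': 0, 'low': 1, 'medium': 2}

# inverse_transform maps integer codes back to the original labels
print(le.inverse_transform([0, 1, 2]))  # ['high' 'low' 'medium']
```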

We then loop over the columns in 'columnsToEncode'. Inside the loop we use a try/except statement, which has two blocks: the statements in the try block run first, and the except block runs only if one of them raises an error. In the try block we call LabelEncoder's fit_transform method with df[feature] as its argument; it learns that column's labels and returns their integer codes, which replace the original column. If that fails, the except block prints the name of the feature that could not be encoded.
for feature in columnsToEncode:
    try:
        df[feature] = le.fit_transform(df[feature])
    except Exception:
        print('Error encoding ' + feature)

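Finally, we pass our DataFrame through the function. Because the function assigns to df[feature], it changes the DataFrame it receives in place and also returns it.
df = Encoder(df)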
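Two side effects of Encoder are worth knowing: it overwrites the columns of the DataFrame you pass in, and it does not keep the fitted encoders, so the integer codes cannot easily be turned back into labels later. A possible variant is sketched below; encode_columns and the encoders dict are hypothetical names of our own, not part of the recipe. It works on a copy and keeps one fitted LabelEncoder per column, so each column can be decoded with inverse_transform.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_columns(df, columns):
    """Hypothetical helper: label-encode `columns` on a copy of `df`.

    Returns the encoded copy and a dict of fitted encoders, one per column.
    """
    encoded = df.copy()  # leave the caller's DataFrame untouched
    encoders = {}
    for feature in columns:
        le = LabelEncoder()  # a fresh encoder for each column
        encoded[feature] = le.fit_transform(encoded[feature])
        encoders[feature] = le
    return encoded, encoders

df = pd.DataFrame({'city_pool': ['y', 'y', 'n'],
                   'city_temperature': ['low', 'medium', 'high']})
encoded, encoders = encode_columns(df, ['city_pool', 'city_temperature'])

print(encoded)
# Each column can be decoded with its own encoder
print(encoders['city_temperature'].inverse_transform(encoded['city_temperature']))
```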

 


Step 4 - Let's look at our dataset now

Once we run the code above, the categorical values in the features city_pool and city_temperature have been converted into numerical values.

For example, in city_temperature:
low is represented by 1, medium by 2 and high by 0, because LabelEncoder numbers the labels in sorted (alphabetical) order: high, low, medium. In the same way, n becomes 0 and y becomes 1 in city_pool.

print(df)

   city_level  city_pool  Rating  City_port  city_temperature
0           1          1       1          0                 1
1           3          1       5          1                 2
2           1          0       3          0                 2
3           2          1       4          1                 0
4           2          0       1          0                 1
5           3          0       2          0                 1
6           1          1       3          1                 2
7           1          0       5          1                 2
8           2          0       3          0                 0
9           3          1       4          1                 1
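Note that these codes follow alphabetical order, so for city_temperature they do not reflect the natural order low < medium < high. scikit-learn's documentation also describes LabelEncoder as intended for target values (y) rather than input features. If the order matters, one option is scikit-learn's OrdinalEncoder with the categories listed explicitly. A minimal sketch, assuming low/medium/high is the order you want:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'city_temperature': ['low', 'medium', 'medium', 'high', 'low',
                                        'low', 'medium', 'medium', 'high', 'low']})

# categories lists, per column, the labels in the order they should be numbered
encoder = OrdinalEncoder(categories=[['low', 'medium', 'high']])

# OrdinalEncoder expects 2-D input (a DataFrame) and returns a 2-D float array
codes = encoder.fit_transform(df[['city_temperature']])
print(codes.ravel())  # [0. 1. 1. 2. 0. 0. 1. 1. 2. 0.]
```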
