How to convert string categorical variables into numerical variables using Label Encoder in python

This recipe helps you convert string categorical variables into numerical variables using Label Encoder in python
Last Updated: 19 Jan 2023


Recipe Objective

While working on a dataset, we often come across features that do not have numerical values or that contain multiple labels. These features make the data more understandable and readable for us, but most Machine Learning algorithms cannot work directly on string categorical data.

To train and predict with Machine Learning algorithms, we have to convert categorical data into numerical data, and Label Encoding is a simple way to do this: each distinct label is replaced by an integer code.
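To make the idea concrete, here is a minimal pure-Python sketch of what label encoding does. The function name label_encode and the colour values are hypothetical and used only for illustration; this is not how scikit-learn implements it internally, only the same idea of numbering the sorted distinct labels.

```python
# Pure-Python illustration of label encoding (no scikit-learn required).
# `label_encode` is a hypothetical helper written for this example only.
def label_encode(values):
    """Map each distinct value to an integer, assigned in sorted order."""
    classes = sorted(set(values))  # distinct labels, sorted
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping

codes, mapping = label_encode(['red', 'green', 'blue', 'green'])
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
print(codes)    # [2, 1, 0, 1]
```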

This data science python source code does the following:
1. Convert categorical features into numerical.
2. Implementation of Label Encoding function.

So this recipe is a short example of how to convert categorical variables into numerical variables using Label Encoding. Let's get started.


Step 1 - Import the library - LabelEncoder

import pandas as pd
from sklearn.preprocessing import LabelEncoder

Here we have imported pandas and LabelEncoder, which will be used to convert the categorical variables into numerical variables.

Step 2 - Setup the Data

city_data = {'city_level': [1, 3, 1, 2, 2, 3, 1, 1, 2, 3],
             'city_pool' : ['y','y','n','y','n','n','y','n','n','y'],
             'Rating': [1, 5, 3, 4, 1, 2, 3, 5, 3, 4],
             'City_port': [0, 1, 0, 1, 0, 0, 1, 1, 0, 1],
             'city_temperature': ['low', 'medium', 'medium', 'high', 'low',
                                  'low', 'medium', 'medium', 'high', 'low']}
df = pd.DataFrame(city_data, columns = ['city_level', 'city_pool', 'Rating', 'City_port', 'city_temperature'])

Let us create a simple dataset and convert it to a dataframe. This is a dataset of cities with different features: city_level, city_pool, Rating, City_port and city_temperature. We have converted this dataset into a dataframe with its features as columns.

Clearly, we can see that the features city_pool and city_temperature have non-numerical values. So these two features are categorical features.
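Rather than reading the columns by eye, the non-numerical features can also be found programmatically. The sketch below rebuilds the same dataset and uses select_dtypes(exclude='number'), which is an assumption of this example rather than the selection used later in the recipe; it picks every column that is not numeric.

```python
import pandas as pd

# Same data as in Step 2, repeated so this example runs on its own.
city_data = {'city_level': [1, 3, 1, 2, 2, 3, 1, 1, 2, 3],
             'city_pool': ['y', 'y', 'n', 'y', 'n', 'n', 'y', 'n', 'n', 'y'],
             'Rating': [1, 5, 3, 4, 1, 2, 3, 5, 3, 4],
             'City_port': [0, 1, 0, 1, 0, 0, 1, 1, 0, 1],
             'city_temperature': ['low', 'medium', 'medium', 'high', 'low',
                                  'low', 'medium', 'medium', 'high', 'low']}
df = pd.DataFrame(city_data)

# Show the data type pandas inferred for each column.
print(df.dtypes)

# Keep only the columns that are not numeric.
categorical_columns = df.select_dtypes(exclude='number').columns.tolist()
print(categorical_columns)  # ['city_pool', 'city_temperature']
```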

Step 3 - Create a function for LabelEncoder

We have created a function named 'Encoder', in which we select the columns that have categorical values and perform Label Encoding on them.

def Encoder(df):
    # Columns whose data type is 'category' or 'object'
    columnsToEncode = list(df.select_dtypes(include=['category','object']))
    le = LabelEncoder()
    for feature in columnsToEncode:
        try:
            # Replace the labels in this column with integer codes
            df[feature] = le.fit_transform(df[feature])
        except:
            print('Error encoding '+feature)
    return df

Now let us try to understand each statement of the function.
First, we have created an object 'columnsToEncode', which is a list of the columns that have categorical values, i.e. the columns with data type 'category' or 'object'.
columnsToEncode = list(df.select_dtypes(include=['category','object']))

Now we have to use LabelEncoder, so let us have a look at its parameters and attributes.
LabelEncoder takes no parameters and has one attribute:

classes_ : The array of distinct labels (categorical values) found when the encoder is fitted.

So, we have created an object for LabelEncoder with no parameters.
le = LabelEncoder()
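As a small illustration of the classes_ attribute, the sketch below fits an encoder on a short list of 'y'/'n' values (sample values chosen for this example) and then uses inverse_transform to map codes back to labels.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(['y', 'y', 'n', 'y', 'n'])

# classes_ holds the distinct labels in sorted order; a label's position is its code.
print(le.classes_.tolist())                    # ['n', 'y']
print(codes.tolist())                          # [1, 1, 0, 1, 0]

# inverse_transform maps integer codes back to the original labels.
print(le.inverse_transform([0, 1]).tolist())   # ['n', 'y']
```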

We have created a loop that iterates over the columns in the list 'columnsToEncode'. Inside the loop we have used a try/except statement, which consists of two blocks, 'try' and 'except'. The statements inside the try block execute first, and the except block runs only if one of them raises an error. In the try block we call the LabelEncoder fit_transform method with df[feature] as its argument, and in the except block there is a print statement that reports the column that could not be encoded.
for feature in columnsToEncode:
    try:
        df[feature] = le.fit_transform(df[feature])
    except:
        print('Error encoding '+feature)
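One thing to be aware of: the loop reuses a single LabelEncoder and writes into the dataframe it was given, so after it finishes the encoder only remembers the labels of the last column and the original strings are gone. If the mappings are needed later, a variant along the following lines may help. This is a sketch under assumptions, not the recipe's own function: the name encode_with_encoders, the use of select_dtypes(exclude='number') and the small sample dataframe are all hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_with_encoders(df):
    """Hypothetical variant of Encoder: returns an encoded copy plus the fitted encoders."""
    encoded = df.copy()   # leave the caller's dataframe untouched
    encoders = {}
    # Assumption: every non-numeric column should be label encoded.
    for feature in encoded.select_dtypes(exclude='number').columns:
        le = LabelEncoder()   # a fresh encoder per column keeps each mapping
        encoded[feature] = le.fit_transform(encoded[feature])
        encoders[feature] = le
    return encoded, encoders

sample = pd.DataFrame({'city_pool': ['y', 'n', 'y'],
                       'city_temperature': ['low', 'high', 'medium'],
                       'Rating': [1, 5, 3]})
encoded, encoders = encode_with_encoders(sample)

print(encoders['city_temperature'].classes_.tolist())  # ['high', 'low', 'medium']
print(encoded['city_temperature'].tolist())            # [1, 0, 2]
print(sample['city_pool'].tolist())                    # ['y', 'n', 'y'] (unchanged)
```

Keeping the encoders in a dictionary means each column can later be decoded with its own inverse_transform.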

Now we pass our dataframe through the function.
df = Encoder(df)


Step 4 - Let's look at our dataset now

Once we run the above code snippet, we will see that the categorical values in the features city_pool and city_temperature have been converted into numerical values.

For example, in city_temperature:
low has been represented by 1, medium by 2 and high by 0, because LabelEncoder assigns the codes in sorted (alphabetical) order of the labels.
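The mapping for city_temperature can be checked directly from a fitted encoder, as in the sketch below. It also shows, as an assumption about what a reader might want rather than part of the recipe, a hand-written dictionary (ordered_codes, a hypothetical name) for cases where the codes should follow the natural order low < medium < high instead of alphabetical order.

```python
from sklearn.preprocessing import LabelEncoder

temperatures = ['low', 'medium', 'medium', 'high', 'low',
                'low', 'medium', 'medium', 'high', 'low']

# Fit on the column values and read the label -> code mapping from classes_.
le = LabelEncoder().fit(temperatures)
mapping = {label: code for code, label in enumerate(le.classes_.tolist())}
print(mapping)  # {'high': 0, 'low': 1, 'medium': 2}

# Hypothetical alternative: an explicit mapping that preserves the natural order.
ordered_codes = {'low': 0, 'medium': 1, 'high': 2}
ordered = [ordered_codes[t] for t in temperatures]
print(ordered)  # [0, 1, 1, 2, 0, 0, 1, 1, 2, 0]
```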

print(df)

   city_level  city_pool  Rating  City_port  city_temperature
0           1          1       1          0                 1
1           3          1       5          1                 2
2           1          0       3          0                 2
3           2          1       4          1                 0
4           2          0       1          0                 1
5           3          0       2          0                 1
6           1          1       3          1                 2
7           1          0       5          1                 2
8           2          0       3          0                 0
9           3          1       4          1                 1
