How to convert string categorical variables into numerical variables in Python?

This recipe helps you convert string categorical variables into numerical variables in Python

Recipe Objective

Machine Learning Models can not work on categorical variables in the form of strings, so we need to change it into numerical form. This can be done by making new features according to the categories by assigning it values.

This python source code does the following:
1. Creates a data dictionary and converts it into pandas dataframe
2. Manually creates a encoding function
3. Applies the function on dataframe to encode the variable

So this is the recipe on how we can convert string categorical variables into numerical variables in Python.

Get Closer To Your Dream of Becoming a Data Scientist with 70+ Solved End-to-End ML Projects

Step 1 - Import the library

import pandas as pd

We have only imported pandas this is reqired for dataset.

Step 2 - Setting up the Data

We have created a dictionary and passed it through the pd.DataFrame to create a dataframe with columns 'name', 'episodes', 'gender'. data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'], 'episodes': [42, 24, 31, 29, 37, 40], 'gender': ['male', 'female', 'female', 'female', 'male', 'male']} df = pd.DataFrame(data, columns = ['name','episodes', 'gender']) print(df)

Explore More Data Science and Machine Learning Projects for Practice. Fast-Track Your Career Transition with ProjectPro

Step 3 - Converting the values

We can clearly observe that in the column 'gender' there are two categories male and female, so for that we can assign number to each categories like 1 to male and 2 to female. def gender_to_numeric(x): if x=='female': return 2 if x=='male': return 1 df['gender_num'] = df['gender'].apply(gender_to_numeric) print(df) Here we are defining a function to assign numeric values and then we are applying of the feature 'gender'. So the output comes as:

      name  episodes  gender
0  Sheldon        42    male
1    Penny        24  female
2      Amy        31  female
3    Penny        29  female
4      Raj        37    male
5  Sheldon        40    male

      name  episodes  gender  gender_num
0  Sheldon        42    male           1
1    Penny        24  female           2
2      Amy        31  female           2
3    Penny        29  female           2
4      Raj        37    male           1
5  Sheldon        40    male           1

Download Materials

What Users are saying..

profile image

Ray han

Tech Leader | Stanford / Yale University
linkedin profile url

I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell,Oracle, and Arthur Andersen(Accenture) in the US. I have taken Big Data and Hadoop,NoSQL, Spark, Hadoop... Read More

Relevant Projects

Deploying Machine Learning Models with Flask for Beginners
In this MLOps on GCP project you will learn to deploy a sales forecasting ML Model using Flask.

Build a Text Generator Model using Amazon SageMaker
In this Deep Learning Project, you will train a Text Generator Model on Amazon Reviews Dataset using LSTM Algorithm in PyTorch and deploy it on Amazon SageMaker.

Build a Customer Churn Prediction Model using Decision Trees
Develop a customer churn prediction model using decision tree machine learning algorithms and data science on streaming service data.

Build Classification Algorithms for Digital Transformation[Banking]
Implement a machine learning approach using various classification techniques in Python to examine the digitalisation process of bank customers.

Build a Multi-Class Classification Model in Python on Saturn Cloud
In this machine learning classification project, you will build a multi-class classification model in Python on Saturn Cloud to predict the license status of a business.

PyTorch Project to Build a LSTM Text Classification Model
In this PyTorch Project you will learn how to build an LSTM Text Classification model for Classifying the Reviews of an App .

Learn to Build an End-to-End Machine Learning Pipeline - Part 2
In this Machine Learning Project, you will learn how to build an end-to-end machine learning pipeline for predicting truck delays, incorporating Hopsworks' feature store and Weights and Biases for model experimentation.

End-to-End Snowflake Healthcare Analytics Project on AWS-1
In this Snowflake Healthcare Analytics Project, you will leverage Snowflake on AWS to predict patient length of stay (LOS) in hospitals. The prediction of LOS can help in efficient resource allocation, lower the risk of staff/visitor infections, and improve overall hospital functioning.

Build a Graph Based Recommendation System in Python-Part 2
In this Graph Based Recommender System Project, you will build a recommender system project for eCommerce platforms and learn to use FAISS for efficient similarity search.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.