How to convert string categorical variables into numerical variables in Python?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to convert string categorical variables into numerical variables in Python?

How to convert string categorical variables into numerical variables in Python?

This recipe helps you convert string categorical variables into numerical variables in Python

0

Recipe Objective

Machine Learning Models can not work on categorical variables in the form of strings, so we need to change it into numerical form. This can be done by making new features according to the categories by assigning it values.

This python source code does the following:
1. Creates a data dictionary and converts it into pandas dataframe
2. Manually creates a encoding function
3. Applies the function on dataframe to encode the variable

So this is the recipe on how we can convert string categorical variables into numerical variables in Python.

Step 1 - Import the library

import pandas as pd

We have only imported pandas this is reqired for dataset.

Step 2 - Setting up the Data

We have created a dictionary and passed it through the pd.DataFrame to create a dataframe with columns 'name', 'episodes', 'gender'. data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'], 'episodes': [42, 24, 31, 29, 37, 40], 'gender': ['male', 'female', 'female', 'female', 'male', 'male']} df = pd.DataFrame(data, columns = ['name','episodes', 'gender']) print(df)

Step 3 - Converting the values

We can clearly observe that in the column 'gender' there are two categories male and female, so for that we can assign number to each categories like 1 to male and 2 to female. def gender_to_numeric(x): if x=='female': return 2 if x=='male': return 1 df['gender_num'] = df['gender'].apply(gender_to_numeric) print(df) Here we are defining a function to assign numeric values and then we are applying of the feature 'gender'.
So the output comes as:

      name  episodes  gender
0  Sheldon        42    male
1    Penny        24  female
2      Amy        31  female
3    Penny        29  female
4      Raj        37    male
5  Sheldon        40    male

      name  episodes  gender  gender_num
0  Sheldon        42    male           1
1    Penny        24  female           2
2      Amy        31  female           2
3    Penny        29  female           2
4      Raj        37    male           1
5  Sheldon        40    male           1

Relevant Projects

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.