How to convert string categorical variables into numerical variables in Python?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to convert string categorical variables into numerical variables in Python?

How to convert string categorical variables into numerical variables in Python?

This recipe helps you convert string categorical variables into numerical variables in Python

0
This python source code does the following: 1. Creates a data dictionary and converts it into pandas dataframe 2. Manually creates a encoding function 3. Applies the function on dataframe to encode the variable
In [1]:
## How to convert string categorical variables into numerical variables in Python
def Kickstarter_Example_77():
    print()
    print(format('How to convert strings into numerical variables in Python','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    import pandas as pd

    # Create dataframe
    raw_data = {'patient': [1, 1, 1, 2, 2],
                'obs': [1, 2, 3, 1, 2],
                'treatment': [0, 1, 0, 1, 0],
                'score': ['strong', 'weak', 'normal', 'weak', 'strong']}

    df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])
    print(); print(df)

    # Create a function that converts all values of df['score'] into numbers
    def score_to_numeric(x):
        if x=='strong': return 3
        if x=='normal': return 2
        if x=='weak':   return 1

    # Apply the function to the score variable
    df['score_num'] = df['score'].apply(score_to_numeric)
    print(); print(df)

Kickstarter_Example_77()
************How to convert strings into numerical variables in Python*************

   patient  obs  treatment   score
0        1    1          0  strong
1        1    2          1    weak
2        1    3          0  normal
3        2    1          1    weak
4        2    2          0  strong

   patient  obs  treatment   score  score_num
0        1    1          0  strong          3
1        1    2          1    weak          1
2        1    3          0  normal          2
3        2    1          1    weak          1
4        2    2          0  strong          3

Relevant Projects

Data Science Project - Instacart Market Basket Analysis
Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Identifying Product Bundles from Sales Data Using R Language
In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Data Science Project on Wine Quality Prediction in R
In this R data science project, we will explore wine dataset to assess red wine quality. The objective of this data science project is to explore which chemical properties will influence the quality of red wines.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.