How to convert Categorical features to Numerical Features in Python?

How to convert Categorical features to Numerical Features in Python?

How to convert Categorical features to Numerical Features in Python?

This recipe helps you convert Categorical features to Numerical Features in Python


Recipe Objective

Machine Learning Models can not work on categorical variables in the form of strings, so we need to change it into numerical form. This can be done by making new features according to the categories by assigning it values.

So this is the recipe on how we can convert Categorical features to Numerical Features in Python

Step 1 - Import the library

import pandas as pd

We have only imported pandas this is reqired for dataset.

Step 2 - Setting up the Data

We have created a dictionary and passed it through the pd.DataFrame to create a dataframe with columns "name", "episodes", "gender". data = {"name": ["Sheldon", "Penny", "Amy", "Penny", "Raj", "Sheldon"], "episodes": [42, 24, 31, 29, 37, 40], "gender": ["male", "female", "female", "female", "male", "male"]} df = pd.DataFrame(data, columns = ["name","episodes", "gender"]) print(df)

Step 3 - Converting the values

We can clearly observe that in the column "gender" there are two categories male and female, so for that we can assign number to each categories like 1 to male and 2 to female. Now we are using LabelEncoder. We have first fitted the feature and transformed it. le = preprocessing.LabelEncoder()["gender"]) print(); print(list(le.classes_)) print(); print(le.transform(df["gender"])) So the output comes as:

Feature Matrix:
   Feature 1  Feature 2  Feature 3  Feature 4  Feature 5  Feature 6  
0  -1.867524   1.745983   2.952435  -0.177492  -3.088648   1.762311
1   0.450144  -2.106431  -1.065847  -1.958231  -0.451780  -1.990662
2  -4.647836  -4.214226  -1.830341  -1.714825  -6.590249  -0.315993
3   1.958901  -1.313546   1.409145  -2.069271   1.508912   3.774923
4   2.001750   0.879350  -2.041154   1.917629  -0.760137   1.310228

   Feature 7  Feature 8  Feature 9  Feature 10
0  -0.195266   1.029769   2.814171    0.071059
1  -2.530104  -1.377802  -0.013353   -2.849859
2   2.780038  -3.325841  -4.008319    2.001941
3   5.012315  -5.772415  -0.818187   -0.392333
4   0.671990   1.444606  -1.731576   -0.378597

Target Class:
0            1
1            2
2            1
3            0
4            0

Relevant Projects

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Build a Music Recommendation Algorithm using KKBox's Dataset
Music Recommendation Project using Machine Learning - Use the KKBox dataset to predict the chances of a user listening to a song again after their very first noticeable listening event.

Loan Eligibility Prediction in Python using
In this loan prediction project you will build predictive models in Python using to predict if an applicant is able to repay the loan or not.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

NLP and Deep Learning For Fake News Classification in Python
In this project you will use Python to implement various machine learning methods( RNN, LSTM, GRU) for fake news classification.

Expedia Hotel Recommendations Data Science Project
In this data science project, you will contextualize customer data and predict the likelihood a customer will stay at 100 different hotel groups.