How to convert Categorical features to Numerical Features in Python?

This recipe helps you convert Categorical features to Numerical Features in Python

Recipe Objective

Machine learning models cannot work with categorical variables stored as strings, so we need to convert them into numerical form. This can be done by creating new features from the categories and assigning a numeric value to each category.

This recipe shows how to convert categorical features to numerical features in Python.
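The idea of making new features from the categories can be sketched with pandas alone. The snippet below is a minimal illustration and not part of the original recipe: it uses pd.get_dummies to create one indicator column per category. The `toy` dataframe and its `color` column are made up for this example.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
toy = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One new 0/1 column per category. dtype=int keeps the result numeric
# instead of boolean. Columns are named "<prefix>_<category>".
dummies = pd.get_dummies(toy["color"], prefix="color", dtype=int)
print(dummies)
```

Each row has exactly one 1 across the new columns. This avoids implying an order between categories, which a single integer code might suggest to some models.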

Step 1 - Import the library

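import pandas as pd
from sklearn import preprocessing

We have imported pandas, which is required to build the dataset, and the preprocessing module from scikit-learn, which provides LabelEncoder.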

Step 2 - Setting up the Data

We have created a dictionary and passed it to pd.DataFrame to create a dataframe with the columns "name", "episodes" and "gender".

data = {"name": ["Sheldon", "Penny", "Amy", "Penny", "Raj", "Sheldon"],
        "episodes": [42, 24, 31, 29, 37, 40],
        "gender": ["male", "female", "female", "female", "male", "male"]}
df = pd.DataFrame(data, columns=["name", "episodes", "gender"])
print(df)
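Before encoding, it can help to check which categories a column actually contains. The following is a small sketch using the same data as above. The variable names `categories` and `counts` are introduced here for illustration only.

```python
import pandas as pd

data = {"name": ["Sheldon", "Penny", "Amy", "Penny", "Raj", "Sheldon"],
        "episodes": [42, 24, 31, 29, 37, 40],
        "gender": ["male", "female", "female", "female", "male", "male"]}
df = pd.DataFrame(data, columns=["name", "episodes", "gender"])

# Distinct values in the column we plan to encode.
categories = df["gender"].unique()
print(categories)

# Number of rows in each category.
counts = df["gender"].value_counts()
print(counts)
```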

Step 3 - Converting the values

We can clearly observe that the column "gender" has two categories, male and female, so we can assign a number to each category. LabelEncoder does this for us: it sorts the categories and numbers them starting from 0, so female becomes 0 and male becomes 1. We first fit the encoder on the feature and then transform it.

le = preprocessing.LabelEncoder()
le.fit(df["gender"])
print(); print(list(le.classes_))
print(); print(le.transform(df["gender"]))

So the output comes as:

['female', 'male']

[1 0 0 0 1 1]
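The same integer codes can also be produced with pandas alone, which may be convenient when scikit-learn is not available. The sketch below is an alternative to LabelEncoder, not the method used in this recipe. The column names `gender_code` and `gender_mapped` and the dictionaries `gender_map` and `inverse_map` are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"gender": ["male", "female", "female", "female", "male", "male"]}
)

# Option 1: let pandas derive the codes. Categories are sorted,
# so "female" becomes 0 and "male" becomes 1.
df["gender_code"] = df["gender"].astype("category").cat.codes

# Option 2: spell the mapping out by hand (hypothetical dictionary).
gender_map = {"female": 0, "male": 1}
df["gender_mapped"] = df["gender"].map(gender_map)

# Invert the dictionary to recover the original labels from the codes.
inverse_map = {code: label for label, code in gender_map.items()}
decoded = df["gender_mapped"].map(inverse_map)

print(df)
print(decoded.tolist())
```

An explicit dictionary makes the mapping visible and stable across datasets, while derived codes depend on which categories happen to be present in the data.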
