How to convert Categorical features to Numerical Features in Python?

This recipe helps you convert categorical features to numerical features in Python.

Recipe Objective

Machine learning models cannot work directly on categorical variables stored as strings, so we need to convert them to numerical form. This can be done by creating new features from the categories and assigning a numerical value to each category.

This recipe shows how to convert categorical features to numerical features in Python.
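The idea of creating new features from the categories can be sketched with one-hot encoding, where each category becomes its own 0/1 column. This is only an illustration: `pd.get_dummies` is not used in the recipe itself, and the small `colors` dataframe is hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data, not part of the recipe
colors = pd.DataFrame({"color": ["red", "green", "red", "blue"]})

# One new column per category; dtype=int stores 0/1 instead of booleans
one_hot = pd.get_dummies(colors["color"], prefix="color", dtype=int)
print(one_hot)
```

Each row has exactly one 1, in the column that matches its original category.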

Step 1 - Import the library

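import pandas as pd
from sklearn import preprocessing

We have imported pandas, which is required to build the dataset, and the preprocessing module from scikit-learn, which provides LabelEncoder.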

Step 2 - Setting up the Data

We have created a dictionary and passed it to pd.DataFrame to create a dataframe with the columns "name", "episodes" and "gender".

data = {"name": ["Sheldon", "Penny", "Amy", "Penny", "Raj", "Sheldon"],
        "episodes": [42, 24, 31, 29, 37, 40],
        "gender": ["male", "female", "female", "female", "male", "male"]}
df = pd.DataFrame(data, columns=["name", "episodes", "gender"])
print(df)
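Before encoding, it can help to check which values a categorical column contains and how many distinct categories there are. A minimal sketch, rebuilding the same dataframe so that it runs on its own (the variable names `gender_counts` and `n_categories` are chosen here for illustration):

```python
import pandas as pd

data = {"name": ["Sheldon", "Penny", "Amy", "Penny", "Raj", "Sheldon"],
        "episodes": [42, 24, 31, 29, 37, 40],
        "gender": ["male", "female", "female", "female", "male", "male"]}
df = pd.DataFrame(data, columns=["name", "episodes", "gender"])

# How often each category occurs in the "gender" column
gender_counts = df["gender"].value_counts()
print(gender_counts)

# Number of distinct categories in each string column
n_categories = df[["name", "gender"]].nunique()
print(n_categories)
```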

Step 3 - Converting the values

We can see that the "gender" column has two categories, male and female, so we can assign a number to each category. LabelEncoder does this for us: it sorts the categories and numbers them starting from 0, so female becomes 0 and male becomes 1. We first fit the encoder on the feature and then transform it.

le = preprocessing.LabelEncoder()
le.fit(df["gender"])
print()
print(list(le.classes_))
print()
print(le.transform(df["gender"]))

The output is:

['female', 'male']

[1 0 0 0 1 1]
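To keep the encoded values alongside the original column, and to map the codes back to labels later, something like the following should work. It assumes scikit-learn is installed; the column name `gender_encoded` is a hypothetical choice for illustration and does not appear in the recipe.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame(
    {"gender": ["male", "female", "female", "female", "male", "male"]}
)

le = preprocessing.LabelEncoder()

# fit_transform learns the classes and encodes the column in one call;
# classes are sorted, so "female" maps to 0 and "male" maps to 1
df["gender_encoded"] = le.fit_transform(df["gender"])
print(df)

# inverse_transform maps the integer codes back to the original labels
decoded = le.inverse_transform(df["gender_encoded"])
print(list(decoded))
```

Note that the scikit-learn documentation describes LabelEncoder as intended for target labels; for input features with more than two categories, OrdinalEncoder or OneHotEncoder are the documented alternatives.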
