How to process categorical features in Python?

How to process categorical features in Python?

How to process categorical features in Python?

This recipe helps you process categorical features in Python


Recipe Objective

Machine Learning Models can not work on categorical variables in the form of strings, so we need to change it into numerical form. We can assign numbers for each categories but it may not be that effective when difference between the categories can not be measured. This can be done by making new features according to the categories with bool values. For this we will be using dummy variables to do so.

So this is the recipe on how we can process categorical features in Python .

Step 1 - Importing Library

from sklearn import preprocessing import pandas as pd

We have only imported pandas and preprocessing which is needed.

Step 2 - Creating DataFrame

We have created a Dictionary and passed it through pd.DataFrame to create dataframe with different features. raw_data = {"first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"], "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"], "age": [42, 52, 36, 24, 73], "city": ["San Francisco", "Baltimore", "Miami", "Douglas", "Boston"]} df = pd.DataFrame(raw_data, columns = ["first_name", "last_name", "age", "city"]) print(df)

Step 3 - Processing Categorical variables

We have first made the dummy variables with binary values for the categorical variable in feature city. Then we have used label encoder to fit and transform the data. print(pd.get_dummies(df["city"])) integerized_data = preprocessing.LabelEncoder().fit_transform(df["city"]) print(integerized_data) So the output comes as

  first_name last_name  age           city
0      Jason    Miller   42  San Francisco
1      Molly  Jacobson   52      Baltimore
2       Tina       Ali   36          Miami
3       Jake    Milner   24        Douglas
4        Amy     Cooze   73         Boston

   Baltimore  Boston  Douglas  Miami  San Francisco
0          0       0        0      0              1
1          1       0        0      0              0
2          0       0        0      1              0
3          0       0        1      0              0
4          0       1        0      0              0

[4 0 3 2 1]

Relevant Projects

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Topic modelling using Kmeans clustering to group customer reviews
In this Kmeans clustering machine learning project, you will perform topic modelling in order to group customer reviews based on recurring patterns.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Sequence Classification with LSTM RNN in Python with Keras
In this project, we are going to work on Sequence to Sequence Prediction using IMDB Movie Review Dataset​ using Keras in Python.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.