One hot Encoding with multiple labels in Python?


Recipe Objective

In many datasets each sample carries one or more categorical labels, and a machine learning model cannot be trained directly on label strings. One workaround is to assign a number to each label, but a model treats numbers as ordered magnitudes, so it can give different weight to different labels and end up biased towards some of them. Instead, we can create one column per label and store a boolean (0/1) value in each column to show whether the sample has that label.
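The idea of one column per label can be sketched by hand in plain Python before bringing in any library. This is only an illustration under assumed data: the `samples` list and the alphabetical column order are choices made for this sketch, not something the recipe prescribes.

```python
# Hypothetical sample data: each row may carry more than one label.
samples = [("Raj", "Penny"), ("Amy", "Raj"), ("Sheldon", "Penny")]

# Collect every distinct label and fix a column order (alphabetical here).
classes = sorted({label for row in samples for label in row})

# One column per label: 1 if the row carries that label, otherwise 0.
indicator = [[1 if label in row else 0 for label in classes] for row in samples]

print(classes)
for row in indicator:
    print(row)
```

Because every label gets its own 0/1 column, no label is numerically "larger" than another, which is the property integer codes lack.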

This Python source code does the following:
1. Converts categorical labels into numerical types.
2. Loads the important libraries and modules.
3. Implements a multi-label binarizer.
4. Creates your own NumPy feature matrix.
5. Extracts and interprets the final result.

So this is the recipe on how we can use MultiLabelBinarizer to convert labels into bool values in Python.

Step 1 - Import the library

from sklearn.preprocessing import MultiLabelBinarizer

We have only imported MultiLabelBinarizer, which is all this recipe requires.

Step 2 - Setting up the Data

We have created a list of tuples of different labels, with a few of the labels in common between tuples.

y = [('Raj', 'Penny'), ('Amy', 'Raj'), ('Sheldon', 'Penny'), ('Leonard', 'Amy'), ('Amy', 'Leonard')]
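The shape of the input matters: MultiLabelBinarizer expects an iterable of iterables, so each sample should be a tuple, list, or set of labels. A bare string is itself an iterable of characters, so it gets split into letters. The short sketch below illustrates this; the variable names `wrong` and `right` are made up for the example.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each sample is a bare string, so every character becomes its own class.
wrong = MultiLabelBinarizer().fit(['Raj', 'Amy'])
print(wrong.classes_)

# Each sample is a one-element tuple, so whole names become the classes.
right = MultiLabelBinarizer().fit([('Raj',), ('Amy',)])
print(right.classes_)
```

Wrapping single labels in a tuple or list, as in the second call, keeps each name intact.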

Step 3 - Using MultiLabelBinarizer and Printing Output

We have created a MultiLabelBinarizer object and, using fit_transform, we have fitted it to our data and transformed the data in one step. Finally we have printed the classes that the binarizer found.

one_hot = MultiLabelBinarizer()
print(one_hot.fit_transform(y))
print(one_hot.classes_)

So the output comes as:

[[0 0 1 1 0]
 [1 0 0 1 0]
 [0 0 1 0 1]
 [1 1 0 0 0]
 [1 1 0 0 0]]

['Amy' 'Leonard' 'Penny' 'Raj' 'Sheldon']
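To read the matrix, note that column i corresponds to the i-th entry of classes_, so the first row has 1s under 'Penny' and 'Raj'. As a rough sketch of how to check this, the fitted binarizer's inverse_transform method maps the 0/1 rows back to tuples of labels. The names `encoded` and `decoded` are introduced here for illustration only.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [('Raj', 'Penny'), ('Amy', 'Raj'), ('Sheldon', 'Penny'),
     ('Leonard', 'Amy'), ('Amy', 'Leonard')]

one_hot = MultiLabelBinarizer()
encoded = one_hot.fit_transform(y)

# Pair each column index with the label it stands for.
for index, label in enumerate(one_hot.classes_):
    print(index, label)

# Map the indicator rows back to tuples of labels.
# Labels come back in classes_ order, not in the original input order.
decoded = one_hot.inverse_transform(encoded)
print(decoded)
```

The last two rows decode to the same tuple, which matches the identical rows in the matrix: the order of labels within a sample does not affect the encoding.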
