One hot Encoding with multiple labels in Python?

One hot Encoding with multiple labels in Python?

One hot Encoding with multiple labels in Python?

One hot Encoding with multiple labels in Python


Recipe Objective

In many datasets we find that there are multiple labels and machine learning model can not be trained on the labels. To solve this problem we may assign numbers to this labels but machine learning models can compare numbers and will give different weightage to different labels and as a result it will be bias towards a label. So what we can do is we can make different columns acconding to the labels and assign bool values in it.

This python source code does the following:
1. Converts categorical into numerical types.
2. Loads the important libraries and modules.
3. Implements multi label binarizer.
4. Creates your own numpy feature matrix.
5.Extracts and interprets the final result

So this is the recipe on how we can use MultiLabelBinarize to convert labels into bool values in Python.

Step 1 - Import the library

from sklearn.preprocessing import MultiLabelBinarizer

We have only imported MultiLabelBinarizer which is reqired to do so.

Step 2 - Setting up the Data

We have created a arrays of differnt labels with few of the labels in common. y = [('Raj', 'Penny'), ('Amy', 'Raj'), ('Sheldon', 'Penny'), ('Leonard', 'Amy'), ('Amy', 'Leonard')]

Step 3 - Using MultiLabelBinarizer and Printing Output

We have created an object for MultiLabelBinarizer and using fit_transform we have fitted and transformed our data. Finally we have printed the classes that has been make by the function. one_hot = MultiLabelBinarizer() print(one_hot.fit_transform(y)) print(one_hot.classes_) So the output comes as:

[[0 0 1 1 0]
 [1 0 0 1 0]
 [0 0 1 0 1]
 [1 1 0 0 0]
 [1 1 0 0 0]]

['Amy' 'Leonard' 'Penny' 'Raj' 'Sheldon']

Relevant Projects

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

Sequence Classification with LSTM RNN in Python with Keras
In this project, we are going to work on Sequence to Sequence Prediction using IMDB Movie Review Dataset​ using Keras in Python.

Human Activity Recognition Using Smartphones Data Set
In this deep learning project, you will build a classification system where to precisely identify human fitness activities.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.