One-hot encoding with multiple labels in Python?

Recipe Objective

In many datasets a single sample carries several categorical labels at once, and machine learning models cannot be trained on text labels directly. One option is to assign a number to each label, but models treat numbers as ordered quantities, so they may read an ordering or weighting into the labels that does not exist, which biases the model toward some labels. Instead, we can create one column per label and fill it with binary (0/1) values that indicate whether each sample carries that label.
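To make the difference concrete, here is a minimal pure-Python sketch (no scikit-learn) on the same five samples used in Step 2. The helper `multi_hot` is a hypothetical name written only for illustration: it builds one column per distinct label, sorted alphabetically, and puts a 1 wherever a sample carries that label. The `codes` dictionary shows the naive integer mapping it replaces.

```python
# Illustrative sketch: build binary indicator columns by hand, without scikit-learn.
samples = [('Raj', 'Penny'), ('Amy', 'Raj'), ('Sheldon', 'Penny'),
           ('Leonard', 'Amy'), ('Amy', 'Leonard')]

# Every distinct label, sorted so the column order is deterministic.
classes = sorted({label for sample in samples for label in sample})

# Naive integer codes: a model could read 'Sheldon' (4) as "larger" than 'Amy' (0),
# an ordering that does not exist in the data.
codes = {label: i for i, label in enumerate(classes)}

def multi_hot(sample, classes):
    """Hypothetical helper: 1 in each column whose class appears in the sample, else 0."""
    present = set(sample)
    return [1 if c in present else 0 for c in classes]

matrix = [multi_hot(s, classes) for s in samples]

print(codes)    # {'Amy': 0, 'Leonard': 1, 'Penny': 2, 'Raj': 3, 'Sheldon': 4}
print(classes)  # ['Amy', 'Leonard', 'Penny', 'Raj', 'Sheldon']
for row in matrix:
    print(row)
```

Each label gets its own column, so no label is numerically "bigger" than another.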

This Python source code does the following:
1. Converts categorical labels into numerical (binary) features.
2. Loads the required library and module.
3. Implements MultiLabelBinarizer.
4. Creates a small sample dataset of label tuples.
5. Extracts and interprets the final result.

So this is the recipe on how we can use MultiLabelBinarizer to convert labels into binary (0/1) indicator values in Python.

Step 1 - Import the library

from sklearn.preprocessing import MultiLabelBinarizer

We have only imported MultiLabelBinarizer, which is all that is required here.

Step 2 - Setting up the Data

We have created a list of tuples, each holding the labels of one sample, with a few labels shared between samples.

y = [('Raj', 'Penny'), ('Amy', 'Raj'), ('Sheldon', 'Penny'), ('Leonard', 'Amy'), ('Amy', 'Leonard')]
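MultiLabelBinarizer expects an iterable of label collections, one collection per sample, and the collections may have different lengths. A common slip, sketched below, is passing bare strings: since a string is itself iterable, each one is split into single characters. The variable names here are illustrative only.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Pitfall: each bare string is iterated character by character,
# so the learned classes are letters rather than names.
char_mlb = MultiLabelBinarizer()
char_mlb.fit_transform(['Raj', 'Amy'])
print(char_mlb.classes_)  # ['A' 'R' 'a' 'j' 'm' 'y']

# Fix: wrap single labels in a tuple (or list) so each sample is a collection of labels.
ok_mlb = MultiLabelBinarizer()
ok_matrix = ok_mlb.fit_transform([('Raj',), ('Amy',)])
print(ok_mlb.classes_)  # ['Amy' 'Raj']
print(ok_matrix)
```

Note the trailing comma in `('Raj',)`: without it, Python treats the parentheses as grouping and passes the plain string again.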

Step 3 - Using MultiLabelBinarizer and Printing Output

We have created a MultiLabelBinarizer object and used fit_transform to fit it to our data and transform the data in one step. Finally, we have printed the classes learned by the binarizer; they give the column order of the output matrix.

one_hot = MultiLabelBinarizer()
print(one_hot.fit_transform(y))
print(one_hot.classes_)

So the output comes as:

[[0 0 1 1 0]
 [1 0 0 1 0]
 [0 0 1 0 1]
 [1 1 0 0 0]
 [1 1 0 0 0]]

['Amy' 'Leonard' 'Penny' 'Raj' 'Sheldon']
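Putting the steps together, the sketch below shows one way to interpret the result. It assumes pandas is installed (the recipe itself does not use it): the columns are labelled with `classes_`, `inverse_transform` maps each row back to a tuple of labels, and `transform` encodes new samples with the classes learned during fitting. 'Howard' is a made-up unseen label; recent scikit-learn versions emit a warning and ignore such labels rather than adding a column.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

y = [('Raj', 'Penny'), ('Amy', 'Raj'), ('Sheldon', 'Penny'),
     ('Leonard', 'Amy'), ('Amy', 'Leonard')]

one_hot = MultiLabelBinarizer()
encoded = one_hot.fit_transform(y)

# Name each indicator column after its class so the matrix is readable.
df = pd.DataFrame(encoded, columns=one_hot.classes_)
print(df)

# inverse_transform turns each row back into a tuple of labels, in classes_ order.
decoded = one_hot.inverse_transform(encoded)
print(decoded)

# transform reuses the fitted classes; the unseen label 'Howard' triggers a
# warning and is dropped, so the output keeps the same five columns.
new_rows = one_hot.transform([('Penny', 'Howard')])
print(new_rows)  # [[0 0 1 0 0]]
```

Fitting once and reusing the same binarizer with `transform` keeps the column order identical between the training data and any new data.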
