In many datasets each sample carries several labels at once, and a machine learning model cannot be trained directly on those text labels. One fix is to assign a number to each label, but a model treats numbers as ordered quantities, so it would give different weight to different labels and end up biased towards some of them. Instead, we can create one column per label and fill it with boolean values that mark whether the label is present.
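To make the idea of one column per label concrete, here is a minimal sketch in plain Python, without any library. The colour labels and the variable names are hypothetical and used only for illustration.

```python
# Hypothetical toy data: each sample carries one or more labels.
samples = [("red", "blue"), ("blue",), ("green", "red")]

# Collect the distinct labels in sorted order; each one becomes a column.
classes = sorted({label for sample in samples for label in sample})

# For every sample, write 1 where the label is present and 0 otherwise.
indicator = [[int(label in sample) for label in classes] for sample in samples]

print(classes)
for row in indicator:
    print(row)
```

Each row now has the same length regardless of how many labels the sample had, and no label is numerically larger than another.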
This Python source code does the following:
1. Converts categorical labels into numerical types.
2. Loads the important libraries and modules.
3. Implements the multi-label binarizer.
4. Creates your own NumPy feature matrix.
5. Extracts and interprets the final result.
So this is the recipe on how we can use MultiLabelBinarizer to convert labels into boolean values in Python.
from sklearn.preprocessing import MultiLabelBinarizer
We have imported only MultiLabelBinarizer, which is all this recipe requires.
We have created an array of different label sets, with a few of the labels in common.
y = [('Raj', 'Penny'),
We have created a MultiLabelBinarizer object and used fit_transform to fit it to our data and transform the data in one step. Finally, we have printed the classes that the binarizer found.
one_hot = MultiLabelBinarizer()
So the output comes as:
[[0 0 1 1 0]
 [1 0 0 1 0]
 [0 0 1 0 1]
 [1 1 0 0 0]
 [1 1 0 0 0]]
['Amy' 'Leonard' 'Penny' 'Raj' 'Sheldon']
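The steps above can be put together in one runnable sketch. Only the first tuple, ('Raj', 'Penny'), appears in the recipe; the remaining four are an assumption, reconstructed so that they reproduce the matrix shown above, and the order of names inside each tuple is a guess that does not affect the result. The variable names encoded and decoded are also introduced here for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Label sets: the first tuple comes from the recipe, the other four are
# assumed values chosen to match the printed output.
y = [('Raj', 'Penny'),
     ('Amy', 'Raj'),
     ('Penny', 'Sheldon'),
     ('Amy', 'Leonard'),
     ('Leonard', 'Amy')]

# Fit the binarizer and build the indicator matrix in one step.
one_hot = MultiLabelBinarizer()
encoded = one_hot.fit_transform(y)

print(encoded)           # one row per sample, one column per class
print(one_hot.classes_)  # column order: the sorted unique labels

# Map the indicator rows back to tuples of labels to check the encoding.
decoded = one_hot.inverse_transform(encoded)
print(decoded)
```

Reading the result: column i of the matrix corresponds to classes_[i], so the first row [0 0 1 1 0] says the first sample has the labels 'Penny' and 'Raj'. inverse_transform returns the labels in class order, not in the order they were originally written.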