One hot Encoding with multiple labels in Python?

In machine learning projects it is often necessary to convert categorical text data into a numerical format. Categorical variables take one of a limited, fixed set of values, for example Country, Gender or Age group, and are usually stored as text. Many machine learning models, such as regression or SVM, are algebraic and need numerical input, so the dataset has to be converted to numbers before these algorithms can be used on it.

This conversion is usually done while preparing the data, alongside the exploratory data analysis (EDA) step of your machine learning project. Variables whose categories are only labels, with no order of precedence, are referred to as nominal features. There are several ways to encode them: replacing values, label encoding, one-hot encoding, binary encoding and backward difference encoding. The two most common are: 1) Label Encoding 2) One-Hot Encoding.
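To make the difference between the two common approaches concrete, here is a minimal sketch that applies scikit-learn's LabelEncoder and OneHotEncoder to the same small column. The sample values and variable names are illustrative assumptions, not part of the original recipe.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Illustrative single-label column (one state per sample)
states = np.array(['Texas', 'Florida', 'Alabama', 'Texas'])

# Label encoding: each category becomes one integer (classes are sorted,
# so Alabama=0, Florida=1, Texas=2). This implies an order that a
# nominal feature does not really have.
label_codes = LabelEncoder().fit_transform(states)
print(label_codes)  # [2 1 0 2]

# One-hot encoding: one binary column per category, no implied order.
# OneHotEncoder expects a 2-D input, hence the reshape.
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(states.reshape(-1, 1)).toarray()
print(encoder.categories_[0])  # column order: Alabama, Florida, Texas
print(one_hot)
```

Label encoding is compact, but the integers can be misread by a model as magnitudes; one-hot encoding avoids that at the cost of more columns.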

One-hot encoding in Python takes a column of categorical data and splits it into multiple columns, one for each distinct category value (for example male, female, USA). Repeated values do not create extra columns: each row gets a 1 in the column for its category and a 0 in every other column.
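The idea can be sketched in a few lines of plain Python, without any library. The helper name one_hot_encode is hypothetical and exists only for this illustration; in practice you would normally use a library encoder.

```python
def one_hot_encode(values):
    """Return the sorted distinct categories and one binary row per value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# Illustrative column with repeated category values
categories, rows = one_hot_encode(['male', 'female', 'female', 'male'])
print(categories)  # ['female', 'male']
print(rows)        # [[0, 1], [1, 0], [1, 0], [0, 1]]
```

Two distinct values produce two columns, however many times each value repeats.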

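In the recipe below, each sample carries more than one label, here the names of US states: Texas, Florida, Alabama, Delaware and California. Because a sample can have several labels at once, the recipe uses scikit-learn's MultiLabelBinarizer. First we create a MultiLabelBinarizer object. Then we fit and transform the list 'y' with that object, which gives one column per state with a 1 wherever the sample contains that state.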

References: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

In [1]:
## One hot Encoding with multiple labels in Python 
def Kickstarter_Example_36():
    print()
    print(format('How to do One hot Encode with multiple labels in Python', '*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # Load libraries
    from sklearn.preprocessing import MultiLabelBinarizer

    # Create a list of label tuples (one tuple of states per sample)
    y = [('Texas', 'Florida'),
         ('California', 'Alabama'),
         ('Texas', 'Florida'),
         ('Delaware', 'Florida'),
         ('Texas', 'Alabama')]

    # Create MultiLabelBinarizer object
    one_hot = MultiLabelBinarizer()

    # One-hot encode data (one row per sample, one column per class)
    print(); print(one_hot.fit_transform(y))

    # View classes (these are the column headers, in sorted order)
    print(); print(one_hot.classes_)

Kickstarter_Example_36()
*************How to do One hot Encode with multiple labels in Python**************

[[0 0 0 1 1]
 [1 1 0 0 0]
 [0 0 0 1 1]
 [0 0 1 1 0]
 [1 0 0 0 1]]

['Alabama' 'California' 'Delaware' 'Florida' 'Texas']
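Once the binarizer is fitted, it can also map binary rows back to labels and encode new samples against the same columns. The following is a minimal sketch using a list of state tuples like the one in the recipe; the variable names and the extra sample are assumptions made for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [('Texas', 'Florida'),
     ('California', 'Alabama'),
     ('Texas', 'Florida'),
     ('Delaware', 'Florida'),
     ('Texas', 'Alabama')]

binarizer = MultiLabelBinarizer()
encoded = binarizer.fit_transform(y)

# Map each binary row back to a tuple of labels.
# Labels come back in classes_ (sorted) order, not the input order.
decoded = binarizer.inverse_transform(encoded)
print(decoded[0])  # ('Florida', 'Texas')

# Encode a new sample using the classes learned during fit
new_row = binarizer.transform([('Texas', 'Delaware')])
print(new_row)  # [[0 0 1 0 1]]
```

Note that transform reuses the columns learned by fit, so new samples should only contain labels the binarizer has already seen.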

