Recipe: One-hot encoding with nominal categorical features in Python

In machine learning projects, categorical data stored as text often has to be converted into a numerical format. Categorical variables are those that take one of a limited, fixed set of values, such as Country, Gender or Age group, and they are usually stored as text. Many machine learning models, such as regression or SVM, are algebraic and need numerical input, so a dataset has to be converted to numbers before these learning algorithms can be used on it.

Variables whose categories are only labels, with no order of precedence, are referred to as nominal features. The 2 most common ways to convert them to numbers are: 1) Label Encoder 2) OneHot Encoder.
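To see how the two approaches differ, here is a minimal sketch that encodes the same list of state names both ways. It assumes scikit-learn is installed and uses LabelBinarizer for the one-hot side, as the recipe itself does. The variable names are illustrative only.

```python
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

# Illustrative data: the same state names used in the recipe
states = ['Texas', 'California', 'Texas', 'Delaware', 'Texas']

# Label encoding: each category becomes one integer (classes are sorted alphabetically)
label_codes = LabelEncoder().fit_transform(states)

# One-hot encoding: each category becomes its own 0/1 column
one_hot_rows = LabelBinarizer().fit_transform(states)

print(label_codes)   # one integer per row
print(one_hot_rows)  # one row per sample, one column per category
```

Label codes are compact, but the integers imply an ordering (here California < Delaware < Texas) that an algebraic model may treat as meaningful. For nominal features, one-hot columns avoid that implied order.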

One-hot encoding in Python takes a column that has categorical data and splits it into multiple columns, one for each distinct category. Repeated category values (for example male, female, USA) in the column all map to the same new column. Each row then holds a 1 in the column for its category and a 0 in every other column.
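The mechanics can be sketched in a few lines of plain Python, without any library. The helper name `one_hot_encode` is hypothetical, and sorting the categories alphabetically is an assumption made here so the column order is predictable. It is not a description of how any particular library is implemented.

```python
def one_hot_encode(values):
    """Return (categories, rows) where each row has a single 1 marking its category."""
    # One output column per distinct category, in sorted order (an assumption)
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)   # start with all zeros
        row[column_of[value]] = 1     # flag the column for this value
        rows.append(row)
    return categories, rows


categories, rows = one_hot_encode(['Texas', 'California', 'Texas', 'Delaware', 'Texas'])
print(categories)
for row in rows:
    print(row)
```

Every row sums to 1, and rows with the same category are identical.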

In the recipe example below, the column values are names of US states: Texas, Delaware and California. First we create a LabelBinarizer object. Then we fit and transform the array 'x' with the LabelBinarizer object we just created. The columns of the result follow the order of the categories listed in classes_.

References: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

In [1]:
## One hot Encoding with nominal categorical features in Python 
def Kickstarter_Example_37():
    print()
    print(format('How to One hot Encode with nominal categorical features in Python', '*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # Load libraries
    import numpy as np
    from sklearn.preprocessing import LabelBinarizer

    # Create Data With One Class Label
    # Create NumPy array
    x = np.array([['Texas'],
                  ['California'],
                  ['Texas'],
                  ['Delaware'],
                  ['Texas']])

    # One-hot Encode Data (Method 1)

    # Create LabelBinarizer object
    one_hot = LabelBinarizer()

    # One-hot encode data
    print(); print(one_hot.fit_transform(x))

    # View classes (the column headers of the encoded matrix, in order)
    print(); print(one_hot.classes_)

Kickstarter_Example_37()
********How to One hot Encode with nominal categorical features in Python*********

[[0 0 1]
 [1 0 0]
 [0 0 1]
 [0 1 0]
 [0 0 1]]

['California' 'Delaware' 'Texas']
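The reference above points to scikit-learn's OneHotEncoder, while the recipe code uses LabelBinarizer. As a hedged sketch, the same array can be encoded with OneHotEncoder as shown here. This assumes a reasonably recent scikit-learn. Calling `.toarray()` on the default sparse result avoids depending on the name of the dense-output parameter, which differs between versions. The 'Nevada' row is made-up data added only to illustrate `handle_unknown='ignore'`.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Same data as the recipe: one feature column of state names
x = np.array([['Texas'],
              ['California'],
              ['Texas'],
              ['Delaware'],
              ['Texas']])

# handle_unknown='ignore' encodes categories unseen during fit as all zeros
encoder = OneHotEncoder(handle_unknown='ignore')

# The default output is a sparse matrix; convert to a dense array for printing
encoded = encoder.fit_transform(x).toarray()
print(encoded)

# One array of categories per input column
print(encoder.categories_)

# A state that was not present when fitting (illustrative value)
unseen = encoder.transform(np.array([['Nevada']])).toarray()
print(unseen)
```

OneHotEncoder is designed for feature columns and can encode several columns at once, whereas LabelBinarizer works on a single column of labels. The result here is a float array rather than the integer array LabelBinarizer returns.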

