Recipe: One-hot encoding with nominal categorical features in Python
In machine learning projects, categorical data stored as text often has to be converted into a numerical format. Categorical variables take a limited number of fixed values, such as Country, Gender or Age group, and are usually stored as text. Many machine learning models, such as regression or SVM, are algebraic and need numerical input, so a dataset has to be converted to numeric form before these learning algorithms can be used on it.
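Variables whose categories are only labels, with no order of precedence, are referred to as nominal features. The 2 most common ways to convert them to numbers are: 1) Label Encoder, which assigns each category an integer code, and 2) OneHot Encoder, which creates one binary column per category. Integer codes suggest an ordering that nominal features do not have, which is why one-hot encoding is usually preferred for them.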
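The difference between the two approaches can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `LabelEncoder` and `OneHotEncoder` classes; the variable names (`states`, `label_codes`, `one_hot`) are made up for the example and are not part of the recipe.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical sample data: one nominal feature with three categories
states = np.array(['Texas', 'California', 'Texas', 'Delaware', 'Texas'])

# Label encoding: one integer per category (categories are sorted alphabetically)
label_codes = LabelEncoder().fit_transform(states)
print(label_codes)  # [2 0 2 1 2]

# One-hot encoding: one binary column per category.
# OneHotEncoder expects a 2-D input and returns a sparse matrix by default,
# so the data is reshaped to a column and the result converted to a dense array.
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(states.reshape(-1, 1)).toarray()
print(one_hot)
print(encoder.categories_)  # column order of the one-hot matrix
```

With label encoding, a model could read Texas (2) as "greater than" California (0); the one-hot matrix carries no such ordering.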
One-hot encoding in Python takes a column of categorical data and splits it into multiple columns, one for each distinct category value (for example male, female, USA etc). Each row gets a 1 in the column that matches its category and a 0 in all the others, so every repetition of a category value is indicated by a 1 in the same column.
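To make the mechanism concrete, here is a small sketch that builds the one-hot matrix by hand with NumPy only. It is meant as an illustration of the idea, not as the recipe's method; the names `values`, `categories`, `codes` and `one_hot` are assumptions introduced for the example.

```python
import numpy as np

# Hypothetical sample column with repeated category values
values = np.array(['Texas', 'California', 'Texas', 'Delaware', 'Texas'])

# Find the distinct categories (sorted) and, for each row,
# the index of its category within that sorted list
categories, codes = np.unique(values, return_inverse=True)

# Each row of the identity matrix is the one-hot vector for one category;
# indexing it with the codes picks the right vector for every input row
one_hot = np.eye(len(categories), dtype=int)[codes]

print(categories)
print(one_hot)
```

Each input row becomes a vector with exactly one 1, and all rows with the same category share the same vector.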
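In this recipe's example, the column values are names of US states: Texas, Delaware and California. First we create a LabelBinarizer object. Then we fit and transform the array 'x' with the LabelBinarizer object we just created. Finally we print its classes_ attribute, which lists the categories in the same order as the columns of the encoded output.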
## One hot Encoding with nominal categorical features in Python
def Kickstarter_Example_37():
    print()
    print(format('How to One hot Encode with nominal categorical features in Python', '*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # Load libraries
    import numpy as np
    from sklearn.preprocessing import LabelBinarizer

    # Create Data With One Class Label
    # Create NumPy array
    x = np.array([['Texas'], ['California'], ['Texas'], ['Delaware'], ['Texas']])

    # One-hot Encode Data (Method 1)
    # Create LabelBinarizer object
    one_hot = LabelBinarizer()

    # One-hot encode data
    print(); print(one_hot.fit_transform(x))

    # View Column Headers
    # View classes
    print(); print(one_hot.classes_)

Kickstarter_Example_37()
********How to One hot Encode with nominal categorical features in Python*********
[[0 0 1]
[1 0 0]
[0 0 1]
[0 1 0]
[0 0 1]]
['California' 'Delaware' 'Texas']
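As a follow-up, the fitted binarizer can also map encoded rows back to their labels and encode new values with the same column order. The sketch below assumes scikit-learn's `LabelBinarizer.inverse_transform` and `transform` methods and uses a 1-D input array for simplicity; `encoded`, `decoded` and `new_rows` are illustrative names.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Same state names as in the recipe, here as a 1-D array
x = np.array(['Texas', 'California', 'Texas', 'Delaware', 'Texas'])

one_hot = LabelBinarizer()
encoded = one_hot.fit_transform(x)

# Recover the original labels from the one-hot rows
decoded = one_hot.inverse_transform(encoded)
print(decoded)

# Encode unseen rows using the column order learned during fit
new_rows = one_hot.transform(['Delaware', 'Texas'])
print(new_rows)
```

Reusing the fitted object keeps the column order consistent between training data and any data encoded later.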