How to convert categorical variables into numerical variables in Python?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to convert categorical variables into numerical variables in Python?

How to convert categorical variables into numerical variables in Python?

This recipe helps you convert categorical variables into numerical variables in Python

Recipe Objective

Machine Learning Models can not work on categorical variables in the form of strings, so we need to change it into numerical form. We can assign numbers for each categories but it may not be that effective when difference between the categories can not be measured. This can be done by making new features according to the categories with bool values. For this we will be using dummy variables to do so.

This python source code does the following:
1. Creates dictionary and converts it into dataframe
2. Uses "get_dummies" function for the encoding
3. Concats the final encoded dataset into the final dataframe
4. Drops categorical variable column

So this is the recipe on how we can convert categorical variables into numerical variables in Python.

Step 1 - Import the library

import pandas as pd

We have only imported pandas this is reqired for dataset.

Step 2 - Setting up the Data

We have created a dictionary and passed it through the pd.DataFrame to create a dataframe with columns 'name', 'episodes', 'gender'. data = {'name': ['Sheldon', 'Penny', 'Amy', 'Penny', 'Raj', 'Sheldon'], 'episodes': [42, 24, 31, 29, 37, 40], 'gender': ['male', 'female', 'female', 'female', 'male', 'male']} df = pd.DataFrame(data, columns = ['name','episodes', 'gender']) print(df)

Step 3 - Making Dummy Variables and Printing the final Dataset

We can clearly observe that in the column 'gender' there are two categories male and female, so for that column we have to make dummies according to the categories. So we have passed that column in the function and stored it in df_gender. Finally we have added that columns in out original dataset. df_gender = pd.get_dummies(df['gender']) df_new = pd.concat([df, df_gender], axis=1) print(df_new) So the output comes as:

      name  episodes  gender
0  Sheldon        42    male
1    Penny        24  female
2      Amy        31  female
3    Penny        29  female
4      Raj        37    male
5  Sheldon        40    male

      name  episodes  gender  female  male
0  Sheldon        42    male       0     1
1    Penny        24  female       1     0
2      Amy        31  female       1     0
3    Penny        29  female       1     0
4      Raj        37    male       0     1
5  Sheldon        40    male       0     1

Download Materials

Relevant Projects

Avocado Machine Learning Project Python for Price Prediction
In this ML Project, you will use the Avocado dataset to build a machine learning model to predict the average price of avocado which is continuous in nature based on region and varieties of avocado.

Churn Prediction in Telecom using Machine Learning in R
Estimating churners before they discontinue using a product or service is extremely important. In this ML project, you will develop a churn prediction model in telecom to predict customers who are most likely subject to churn.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Expedia Hotel Recommendations Data Science Project
In this data science project, you will contextualize customer data and predict the likelihood a customer will stay at 100 different hotel groups.

Machine learning for Retail Price Recommendation with Python
Use the Mercari Dataset with dynamic pricing to build a price recommendation algorithm using machine learning in Python to automatically suggest the right product prices.

Build OCR from Scratch Python using YOLO and Tesseract
In this deep learning project, you will learn how to build your custom OCR (optical character recognition) from scratch by using Google Tesseract and YOLO to read the text from any images.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Abstractive Text Summarization using Transformers-BART Model
Deep Learning Project to implement an Abstractive Text Summarizer using Google's Transformers-BART Model to generate news article headlines.

Image Segmentation using Mask R-CNN with Tensorflow
In this Deep Learning Project on Image Segmentation Python, you will learn how to implement the Mask R-CNN model for early fire detection.