How to create a new column based on a condition in Python?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to create a new column based on a condition in Python?

How to create a new column based on a condition in Python?

This recipe helps you create a new column based on a condition in Python

Recipe Objective

Adding a new column in python is a easy task. But have you tried to add a column with values in it based on some condition. Like a column with values which depends on the values of another column. For a small data set with few numbers of rows it may be easy to do it manually but for a large dataset with hundreds of rows it may be quite difficult to do it manually.

We can do this hectic manual work with few lines of code. We can create a function which will do it for us for all the rows.

So this recipe is a short example of how can create a new column based on a condition in Python.

Step 1 - Import the library

import pandas as pd import numpy as np

We have imported pandas and numpy. No other library is needed for the this function.

Step 2 - Creating a sample Dataset

Here we have created a Dataframe with columns. We have used a print statement to view our initial dataset. data = {"name": ["Jason", "Molly", "Tina", "Jake", "Amy"], "age": [42, 52, 63, 24, 73], "preTestScore": [4, 24, 31, 2, 3], "postTestScore": [25, 94, 57, 62, 70]} print(df) df = pd.DataFrame(data, columns = ["name", "age", "preTestScore", "postTestScore"]) print(); print(df)

Step 3 - Creating a new column

We are building condition for making new columns.

  • If the value of age is greater then 50 then print yes in column elderly@50
  • If the value of age is greater then 60 then print yes in column elderly@60
  • If the value of age is greater then 70 then print yes in column elderly@70
df["elderly@50"] = np.where(df["age"]>=50, "yes", "no") df["elderly@60"] = np.where(df["age"]>=60, "yes", "no") df["elderly@70"] = np.where(df["age"]>=70, "yes", "no") print(df) As an output we get:

    name  age  preTestScore  postTestScore
0  Jason   42             4             25
1  Molly   52            24             94
2   Tina   63            31             57
3   Jake   24             2             62
4    Amy   73             3             70

    name  age  preTestScore  postTestScore elderly@50 elderly@60 elderly@70
0  Jason   42             4             25         no         no         no
1  Molly   52            24             94        yes         no         no
2   Tina   63            31             57        yes        yes         no
3   Jake   24             2             62         no         no         no
4    Amy   73             3             70        yes        yes        yes

Download Materials

Relevant Projects

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Medical Image Segmentation Deep Learning Project
In this deep learning project, you will learn to implement Unet++ models for medical image segmentation to detect and classify colorectal polyps.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Build a Face Recognition System in Python using FaceNet
In this deep learning project, you will build your own face recognition system in Python using OpenCV and FaceNet by extracting features from an image of a person's face.