How to insert a new column based on condition in Python?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to insert a new column based on condition in Python?

How to insert a new column based on condition in Python?

This recipe helps you insert a new column based on condition in Python

0

Recipe Objective

Adding a new column in python is a easy task. But have you tried to add a column with values in it based on some condition. Like a column with values which depends on the values of another column. For a small data set with few numbers of rows it may be easy to do it manually but for a large dataset with hundreds of rows it may be quite difficult to do it manually.

We can do this hectic manual work with few lines of code. We can create a function which will do it for us for all the rows.

So this recipe is a short example of how to create a function which will insert a new column with values in it based on some condition.

Step 1 - Import the library

import pandas as pd import numpy as np

We have imported pandas and numpy. No other library is needed for the this function.

Step 2 - Creating a sample Dataset

Here we have created a Dataframe with columns 'bond_name' and 'risk_score'. We have used a print statement to view our initial dataset. raw_data = {'bond_name': ['govt_bond_1', 'govt_bond_2', 'govt_bond_3', 'pvt_bond_1', 'pvt_bond_2', 'pvt_bond_3', 'pvt_bond_4'], 'risk_score': [1.6, 0.9, 2.3, 3.0, 2.7, 1.8, 4.1]} df = pd.DataFrame(raw_data, columns = ['bond_name', 'risk_score']) print(df)

Step 3 - Creating a function to assign values in column

First, we have created empty list named as rating which we will append and assign values as per the conditition. rating = []

Now we have created a loop which will iterate over all the rows in column 'risk_score' and assign values in the list.
We are using if-else function to make the conditon on which we want to assign the values in the column. Here, we want to assign rating on the based of risk_score. The condition which we are making is:

  • If the value in risk_score is between 0 and 1 then it will assign 'AA' in rating column
  • If the value in risk_score is between 1 and 2 then it will assign 'A' in rating column
  • If the value in risk_score is between 2 and 3 then it will assign 'BB' in rating column
  • If the value in risk_score is between 3 and 4 then it will assign 'B' in rating column
  • If the value in risk_score is between 4 and 5 then it will assign 'C' in rating column
  • If there is no value in risk_score then it will assign Not_Rated in the rating columnn
rating = [] for row in df['risk_score']: if row < 1.0 : rating.append('AA') elif row < 2.0: rating.append('A') elif row < 3.0: rating.append('BB') elif row < 4.0: rating.append('B') elif row < 5.0: rating.append('C') else: rating.append('Not_Rated')

Step 5 - Converting list into column of dataset and viewing the final dataset

So finally we are adding that list as a column in the dataset and printing the final dataset to see the changes. df['rating'] = rating print(df) As an output we get:

     bond_name  risk_score
0  govt_bond_1         1.6
1  govt_bond_2         0.9
2  govt_bond_3         2.3
3   pvt_bond_1         3.0
4   pvt_bond_2         2.7
5   pvt_bond_3         1.8
6   pvt_bond_4         4.1

     bond_name  risk_score rating
0  govt_bond_1         1.6      A
1  govt_bond_2         0.9     AA
2  govt_bond_3         2.3     BB
3   pvt_bond_1         3.0      B
4   pvt_bond_2         2.7     BB
5   pvt_bond_3         1.8      A
6   pvt_bond_4         4.1      C

Here we can see that a new column has been added with the values according to the risk_score

Relevant Projects

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Data Science Project - Instacart Market Basket Analysis
Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

Identifying Product Bundles from Sales Data Using R Language
In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.