How to insert a new column based on condition in Python?

This recipe helps you insert a new column based on a condition in Python.

Recipe Objective

Adding a new column to a pandas DataFrame is an easy task. But what if the values in the new column must depend on a condition, for example on the values of another column? For a small dataset with only a few rows you could fill the column in manually, but for a large dataset with hundreds of rows that quickly becomes impractical.

We can replace this tedious manual work with a few lines of code that apply the condition to every row for us.

So this recipe is a short example of how to insert a new column whose values are based on a condition.

Step 1 - Import the library

import pandas as pd
import numpy as np

We have imported pandas and numpy. No other library is needed for this recipe.

Step 2 - Creating a sample Dataset

Here we have created a DataFrame with the columns 'bond_name' and 'risk_score'. We have used a print statement to view our initial dataset.

raw_data = {'bond_name': ['govt_bond_1', 'govt_bond_2', 'govt_bond_3',
                          'pvt_bond_1', 'pvt_bond_2', 'pvt_bond_3', 'pvt_bond_4'],
            'risk_score': [1.6, 0.9, 2.3, 3.0, 2.7, 1.8, 4.1]}
df = pd.DataFrame(raw_data, columns=['bond_name', 'risk_score'])
print(df)

Step 3 - Creating a function to assign values in column

First, we create an empty list named rating, to which we will append values according to the condition.

rating = []

Next, we create a loop that iterates over every value in the 'risk_score' column and appends a rating to the list.
We use an if-elif-else chain to express the conditions under which each value is assigned. Here, we want to assign a rating on the basis of risk_score. The conditions are:

  • If the value in risk_score is below 1, 'AA' is assigned in the rating column
  • If the value in risk_score is at least 1 and below 2, 'A' is assigned in the rating column
  • If the value in risk_score is at least 2 and below 3, 'BB' is assigned in the rating column
  • If the value in risk_score is at least 3 and below 4, 'B' is assigned in the rating column
  • If the value in risk_score is at least 4 and below 5, 'C' is assigned in the rating column
  • Otherwise (the value is 5 or above, or there is no value in risk_score), 'Not_Rated' is assigned in the rating column

rating = []
# Walk through every risk score and append the matching rating
for row in df['risk_score']:
    if row < 1.0:
        rating.append('AA')
    elif row < 2.0:
        rating.append('A')
    elif row < 3.0:
        rating.append('BB')
    elif row < 4.0:
        rating.append('B')
    elif row < 5.0:
        rating.append('C')
    else:
        # Scores of 5 or above and missing values (NaN) end up here
        rating.append('Not_Rated')
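
The step heading mentions a function, so the same conditions can also be wrapped in one and applied to the column. The following is only a sketch under assumptions: the name assign_rating and the small example Series are hypothetical and not part of the recipe, and the explicit missing-value check is added for readability (the loop above reaches the same result through its else branch).

```python
import pandas as pd


def assign_rating(score):
    """Map one risk score to a rating label using the recipe's thresholds.

    assign_rating is a hypothetical helper name, not part of the recipe.
    """
    if pd.isna(score):
        # Missing scores are labelled explicitly instead of via the final fallback
        return 'Not_Rated'
    if score < 1.0:
        return 'AA'
    if score < 2.0:
        return 'A'
    if score < 3.0:
        return 'BB'
    if score < 4.0:
        return 'B'
    if score < 5.0:
        return 'C'
    # Scores of 5 or above fall outside the rated range
    return 'Not_Rated'


# Hypothetical scores, including a missing value and an out-of-range value
scores = pd.Series([1.6, 0.9, float('nan'), 5.5])
ratings = scores.apply(assign_rating)
print(ratings.tolist())  # ['A', 'AA', 'Not_Rated', 'Not_Rated']
```

With the recipe's DataFrame, the same function could be used as df['rating'] = df['risk_score'].apply(assign_rating), which removes the need for the intermediate list.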

Step 4 - Converting the list into a column of the dataset and viewing the final dataset

Finally, we add the list as a column in the dataset and print the final dataset to see the changes.

df['rating'] = rating
print(df)

As an output we get:

     bond_name  risk_score
0  govt_bond_1         1.6
1  govt_bond_2         0.9
2  govt_bond_3         2.3
3   pvt_bond_1         3.0
4   pvt_bond_2         2.7
5   pvt_bond_3         1.8
6   pvt_bond_4         4.1

     bond_name  risk_score rating
0  govt_bond_1         1.6      A
1  govt_bond_2         0.9     AA
2  govt_bond_3         2.3     BB
3   pvt_bond_1         3.0      B
4   pvt_bond_2         2.7     BB
5   pvt_bond_3         1.8      A
6   pvt_bond_4         4.1      C

Here we can see that a new column has been added, with values assigned according to risk_score.
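
For larger datasets, a vectorized approach may be preferable to a Python loop. The sketch below is an alternative to the recipe's method rather than a part of it: it uses numpy's np.select, which picks the label of the first condition that is true for each row and falls back to a default otherwise. The variable names score, conditions and labels are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the recipe
raw_data = {'bond_name': ['govt_bond_1', 'govt_bond_2', 'govt_bond_3',
                          'pvt_bond_1', 'pvt_bond_2', 'pvt_bond_3', 'pvt_bond_4'],
            'risk_score': [1.6, 0.9, 2.3, 3.0, 2.7, 1.8, 4.1]}
df = pd.DataFrame(raw_data, columns=['bond_name', 'risk_score'])

score = df['risk_score']

# Conditions are checked in order, like the if/elif chain in the loop:
# the first one that is True for a row decides its label
conditions = [score < 1.0, score < 2.0, score < 3.0, score < 4.0, score < 5.0]
labels = ['AA', 'A', 'BB', 'B', 'C']

# Rows matching no condition (5 or above, or NaN) receive the default label
df['rating'] = np.select(conditions, labels, default='Not_Rated')
print(df)
```

On the sample data this should produce the same rating column as the loop. Because a comparison with NaN evaluates to False, missing scores match no condition and receive the default label, just as they reach the else branch in the loop.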
