Adding a new column to a DataFrame in Python is an easy task. But have you tried adding a column whose values are based on a condition, for example a column whose values depend on the values of another column? For a small dataset with only a few rows, it may be easy to fill in such a column by hand, but for a large dataset with hundreds or thousands of rows, doing it manually quickly becomes impractical.
We can replace this tedious manual work with a few lines of code: we write the condition once, and it is applied to every row at the same time.
This recipe is a short example of how to create a new column based on a condition in Python using pandas and NumPy.
import pandas as pd
import numpy as np
We have imported pandas and numpy. No other library is needed for this recipe.
Here we create a DataFrame with four columns (name, age, preTestScore and postTestScore) and use a print statement to view our initial dataset.
data = {"name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
        "age": [42, 52, 63, 24, 73],
        "preTestScore": [4, 24, 31, 2, 3],
        "postTestScore": [25, 94, 57, 62, 70]}
# Build the DataFrame first; df must exist before it can be printed
df = pd.DataFrame(data, columns=["name", "age", "preTestScore", "postTestScore"])
print(df)
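Next, we build the new columns from a condition. np.where(condition, "yes", "no") evaluates the condition for every row at once and returns "yes" where it is True and "no" where it is False. Here each new column records whether a person is at least 50, 60 or 70 years old.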
df["elderly@50"] = np.where(df["age"]>=50, "yes", "no")
df["elderly@60"] = np.where(df["age"]>=60, "yes", "no")
df["elderly@70"] = np.where(df["age"]>=70, "yes", "no")
print(df)
Running the code prints the initial DataFrame followed by the DataFrame with the three new columns:
    name  age  preTestScore  postTestScore
0  Jason   42             4             25
1  Molly   52            24             94
2   Tina   63            31             57
3   Jake   24             2             62
4    Amy   73             3             70

    name  age  preTestScore  postTestScore  elderly@50  elderly@60  elderly@70
0  Jason   42             4             25          no          no          no
1  Molly   52            24             94         yes          no          no
2   Tina   63            31             57         yes         yes          no
3   Jake   24             2             62          no          no          no
4    Amy   73             3             70         yes         yes         yes
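The three np.where lines in this recipe differ only in the age threshold, so they can also be generated in a loop. Below is a minimal sketch of that idea. add_age_flags is a hypothetical helper name chosen for this example, not a pandas or NumPy function; it works on a copy so the original DataFrame is left unchanged.

```python
import numpy as np
import pandas as pd


def add_age_flags(df, thresholds, column="age"):
    """Hypothetical helper: add one "elderly@<t>" column per threshold t.

    Each new column holds "yes" where df[column] >= t and "no" otherwise,
    the same rule as the individual np.where lines in the recipe.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    for t in thresholds:
        out[f"elderly@{t}"] = np.where(out[column] >= t, "yes", "no")
    return out


data = {"name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
        "age": [42, 52, 63, 24, 73],
        "preTestScore": [4, 24, 31, 2, 3],
        "postTestScore": [25, 94, 57, 62, 70]}
df = pd.DataFrame(data)

flagged = add_age_flags(df, [50, 60, 70])
print(flagged)
```

Adding another threshold is then a change to the list passed in, rather than another copied line.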
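np.where chooses between exactly two values. When a column needs more than two labels, np.select can be used instead: it takes a list of conditions and a matching list of values, and for each row the first condition that is True decides the value. A condition can also combine several columns with & (and) or | (or), with each comparison wrapped in parentheses. The age bands ("70+", "50-69", "under 50") and the older_high_scorer column below are assumptions made purely for illustration; they are not part of the original recipe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "age": [42, 52, 63, 24, 73],
    "postTestScore": [25, 94, 57, 62, 70],
})

# Conditions are checked in order; the first True one decides the label,
# so the stricter ">= 70" test must come before ">= 50".
conditions = [df["age"] >= 70, df["age"] >= 50]
labels = ["70+", "50-69"]
# Rows that match no condition get the default label.
df["age_group"] = np.select(conditions, labels, default="under 50")

# A condition on two columns: both comparisons must hold for "yes".
df["older_high_scorer"] = np.where(
    (df["age"] >= 50) & (df["postTestScore"] >= 60), "yes", "no"
)
print(df)
```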