Data Cleaning: How to Change Column Name in CSV File Using Python?

This recipe explains how to change column names, update values, and modify CSV files in Python with ease. | ProjectPro
Last Updated: 06 Feb 2024

Get access to Data Science projects View all Data Science projects

DATA SCIENCE PROJECTS IN PYTHON DATA CLEANING PYTHON DATA MUNGING MACHINE LEARNING RECIPES PANDAS CHEATSHEET ALL TAGS

Recipe Objective - How to Update a Column in CSV File Using Python?

Effective data cleaning in data processing pipelines plays a pivotal role in standardizing raw data structures. This ensures that the information is ready for storage in data lakes, consequently influencing informed business decisions. Without data cleaning being done, it's hard to standardize the structure of the data and store it into data lakes which will have a drastic impact on the business decisions on the data. This recipe addresses key aspects of data cleaning using Python, focusing on importing modules, reading CSV files, changing date formats, converting column names to lowercase, replacing spaces with underscores, and finally, renaming columns.

Master the Art of Data Cleaning in Machine Learning

System requirements:-

Install the python module as follows if the below modules are not found:
pip install requests
The below codes can be run in Jupyter notebook, or any python console

Stepwise Implementation on How to Update Data in CSV File Using Python

Here is a step-by-step approach to clean and manipulate CSV files, ensuring data is ready for advanced analytics and machine learning applications.

Step 1: Import the module

import pandas as pd

import datetime

Step 2 : Read the CSV File

Read the csv file from the local and create a dataframe using pandas, and print the top 5 rows of the dataframe.

df = pd.read_csv('stockdata.csv')

df.head()

Output of the above code:

Read the CSV File

Step 3 : Change the Date Format

The code below helps you change the date column format "28-February-2015" to 28-02-2015.

df['Date'] = pd.to_datetime(df['Date'])

print(df['Date'])

Output of the above code:

Change the Date Format

Step 4 : Convert Column Names to Lowercase

# To convert the column names lowercase

df.columns.str.lower()

Output of the above code:

Convert the Column Names to Lowercase

Step 5 : Replacing Empty Spaces with Underscore

In the code below we are going to replace the empty spaces in the column names with underscore.

# to put underscore in all columns

### Replacing Empty spaces with underscore

df.columns.str.replace(' ','_')

Output of the above code:

Replace Values in CSV File Python

Step 6 : Rename the Column Names

In the below we are reading column names from the dataframe and rename the column names to create a new dataframe. Print the columns to check :

df.columns

df.head()

Output of the above code:

Replace values in CSV File in Python

Output of the above code:

Show the DataFrame to first five rows

Create a new dataframe, rename column names using the rename method in key value format and print the 5 lines.

df2 = df.rename(columns = {'Date' :'stock_date', 'Open Price':'open_price', 'High Price':'high_price', 'Low Price':'low_price', 'Close Price':'close_price', 'WAP':'weighted_avg_price',

'No.of Shares':'num_of_shares', 'No. of Trades':'num_of_rades', 'Total Turnover (Rs.)':'tot_turnover_in_rupees','Deliverable Quantity':'delvry_quantity', '% Deli. Qty to Traded Qty':'delvry_qty_to_traded_qty', 'Spread High-Low':'spread_hi-low',

'Spread Close-Open':'spread_close-open'}, inplace = False)

df2.head()

Output of the above code:

Step 7 : Check for Missing Values

To make detecting missing values, Pandas library provides the isnull() and notnull() functions, which are also methods on Series and DataFrame objects.True if the original value is null, False if the original value isn't null

df2.isnull()

Output of the above code:

How to Check Missing Values in a DataFrame?

Another way it can easily be done is by using the isnull function paired with the 'sum' function. it will give the count of null values from each column.

df2.isnull().sum()

Output of the above code:

Pandas check missing values

Step 8 : Filling Missing Data

Pandas library provides different types of methods for cleaning the missing values. The fillna function can "fill in" NA values with non-null data in a couple of ways.

df2.close_price = df2.close_price.fillna(0)

print(df2.close_price.isnull().sum())

Output of the above code: In the above code filling the null values with 0

Filling Missing Values

Step 9 : Dropping Missing Data

The final thing you can do is to delete the missing rows of data. We can delete all rows with any missing data as shown below.

df2= df2.dropna()

df2.isnull().sum()

Other way: Use the drop to Remove rows or columns by specifying label names and corresponding axis data as shown below.

df2= df2.drop()

df2.isnull().sum()

Step 10 : Create CSV File

After renaming the column names, write the formatted data into a csv file in your local or HDFS.

# write formatted to csv

df2.to_csv("stock_data.csv")

Output of the above lines:

Displaying the updated CSV File in Python

Practice more DataFrame Operations through ProjectPro’s Real-World Projects!

With the step-by-step guide provided, you now have an understanding to enhance and customize your data preprocessing tasks. To further solidify your expertise, consider applying these techniques to real-world projects offered by ProjectPro. Engaging in practical, hands-on experiences will not only reinforce your understanding but also expose you to a variety of DataFrame operations, allowing you to tackle diverse challenges in data manipulation. Subscribe to ProjectPro Repository to get access to 270+ solved enterprise-grade projects focused in data science and machine learning.

What Users are saying..

Gautam Vermani

Data Consultant at Confidential

Having worked in the field of Data Science, I wanted to explore how I can implement projects in other domains, So I thought of connecting with ProjectPro. A project that helped me absorb this topic... Read More

Relevant Projects

Machine Learning Projects

Data Science Projects

Python Projects for Data Science

Data Science Projects in R

Machine Learning Projects for Beginners

Deep Learning Projects

Neural Network Projects

Tensorflow Projects

NLP Projects

Kaggle Projects

IoT Projects

Big Data Projects

Hadoop Real-Time Projects Examples

Spark Projects

Data Analytics Projects for Students

Relevant Projects

Multilabel Classification Project for Predicting Shipment Modes

Multilabel Classification Project to build a machine learning model that predicts the appropriate mode of transport for each shipment, using a transport dataset with 2000 unique products. The project explores and compares four different approaches to multilabel classification, including naive independent models, classifier chains, natively multilabel models, and multilabel to multiclass approaches.

View Project Details

Data Cleaning: How to Change Column Name in CSV File Using Python?

Recipe Objective - How to Update a Column in CSV File Using Python?

System requirements:-

Stepwise Implementation on How to Update Data in CSV File Using Python

Step 1: Import the module

Step 2 : Read the CSV File

Step 3 : Change the Date Format

Step 4 : Convert Column Names to Lowercase

Step 5 : Replacing Empty Spaces with Underscore

Step 6 : Rename the Column Names

Step 7 : Check for Missing Values

Step 8 : Filling Missing Data

Step 9 : Dropping Missing Data

Step 10 : Create CSV File

Practice more DataFrame Operations through ProjectPro’s Real-World Projects!

Gautam Vermani

Relevant Projects

You might also like

Relevant Projects