Data Cleaning: How to Change Column Name in CSV File Using Python?

This recipe explains how to change column names, update values, and modify CSV files in Python with ease. | ProjectPro

Recipe Objective - How to Update a Column in CSV File Using Python? 

Effective data cleaning in data processing pipelines plays a pivotal role in standardizing raw data structures. This ensures that the information is ready for storage in data lakes, consequently influencing informed business decisions. Without data cleaning being done, it's hard to standardize the structure of the data and store it into data lakes which will have a drastic impact on the business decisions on the data. This recipe addresses key aspects of data cleaning using Python, focusing on importing modules, reading CSV files, changing date formats, converting column names to lowercase, replacing spaces with underscores, and finally, renaming columns.

Master the Art of Data Cleaning in Machine Learning

System requirements:- 

  • Install the python module as follows if the below modules are not found:

  • pip install requests

  • The below codes can be run in Jupyter notebook, or any python console

Stepwise Implementation on How to Update Data in CSV File Using Python 

Here is a step-by-step approach to clean and manipulate CSV files, ensuring data is ready for advanced analytics and machine learning applications. 

Step 1: Import the module

import pandas as pd

import datetime 

Step 2 : Read the CSV File

Read the csv file from the local and create a dataframe using pandas, and print the top 5 rows of the dataframe. 

df = pd.read_csv('stockdata.csv')

df.head() 

Output of the above code:

Read the CSV File

Step 3 : Change the Date Format

The code below helps you change the date column format "28-February-2015" to 28-02-2015.

df['Date'] = pd.to_datetime(df['Date'])

print(df['Date']) 

Output of the above code:

Change the Date Format

Step 4 : Convert Column Names to Lowercase

# To convert the  column names lowercase

df.columns.str.lower()

Output of the above code:

Convert the Column Names to Lowercase

Step 5 : Replacing Empty Spaces with Underscore

In the code below we are going to replace the empty spaces in the column names with underscore. 

# to put underscore in all columns

### Replacing Empty spaces with underscore

df.columns.str.replace(' ','_')

Output of the above code:

Replace Values in CSV File Python

Step 6 : Rename the Column Names

In the below we are reading column names from the dataframe and rename the column names to create a new dataframe. Print the columns to check :

df.columns

df.head()

Output of the above code:

Replace values in CSV File in Python

Output of the above code:

Show the DataFrame to first five rows

Create a new dataframe, rename column names using the rename method in key value format and print the 5 lines.

df2 = df.rename(columns = {'Date' :'stock_date', 'Open Price':'open_price', 'High Price':'high_price', 'Low Price':'low_price', 'Close Price':'close_price', 'WAP':'weighted_avg_price',

       'No.of Shares':'num_of_shares', 'No. of Trades':'num_of_rades', 'Total Turnover (Rs.)':'tot_turnover_in_rupees','Deliverable Quantity':'delvry_quantity', '% Deli. Qty to Traded Qty':'delvry_qty_to_traded_qty', 'Spread High-Low':'spread_hi-low',

       'Spread Close-Open':'spread_close-open'}, inplace = False)

df2.head()

Output of the above code:

Step 7 : Check for Missing Values

To make detecting missing values, Pandas library provides the isnull() and notnull() functions, which are also methods on Series and DataFrame objects.True if the original value is null, False if the original value isn't null

df2.isnull()

Output of the above code:

How to Check Missing Values in a DataFrame?

Another way it can easily be done is by using the isnull function paired with the 'sum' function. it will give the count of null values from each column.

df2.isnull().sum()

Output of the above code:

Pandas check missing values

Step 8 : Filling Missing Data

Pandas library provides different types of methods for cleaning the missing values. The fillna function can "fill in" NA values with non-null data in a couple of ways.

df2.close_price = df2.close_price.fillna(0)

print(df2.close_price.isnull().sum())

Output of the above code: In the above code filling the null values with 0

Filling Missing Values

Step 9 : Dropping Missing Data

The final thing you can do is to delete the missing rows of data. We can delete all rows with any missing data as shown below.

df2= df2.dropna() 

df2.isnull().sum()

Other way: Use the drop to Remove rows or columns by specifying label names and corresponding axis data as shown below.

df2= df2.drop() 

df2.isnull().sum()

Step 10 : Create CSV File

After renaming the column names, write the formatted data into a csv file in your local or HDFS. 

# write formatted to csv

df2.to_csv("stock_data.csv")

Output of the above lines:

Displaying the updated CSV File in Python

Practice more DataFrame Operations through ProjectPro’s Real-World Projects! 

With the step-by-step guide provided, you now have an understanding to enhance and customize your data preprocessing tasks. To further solidify your expertise, consider applying these techniques to real-world projects offered by ProjectPro. Engaging in practical, hands-on experiences will not only reinforce your understanding but also expose you to a variety of DataFrame operations, allowing you to tackle diverse challenges in data manipulation. Subscribe to ProjectPro Repository to get access to 270+ solved enterprise-grade projects focused in data science and machine learning. 

What Users are saying..

profile image

Gautam Vermani

Data Consultant at Confidential
linkedin profile url

Having worked in the field of Data Science, I wanted to explore how I can implement projects in other domains, So I thought of connecting with ProjectPro. A project that helped me absorb this topic... Read More

Relevant Projects

Multilabel Classification Project for Predicting Shipment Modes
Multilabel Classification Project to build a machine learning model that predicts the appropriate mode of transport for each shipment, using a transport dataset with 2000 unique products. The project explores and compares four different approaches to multilabel classification, including naive independent models, classifier chains, natively multilabel models, and multilabel to multiclass approaches.

Text Classification with Transformers-RoBERTa and XLNet Model
In this machine learning project, you will learn how to load, fine tune and evaluate various transformer models for text classification tasks.

Build Classification Algorithms for Digital Transformation[Banking]
Implement a machine learning approach using various classification techniques in Python to examine the digitalisation process of bank customers.

Build Piecewise and Spline Regression Models in Python
In this Regression Project, you will learn how to build a piecewise and spline regression model from scratch in Python to predict the points scored by a sports team.

NLP Project to Build a Resume Parser in Python using Spacy
Use the popular Spacy NLP python library for OCR and text classification to build a Resume Parser in Python.

Time Series Analysis with Facebook Prophet Python and Cesium
Time Series Analysis Project - Use the Facebook Prophet and Cesium Open Source Library for Time Series Forecasting in Python

Build a Customer Churn Prediction Model using Decision Trees
Develop a customer churn prediction model using decision tree machine learning algorithms and data science on streaming service data.

Build a Graph Based Recommendation System in Python-Part 2
In this Graph Based Recommender System Project, you will build a recommender system project for eCommerce platforms and learn to use FAISS for efficient similarity search.

Build Regression (Linear,Ridge,Lasso) Models in NumPy Python
In this machine learning regression project, you will learn to build NumPy Regression Models (Linear Regression, Ridge Regression, Lasso Regression) from Scratch.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.