How to Normalise a Pandas DataFrame Column?

This recipe helps you Normalise a Pandas DataFrame Column

Recipe Objective

In many datasets we find some of the features have very high range and some does not. So while traning a model it may be possible that the features having high range may effect the model more and make the model bias towards the feature. So for this we need to normalize the dataset i.e to change the range of values keeping the differences same.

Here we are using min-max normalizer which will normalize the data in the range 0 to 1 such that the minimum value of dataset will be 0 and the maximum will be 1.

So this recipe is a short example of How we can Normalise a Pandas DataFrame Column.

Get Closer To Your Dream of Becoming a Data Scientist with 70+ Solved End-to-End ML Projects

Step 1 - Import the library

import pandas as pd from sklearn import preprocessing

We have imported pandas and preprocessing from sklearn library.

Step 2 - Setup the Data

Here we have created a dictionary named data and passed that in pd.DataFrame to create a DataFrame with column named values. We have also used a print statement to print the dataframe. data = {'values': [23,243,17,30,-79,40,173,-20,69,170]} df = pd.DataFrame(data) print(df)

Step 3 - Using MinMaxScaler and transforming the Dataframe

As the dataframe is made its time to call MinMaxScaler and learn about its parameters. It has two parameters:

  • feature_range : By this parameter we can set the minimun and maximum value of normalized data that we want by passing a tuple(min , max). By default it is (0 , 1).
  • copy : It is a bool parameter which is by default True that means by default it will make a copy of new normalized data and set inplace equals to False.

We are calling MinMaxScaler with default parameters. min_max_scaler = preprocessing.MinMaxScaler()

Now, we are normalizing the dataframe (df) by using fit_transform function of MinMaxScaler and making the dataframe of the normalized array. x_scaled = min_max_scaler.fit_transform(df) df_normalized = pd.DataFrame(x_scaled)

Explore More Data Science and Machine Learning Projects for Practice. Fast-Track Your Career Transition with ProjectPro

Step 5 - Viewing the DataFrame

So we are printing the final dataframe and observe that the values have been normalized in the range 0 to 1. print(df_normalized) So the output comes as

   values
0      23
1     243
2      17
3      30
4     -79
5      40
6     173
7     -20
8      69
9     170

          0
0  0.316770
1  1.000000
2  0.298137
3  0.338509
4  0.000000
5  0.369565
6  0.782609
7  0.183230
8  0.459627
9  0.773292

Download Materials

What Users are saying..

profile image

Abhinav Agarwal

Graduate Student at Northwestern University
linkedin profile url

I come from Northwestern University, which is ranked 9th in the US. Although the high-quality academics at school taught me all the basics I needed, obtaining practical experience was a challenge.... Read More

Relevant Projects

Image Classification Model using Transfer Learning in PyTorch
In this PyTorch Project, you will build an image classification model in PyTorch using the ResNet pre-trained model.

Medical Image Segmentation Deep Learning Project
In this deep learning project, you will learn to implement Unet++ models for medical image segmentation to detect and classify colorectal polyps.

BigMart Sales Prediction ML Project in Python
The goal of the BigMart Sales Prediction ML project is to build and evaluate different predictive models and determine the sales of each product at a store.

Stock Price Prediction Project using LSTM and RNN
Learn how to predict stock prices using RNN and LSTM models. Understand deep learning concepts and apply them to real-world financial data for accurate forecasting.

MLOps Project on GCP using Kubeflow for Model Deployment
MLOps using Kubeflow on GCP - Build and deploy a deep learning model on Google Cloud Platform using Kubeflow pipelines in Python

Build CNN Image Classification Models for Real Time Prediction
Image Classification Project to build a CNN model in Python that can classify images into social security cards, driving licenses, and other key identity information.

End-to-End Snowflake Healthcare Analytics Project on AWS-1
In this Snowflake Healthcare Analytics Project, you will leverage Snowflake on AWS to predict patient length of stay (LOS) in hospitals. The prediction of LOS can help in efficient resource allocation, lower the risk of staff/visitor infections, and improve overall hospital functioning.

End-to-End Speech Emotion Recognition Project using ANN
Speech Emotion Recognition using RAVDESS Audio Dataset - Build an Artificial Neural Network Model to Classify Audio Data into various Emotions like Sad, Happy, Angry, and Neutral

Learn How to Build a Linear Regression Model in PyTorch
In this Machine Learning Project, you will learn how to build a simple linear regression model in PyTorch to predict the number of days subscribed.

Learn to Build a Neural network from Scratch using NumPy
In this deep learning project, you will learn to build a neural network from scratch using NumPy