How to Normalise a Pandas DataFrame Column?

How to Normalise a Pandas DataFrame Column?

How to Normalise a Pandas DataFrame Column?

This recipe helps you Normalise a Pandas DataFrame Column


Recipe Objective

In many datasets we find some of the features have very high range and some does not. So while traning a model it may be possible that the features having high range may effect the model more and make the model bias towards the feature. So for this we need to normalize the dataset i.e to change the range of values keeping the differences same.

Here we are using min-max normalizer which will normalize the data in the range 0 to 1 such that the minimum value of dataset will be 0 and the maximum will be 1.

So this recipe is a short example of How we can Normalise a Pandas DataFrame Column.

Step 1 - Import the library

import pandas as pd from sklearn import preprocessing

We have imported pandas and preprocessing from sklearn library.

Step 2 - Setup the Data

Here we have created a dictionary named data and passed that in pd.DataFrame to create a DataFrame with column named values. We have also used a print statement to print the dataframe. data = {'values': [23,243,17,30,-79,40,173,-20,69,170]} df = pd.DataFrame(data) print(df)

Step 3 - Using MinMaxScaler and transforming the Dataframe

As the dataframe is made its time to call MinMaxScaler and learn about its parameters. It has two parameters:

  • feature_range : By this parameter we can set the minimun and maximum value of normalized data that we want by passing a tuple(min , max). By default it is (0 , 1).
  • copy : It is a bool parameter which is by default True that means by default it will make a copy of new normalized data and set inplace equals to False.
We are calling MinMaxScaler with default parameters. min_max_scaler = preprocessing.MinMaxScaler()

Now, we are normalizing the dataframe (df) by using fit_transform function of MinMaxScaler and making the dataframe of the normalized array. x_scaled = min_max_scaler.fit_transform(df) df_normalized = pd.DataFrame(x_scaled)

Step 5 - Viewing the DataFrame

So we are printing the final dataframe and observe that the values have been normalized in the range 0 to 1. print(df_normalized) So the output comes as

0      23
1     243
2      17
3      30
4     -79
5      40
6     173
7     -20
8      69
9     170

0  0.316770
1  1.000000
2  0.298137
3  0.338509
4  0.000000
5  0.369565
6  0.782609
7  0.183230
8  0.459627
9  0.773292

Relevant Projects

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Build a Similar Images Finder with Python, Keras, and Tensorflow
Build your own image similarity application using Python to search and find images of products that are similar to any given product. You will implement the K-Nearest Neighbor algorithm to find products with maximum similarity.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Build a Collaborative Filtering Recommender System in Python
Use the Amazon Reviews/Ratings dataset of 2 Million records to build a recommender system using memory-based collaborative filtering in Python.

Machine Learning Project to Forecast Rossmann Store Sales
In this machine learning project you will work on creating a robust prediction model of Rossmann's daily sales using store, promotion, and competitor data.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

Census Income Data Set Project - Predict Adult Census Income
Use the Adult Income dataset to predict whether income exceeds 50K yr based on census data.