How to Normalise a Pandas DataFrame Column?

How to Normalise a Pandas DataFrame Column?

How to Normalise a Pandas DataFrame Column?

This recipe helps you Normalise a Pandas DataFrame Column


Recipe Objective

In many datasets we find some of the features have very high range and some does not. So while traning a model it may be possible that the features having high range may effect the model more and make the model bias towards the feature. So for this we need to normalize the dataset i.e to change the range of values keeping the differences same.

Here we are using min-max normalizer which will normalize the data in the range 0 to 1 such that the minimum value of dataset will be 0 and the maximum will be 1.

So this recipe is a short example of How we can Normalise a Pandas DataFrame Column.

Step 1 - Import the library

import pandas as pd from sklearn import preprocessing

We have imported pandas and preprocessing from sklearn library.

Step 2 - Setup the Data

Here we have created a dictionary named data and passed that in pd.DataFrame to create a DataFrame with column named values. We have also used a print statement to print the dataframe. data = {'values': [23,243,17,30,-79,40,173,-20,69,170]} df = pd.DataFrame(data) print(df)

Step 3 - Using MinMaxScaler and transforming the Dataframe

As the dataframe is made its time to call MinMaxScaler and learn about its parameters. It has two parameters:

  • feature_range : By this parameter we can set the minimun and maximum value of normalized data that we want by passing a tuple(min , max). By default it is (0 , 1).
  • copy : It is a bool parameter which is by default True that means by default it will make a copy of new normalized data and set inplace equals to False.
We are calling MinMaxScaler with default parameters. min_max_scaler = preprocessing.MinMaxScaler()

Now, we are normalizing the dataframe (df) by using fit_transform function of MinMaxScaler and making the dataframe of the normalized array. x_scaled = min_max_scaler.fit_transform(df) df_normalized = pd.DataFrame(x_scaled)

Step 5 - Viewing the DataFrame

So we are printing the final dataframe and observe that the values have been normalized in the range 0 to 1. print(df_normalized) So the output comes as

0      23
1     243
2      17
3      30
4     -79
5      40
6     173
7     -20
8      69
9     170

0  0.316770
1  1.000000
2  0.298137
3  0.338509
4  0.000000
5  0.369565
6  0.782609
7  0.183230
8  0.459627
9  0.773292

Relevant Projects

Human Activity Recognition Using Smartphones Data Set
In this deep learning project, you will build a classification system where to precisely identify human fitness activities.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

Identifying Product Bundles from Sales Data Using R Language
In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.