How to Normalise a Pandas DataFrame Column?

This recipe helps you Normalise a Pandas DataFrame Column

Recipe Objective

In many datasets, some features have a very wide range of values and others do not. While training a model, the features with a wide range may affect the model more and bias it towards those features. To avoid this we normalize the dataset, i.e. we change the range of the values while keeping their relative differences the same.

Here we use a min-max normalizer, which rescales the data to the range 0 to 1 so that the minimum value of the dataset becomes 0 and the maximum becomes 1.

So this recipe is a short example of how we can normalise a Pandas DataFrame column.
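The min-max rule described above maps each value x to (x - min) / (max - min). A minimal sketch of that arithmetic using plain pandas, assuming the same sample values that this recipe sets up in Step 2; the names col and manual_scaled are illustrative only and are not part of the recipe:

```python
import pandas as pd

# Sample values (the same ones the recipe uses in Step 2).
col = pd.Series([23, 243, 17, 30, -79, 40, 173, -20, 69, 170], name="values")

# Min-max rule: subtract the minimum, then divide by the range (max - min).
manual_scaled = (col - col.min()) / (col.max() - col.min())

# The smallest value maps to 0 and the largest maps to 1.
print(manual_scaled.min(), manual_scaled.max())
```

This is only meant to show what the scaler computes for a single column; the recipe itself relies on MinMaxScaler.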

Step 1 - Import the library

import pandas as pd
from sklearn import preprocessing

We have imported pandas, and the preprocessing module from the sklearn library.

Step 2 - Set up the Data

Here we have created a dictionary named data and passed it to pd.DataFrame to create a DataFrame with a column named values. We have also used a print statement to print the dataframe.

data = {'values': [23,243,17,30,-79,40,173,-20,69,170]}
df = pd.DataFrame(data)
print(df)

Step 3 - Using MinMaxScaler and transforming the Dataframe

Now that the dataframe is ready, it's time to call MinMaxScaler and learn about its parameters. The two parameters covered here are:

  • feature_range : This parameter sets the minimum and maximum values of the normalized data, passed as a tuple (min, max). By default it is (0, 1).
  • copy : A bool parameter that is True by default, meaning the scaler works on a copy of the input data. Setting it to False performs the normalization in place and avoids the copy (this applies when the input is already a NumPy array).

We are calling MinMaxScaler with default parameters.

min_max_scaler = preprocessing.MinMaxScaler()
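As an illustration of the feature_range parameter described above, here is a hedged sketch that scales the same sample values to the range -1 to 1 instead of the default. The variable names scaler and scaled are assumptions made for this example and are not used elsewhere in the recipe:

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'values': [23, 243, 17, 30, -79, 40, 173, -20, 69, 170]})

# Ask for the range [-1, 1] instead of the default [0, 1].
scaler = preprocessing.MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(df)  # NumPy array of shape (10, 1)

# The minimum of the column maps to about -1 and the maximum to about 1.
print(scaled.min(), scaled.max())
```

The recipe itself keeps the default range, so this variation is optional.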

Now we normalize the dataframe (df) using the fit_transform function of MinMaxScaler and build a new dataframe from the normalized array. fit_transform returns a NumPy array, so the column of the new dataframe is labelled 0 rather than values.

x_scaled = min_max_scaler.fit_transform(df)
df_normalized = pd.DataFrame(x_scaled)
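If you would rather keep the original column name and index in the normalized dataframe, one possible variation is to pass them to pd.DataFrame explicitly. This is a sketch rather than part of the original recipe, and df_named is a hypothetical name chosen for the example:

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'values': [23, 243, 17, 30, -79, 40, 173, -20, 69, 170]})

min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(df)  # plain NumPy array, labels are lost

# Reattach the original labels so the column is still called 'values'.
df_named = pd.DataFrame(x_scaled, columns=df.columns, index=df.index)
print(df_named.head())
```

The scaled numbers are the same as in df_normalized; only the labels differ.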

Step 4 - Viewing the DataFrame

We print the final dataframe and observe that the values have been normalized to the range 0 to 1.

print(df_normalized)

The output shows the original dataframe followed by the normalized one:

   values
0      23
1     243
2      17
3      30
4     -79
5      40
6     173
7     -20
8      69
9     170

          0
0  0.316770
1  1.000000
2  0.298137
3  0.338509
4  0.000000
5  0.369565
6  0.782609
7  0.183230
8  0.459627
9  0.773292
