What is the Box-Cox transformation?

This recipe explains what the Box-Cox transformation is and how to apply it in Python.


Recipe Objective

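The Box-Cox transformation is a power transformation commonly used to make skewed, non-normal data (for example, data that follows a power law or an exponential distribution) look more like a normal distribution. It is typically applied to a non-normal dependent variable so that it better satisfies the normality assumptions of many statistical models. The transformation is controlled by a single parameter, lambda, and is defined only for strictly positive values.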
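To make the definition concrete: for a positive value x, the transform is (x^lambda - 1) / lambda when lambda is not zero, and log(x) when lambda is zero. The sketch below writes that formula in plain Python. The function name box_cox and the sample values are chosen only for illustration; they are not part of the recipe, which relies on SciPy for the real work.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a single positive number x for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for strictly positive values")
    if lam == 0:
        # limiting case of the power formula as lambda approaches zero
        return math.log(x)
    return (x ** lam - 1) / lam

# illustrative values only
values = [0.5, 1.0, 2.0, 4.0]
for lam in (1.0, 0.5, 0.0):
    print(lam, [round(box_cox(v, lam), 4) for v in values])
```

With lambda = 1 the data is only shifted by one, with lambda = 0.5 it is a rescaled square root, and with lambda = 0 it is the natural logarithm, so smaller values of lambda compress large values more strongly.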

This recipe is a short example of how the Box-Cox transformation works. Let's get started.

Step 1 - Import the library

import numpy as np
from scipy.stats import boxcox
import seaborn as sns
import matplotlib.pyplot as plt

Let's pause and look at these imports. NumPy is a general-purpose numerical library that we use here to generate the data. boxcox from scipy.stats performs the transformation. seaborn (sns) and matplotlib.pyplot (plt) are used to plot the data.

Step 2 - Setup the Data

original_data = np.random.exponential(size = 1000)

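We have drawn 1000 samples from an exponential distribution. This data is strictly positive and strongly right-skewed, so it is a good example of non-normal data to transform.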

Now our dataset is ready.
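Before transforming, it can help to confirm that the data is suitable. The sketch below is one possible check, not part of the original recipe: it uses a seeded NumPy generator (the seed 0 is an arbitrary assumption, added only for reproducibility) and verifies that every value is positive, since Box-Cox is undefined for zero or negative values.

```python
import numpy as np

# seeded generator so the example is reproducible (seed value is arbitrary)
rng = np.random.default_rng(seed=0)
original_data = rng.exponential(size=1000)

# Box-Cox requires strictly positive input
all_positive = bool((original_data > 0).all())
print("all values positive:", all_positive)

# for right-skewed data the mean sits above the median
print("mean:", original_data.mean(), "median:", np.median(original_data))
```

If the data contained zeros or negative values, it would need to be shifted or handled with a different transformation first.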

Step 3 - Using boxcox

fitted_data, fitted_lambda = boxcox(original_data)

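We have transformed our data with boxcox so that it is closer to a normal distribution. Because we did not pass a value for lambda, boxcox estimated it by maximizing the log-likelihood and returned it as fitted_lambda.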
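The fitted lambda is worth keeping, because it lets us apply the same transformation again or undo it. The following sketch assumes the same kind of exponential data as the recipe (regenerated here with an arbitrary seed so the block is self-contained). It passes the lambda back to boxcox through its lmbda argument and reverses the transformation with inv_boxcox from scipy.special.

```python
import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

# same kind of data as in the recipe; the seed is an arbitrary choice
rng = np.random.default_rng(seed=0)
original_data = rng.exponential(size=1000)

# no lambda given: boxcox estimates it and returns (transformed data, lambda)
fitted_data, fitted_lambda = boxcox(original_data)

# lambda given: boxcox returns only the transformed data
same_data = boxcox(original_data, lmbda=fitted_lambda)

# undo the transformation to get back to the original scale
recovered = inv_boxcox(fitted_data, fitted_lambda)

print("fitted lambda:", fitted_lambda)
print("same result with explicit lambda:", np.allclose(same_data, fitted_data))
print("round trip recovers the data:", np.allclose(recovered, original_data))
```

Reusing a stored lambda is useful when new data must be transformed consistently with the data the lambda was estimated from, and the inverse is useful for reporting results on the original scale.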

Step 4 - Plotting the pattern

fig, ax = plt.subplots(1, 2)
# kdeplot with fill=True draws a shaded density curve (distplot is deprecated in recent seaborn releases)
sns.kdeplot(original_data, fill=True, linewidth=2, label="Non-Normal", color="green", ax=ax[0])
sns.kdeplot(fitted_data, fill=True, linewidth=2, label="Normal", color="green", ax=ax[1])
# add a legend to each subplot; plt.legend alone would only label the last one
ax[0].legend(loc="upper right")
ax[1].legend(loc="upper right")
fig.set_figheight(5)
fig.set_figwidth(10)

We have used seaborn to draw a density plot of the original data and of the transformed data, side by side.

Step 5 - Let's look at our dataset now

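Once we run the above code snippet, we will see two density plots: the original data on the left, which is strongly right-skewed, and the transformed data on the right, which is much closer to a bell shape.

Scroll down the IPython file to visualize the results.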
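The plots can be backed up with a number. As a rough check, and only one of several possible ones, the sketch below compares the sample skewness before and after the transformation using skew from scipy.stats. The data is regenerated with an arbitrary seed so the block runs on its own; a skewness near zero indicates a roughly symmetric distribution.

```python
import numpy as np
from scipy.stats import boxcox, skew

# same kind of data as in the recipe; the seed is an arbitrary choice
rng = np.random.default_rng(seed=0)
original_data = rng.exponential(size=1000)
fitted_data, fitted_lambda = boxcox(original_data)

# sample skewness: clearly positive for right-skewed data, near 0 for symmetric data
skew_before = skew(original_data)
skew_after = skew(fitted_data)

print("skewness before:", skew_before)
print("skewness after: ", skew_after)
```

Low skewness alone does not prove that the data is normal, so this check complements the density plots rather than replacing them.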
