What is the Box-Cox transformation?

This recipe explains what the Box-Cox transformation is.

Recipe Objective

The Box-Cox transformation is a family of power transformations commonly used to make skewed, non-normal data (for example, power-law-like or exponential data) more closely resemble a normal distribution. It is typically applied to a non-normal dependent variable, and it requires all values to be strictly positive.
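For reference, the transformation of a strictly positive value y with parameter lambda is (y**lambda - 1) / lambda when lambda is nonzero, and log(y) when lambda is zero. The following is a minimal sketch of that definition, checked against scipy.stats.boxcox with a fixed lmbda. The helper name manual_boxcox and the sample values are illustrative assumptions, not part of the recipe.

```python
import numpy as np
from scipy.stats import boxcox

def manual_boxcox(y, lmbda):
    """Apply the Box-Cox formula directly (hypothetical helper, for illustration only)."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lmbda == 0:
        return np.log(y)               # limiting case as lambda approaches 0
    return (y ** lmbda - 1.0) / lmbda  # general power-transform case

sample = np.array([0.5, 1.0, 2.0, 4.0])  # made-up positive values
for lam in (0.0, 0.5, 2.0):
    # When lmbda is supplied, scipy's boxcox returns only the transformed array
    assert np.allclose(manual_boxcox(sample, lam), boxcox(sample, lmbda=lam))
```

When lmbda is left out, boxcox estimates it from the data instead, which is what this recipe does in Step 3.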

This recipe is a short example of the Box-Cox transformation. Let's get started.

Step 1 - Import the library

import numpy as np
from scipy.stats import boxcox
import seaborn as sns
import matplotlib.pyplot as plt

Let's pause and look at these imports. NumPy is used to generate the data. boxcox, from scipy.stats, performs the transformation. seaborn (sns) and matplotlib.pyplot (plt) are used to plot the data.

Step 2 - Setup the Data

original_data = np.random.exponential(size = 1000)

Here we draw 1,000 samples from an exponential distribution. This data is strongly right-skewed, so it serves as the non-normal data to be transformed.

Now our dataset is ready.
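Before transforming, it can help to confirm that the data is strictly positive (a requirement of Box-Cox) and to quantify how skewed it is. A small sketch, assuming a seeded generator for reproducibility (the recipe itself does not set a seed) and using scipy.stats.skew as one possible skewness measure:

```python
import numpy as np
from scipy.stats import skew

# Seeded generator: an assumption made here so the numbers are repeatable
rng = np.random.default_rng(42)
original_data = rng.exponential(size=1000)

# Box-Cox is only defined for strictly positive values
print("minimum value:", original_data.min())

# A normal distribution has skewness 0; exponential data is clearly right-skewed
print("skewness:", skew(original_data))
```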

Step 3 - Using boxcox

fitted_data, fitted_lambda = boxcox(original_data)

We have transformed our data using boxcox. Called without a lambda, it returns the transformed data together with the lambda value it estimated for the transformation by maximizing the log-likelihood.
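A rough way to check the effect of the transformation numerically is to compare skewness before and after. The sketch below also shows two related calls: reusing the fitted lambda on new data via the lmbda argument, and undoing the transformation with scipy.special.inv_boxcox. The seed and the names new_data, new_transformed, and recovered are assumptions introduced for illustration.

```python
import numpy as np
from scipy.stats import boxcox, skew
from scipy.special import inv_boxcox

rng = np.random.default_rng(0)  # seeded for repeatability (not in the recipe)
original_data = rng.exponential(size=1000)

# Fit lambda and transform in one call
fitted_data, fitted_lambda = boxcox(original_data)

print("fitted lambda:", fitted_lambda)
print("skewness before:", skew(original_data))
print("skewness after:", skew(fitted_data))

# Reuse the same lambda on new data; with lmbda given, only the array is returned
new_data = rng.exponential(size=5)
new_transformed = boxcox(new_data, lmbda=fitted_lambda)

# Map transformed values back to the original scale
recovered = inv_boxcox(fitted_data, fitted_lambda)
```

Reusing the fitted lambda matters when the same transformation has to be applied consistently, for example to training and test data.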

Step 4 - Plotting the pattern

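fig, ax = plt.subplots(1, 2)
# kdeplot is used here because distplot is deprecated in recent seaborn releases
# Density of the original (skewed) data
sns.kdeplot(original_data, fill = True, linewidth = 2, label = "Non-Normal", color = "green", ax = ax[0])
# Density of the Box-Cox transformed data
sns.kdeplot(fitted_data, fill = True, linewidth = 2, label = "Normal", color = "green", ax = ax[1])
# Add a legend to each subplot; plt.legend alone only affects the last one
ax[0].legend(loc = "upper right")
ax[1].legend(loc = "upper right")
fig.set_figheight(5)
fig.set_figwidth(10)
plt.show()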

We have used seaborn to plot kernel density estimates of the original and the transformed data side by side.

Step 5 - Let's look at our dataset now

Once we run the above code snippet, the two density plots are displayed. Scroll down the IPython notebook to view the results.
