What is the Box-Cox transformation?

This recipe explains what the Box-Cox transformation is.

Recipe Objective

The Box-Cox transformation is a family of power transformations commonly used to make skewed, strictly positive data (for example, power-law or exponential-like data) look more like a normal distribution. It is typically applied to a non-normal dependent variable so that the transformed values are closer to a normal shape.
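
For reference, the standard one-parameter form of the transformation can be written as follows. The symbols y (an observed value) and λ (the power parameter) are notation introduced here, not taken from the recipe; λ corresponds to the fitted_lambda value returned later by boxcox.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln y, & \lambda = 0
\end{cases}
\qquad \text{for } y > 0
```

The requirement y > 0 is why the transformation is only applied to strictly positive data.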

So this recipe is a short example of the Box-Cox transformation. Let's get started.

Step 1 - Import the library

import numpy as np
from scipy.stats import boxcox
import seaborn as sns
import matplotlib.pyplot as plt

Let's pause and look at these imports. NumPy is used to generate the data. boxcox, from scipy.stats, performs the transformation. seaborn (sns) and matplotlib.pyplot (plt) are used for plotting the data.

Step 2 - Setup the Data

original_data = np.random.exponential(size = 1000)

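Here we draw 1,000 samples from an exponential distribution. The samples are non-normal and right-skewed, which makes them a suitable input for the transformation.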
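
Before transforming, it can help to confirm that the data is strictly positive and actually skewed. The sketch below is only an illustration: the seeded generator and the names sample, all_positive and sample_skew are assumptions added here for repeatability, not part of the recipe.

```python
import numpy as np
from scipy.stats import skew

# Seeded generator: an assumption made so the sketch is repeatable
# (the recipe itself calls np.random.exponential without a seed).
rng = np.random.default_rng(0)
sample = rng.exponential(size=1000)

# Box-Cox is only defined for strictly positive values.
all_positive = bool((sample > 0).all())

# Exponential data is right-skewed, so the sample skewness should be positive.
sample_skew = float(skew(sample))

print("all values positive:", all_positive)
print("sample skewness:", round(sample_skew, 2))
```

SciPy documents boxcox as requiring positive input, so a check like this is a cheap guard when working with real data rather than simulated data.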

Now our dataset is ready.

Step 3 - Using boxcox

fitted_data, fitted_lambda = boxcox(original_data)

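We have transformed our data with boxcox so that it is closer to a normal distribution. Because we did not pass a lambda ourselves, boxcox estimated one from the data and returned it along with the transformed values.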
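
To see what the fitted lambda means, the transformation can be reproduced by hand and then undone. This is a minimal sketch under a few assumptions: the data is regenerated with a seeded generator, and the names data, transformed, lam, manual and recovered are illustrative. inv_boxcox comes from scipy.special.

```python
import numpy as np
from scipy.stats import boxcox, skew
from scipy.special import inv_boxcox

# Seeded only so the sketch is repeatable; not part of the original recipe.
rng = np.random.default_rng(0)
data = rng.exponential(size=1000)

# With no lambda supplied, boxcox estimates it and returns it with the result.
transformed, lam = boxcox(data)

# Apply the Box-Cox formula by hand using the fitted lambda.
if abs(lam) > 1e-12:
    manual = (data ** lam - 1.0) / lam
else:
    manual = np.log(data)  # limiting case when lambda is zero

matches_manual = bool(np.allclose(transformed, manual))

# Undo the transformation to get back to the original scale.
recovered = inv_boxcox(transformed, lam)
round_trip_ok = bool(np.allclose(recovered, data))

print("fitted lambda:", lam)
print("matches manual formula:", matches_manual)
print("round trip recovers data:", round_trip_ok)
print("skewness before:", skew(data), "after:", skew(transformed))
```

Keeping the fitted lambda around matters in practice, since it is needed to map predictions or summaries made on the transformed scale back to the original units.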

Step 4 - Plotting the pattern

fig, ax = plt.subplots(1, 2)
# distplot is deprecated in recent seaborn releases; kdeplot (seaborn 0.11 or later) draws the same filled density curve
sns.kdeplot(original_data, fill=True, linewidth=2, label="Non-Normal", color="green", ax=ax[0])
sns.kdeplot(fitted_data, fill=True, linewidth=2, label="Normal", color="green", ax=ax[1])
# add a legend to each panel; plt.legend alone would only affect the last one
ax[0].legend(loc="upper right")
ax[1].legend(loc="upper right")
fig.set_figheight(5)
fig.set_figwidth(10)
plt.show()

We have used seaborn to plot the density of the original data and of the fitted data side by side.

Step 5 - Let's look at our dataset now

Once we run the above code snippet, we will see:

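Scroll down the IPython notebook to see the results. The original data should appear strongly right-skewed, while the fitted data should look roughly bell-shaped.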
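
As a numeric complement to the plots, one option is to compare a normality statistic before and after the transformation. The sketch below uses the Shapiro-Wilk test from scipy.stats, which is a choice made here and not something the recipe prescribes. The seeded data and the names w_before and w_after are assumptions for illustration.

```python
import numpy as np
from scipy.stats import boxcox, shapiro

# Seeded only so the sketch is repeatable; not part of the original recipe.
rng = np.random.default_rng(0)
data = rng.exponential(size=1000)
transformed, _ = boxcox(data)

# The Shapiro-Wilk W statistic is closer to 1 for more normal-looking samples.
w_before, p_before = shapiro(data)
w_after, p_after = shapiro(transformed)

print("W before transformation:", w_before)
print("W after transformation:", w_after)
```

A higher W after the transformation indicates the data moved toward normality. It does not by itself show that the transformed data is exactly normal.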
