What is the Box-Cox transformation in R?

This recipe explains what the Box-Cox transformation is and how to apply it in R.

Recipe Objective

What is the Box-Cox transformation?

The Box-Cox transformation is used to transform non-normally distributed data so that it follows a normal distribution more closely. This matters because normality is an assumption of many statistical techniques. The transformation works by finding a value of the parameter λ that brings the data as close to normality as possible. For a strictly positive value y, the transformed value is defined as:

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0 \\
\log(y), & \text{if } \lambda = 0
\end{cases}
$$

This recipe demonstrates an example of the Box-Cox transformation in R.
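The formula above can be sketched directly in R. This is a minimal illustration, not part of the MASS package: the helper name `box_cox_transform` and the tolerance `eps` used to treat λ as zero are assumptions made for this example.

```r
# Hypothetical helper (not from MASS): applies the Box-Cox formula for a given lambda.
box_cox_transform <- function(y, lambda, eps = 1e-8) {
  stopifnot(all(y > 0))        # the transformation is only defined for positive values
  if (abs(lambda) < eps) {
    log(y)                     # case lambda = 0
  } else {
    (y^lambda - 1) / lambda    # case lambda != 0
  }
}

y_demo <- c(1, 2, 4, 8)
box_cox_transform(y_demo, 1)    # lambda = 1 only shifts the data (y - 1)
box_cox_transform(y_demo, 0.5)  # square-root-like transformation
box_cox_transform(y_demo, 0)    # natural logarithm
```

Values of λ below 1 compress large values more than small ones, which is why the transformation can reduce right skew.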

Step 1 - Load the required package

library(MASS)

Step 2 - Create sample data

y <- c(1, 1, 2, 2, 2, 2, 3, 3, 5, 6) # dependent variable
x <- c(8, 7, 3, 2, 3, 4, 5, 3, 4, 7) # independent variable

Step 3 - Create a linear regression model

#fit linear regression model
model <- lm(y ~ x)

Step 4 - Use the boxcox() function

#find optimal lambda for Box-Cox transformation
box_cox <- boxcox(y ~ x)
(lambda <- box_cox$x[which.max(box_cox$y)])
#now, fit new linear regression model using the Box-Cox transformation
new_model <- lm(((y^lambda - 1) / lambda) ~ x)
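As a possible variation on the lambda search above, `boxcox()` also accepts a `lambda` grid and a `plotit` flag, so the search can run on a finer grid without drawing the plot. The sketch below also reads off an approximate 95% interval for λ using the same log-likelihood cutoff that the `boxcox()` plot draws. The names `bc`, `lambda_hat`, and `lambda_ci` are chosen for this example only.

```r
library(MASS)

y <- c(1, 1, 2, 2, 2, 2, 3, 3, 5, 6) # dependent variable
x <- c(8, 7, 3, 2, 3, 4, 5, 3, 4, 7) # independent variable

# evaluate the profile log-likelihood on a finer grid, without plotting
bc <- boxcox(y ~ x, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# lambda with the highest log-likelihood on the grid
lambda_hat <- bc$x[which.max(bc$y)]

# approximate 95% interval: grid values whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum (may be cut off at the grid limits)
lambda_ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2])

lambda_hat
lambda_ci
```

If the interval contains a convenient value such as 0 (log) or 0.5 (square root), that value is often used in place of the exact maximum because it is easier to interpret.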

Step 5 - Plot the old and new model

#define plotting area
plot_area <- par(pty = "s", mfrow = c(1, 2))
#Q-Q plot for original model
qqnorm(model$residuals)
qqline(model$residuals)
#Q-Q plot for Box-Cox transformed model
qqnorm(new_model$residuals)
qqline(new_model$residuals)
#display both Q-Q plots
par(plot_area)

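If the points in a Q-Q plot fall along a straight line, the residuals are approximately normally distributed. The Q-Q plot for the new model follows the line more closely than the plot for the original model, which suggests that the Box-Cox transformation brought the residuals closer to normality.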
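The visual comparison can be complemented with a numerical check. The sketch below is one possible approach and not part of the original recipe: it refits both models and runs `shapiro.test()` from base R on the residuals. The names `sw_old` and `sw_new` and the guard for λ close to zero are assumptions for this example. With only ten observations the test has little power, so the p-values should be read as a rough indication only.

```r
library(MASS)

y <- c(1, 1, 2, 2, 2, 2, 3, 3, 5, 6) # dependent variable
x <- c(8, 7, 3, 2, 3, 4, 5, 3, 4, 7) # independent variable

# original model
model <- lm(y ~ x)

# optimal lambda on the default grid, without plotting
bc <- boxcox(y ~ x, plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]

# transform the response; fall back to log(y) when lambda is numerically zero
y_bc <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
new_model <- lm(y_bc ~ x)

# Shapiro-Wilk test on the residuals; a larger p-value means less
# evidence against normality
sw_old <- shapiro.test(residuals(model))
sw_new <- shapiro.test(residuals(new_model))

c(original = sw_old$p.value, box_cox = sw_new$p.value)
```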
