How to check multicollinearity using R

Recipe Objective

How to check multicollinearity using R?

Linear Regression is a supervised learning algorithm used for continuous target variables. When a linear regression model is built, there is a chance that some of the independent variables are multicollinear. Multicollinearity is a statistical term for the situation where two or more independent predictor variables in a regression model are highly correlated with each other. When this happens, the model cannot isolate the unique contribution of each predictor, which reduces the reliability of statistical inferences. Such variables should therefore be identified and, if necessary, removed when building a multiple regression model. The variance inflation factor (VIF) is a standard diagnostic for detecting multicollinearity: it measures how strongly each independent variable is correlated with the other independent variables in the model. As a rule of thumb:

- If VIF is equal to 1 (its minimum possible value): no correlation
- If VIF is between 1 and 5: moderate correlation
- If VIF is above 5: severe correlation

This recipe explains how to check multicollinearity in regression using R.
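To make the VIF definition concrete before using the car package below, here is a minimal sketch that computes a VIF by hand: the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the remaining predictors. The data frame and variable names here are made up purely for illustration, with x2 constructed to be almost an exact multiple of x1.

```r
# Hypothetical data: x2 is nearly 2 * x1, so x1 and x2 are collinear
df <- data.frame(x1 = c(1, 2, 3, 4, 5),
                 x2 = c(2, 4.1, 5.9, 8.2, 10),
                 x3 = c(5, 3, 6, 2, 4))

# Regress x1 on the other predictors and take the R-squared
r2_x1 <- summary(lm(x1 ~ x2 + x3, data = df))$r.squared

# VIF of x1 = 1 / (1 - R^2); large here because x2 nearly duplicates x1
vif_x1 <- 1 / (1 - r2_x1)
vif_x1
```

Because x2 carries almost the same information as x1, the R² is close to 1 and the VIF is far above the severe-correlation threshold of 5.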

Step 1 - Install necessary packages

install.packages("caTools")   # for linear regression
install.packages("car")       # to check multicollinearity (vif)
install.packages("quantmod")
install.packages("MASS")
install.packages("corrplot")  # to plot a correlation plot

library(caTools)
library(car)
library(quantmod)
library(MASS)
library(corrplot)

Step 2 - Define a Dataframe

data <- data.frame(marks_scored = c(35, 42, 24, 27, 37),  # marks: dependent variable (y)
                   no_hours_studied = c(5, 4, 2, 3, 4),
                   no_hours_played = c(4, 3, 4, 2, 2),
                   attendance = c(8, 8, 4, 6, 9))
head(data)

Step 3 - Create a linear regression model

model_all <- lm(marks_scored ~ ., data = data)  # with all the independent variables in the dataframe
summary(model_all)

Step 4 - Use the vif() function

vif(model_all)

Step 5 - Visualize VIF Values

vif_values <- vif(model_all)              # create a vector of VIF values
barplot(vif_values, main = "VIF Values",
        horiz = TRUE, col = "steelblue")  # horizontal bar chart to display each VIF value
abline(v = 5, lwd = 3, lty = 2)           # vertical line at 5: values above it indicate severe correlation

After plotting the graph, the user can decide which variable to remove, i.e. not include in model building, and check whether the corresponding R-squared value improves.
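As a sketch of that comparison, the snippet below refits the model from Step 3 without one predictor and compares the fit of the two models. Dropping attendance here is only for illustration; in practice you would drop the variable with the highest VIF.

```r
# Same data frame as defined in Step 2
data <- data.frame(marks_scored = c(35, 42, 24, 27, 37),
                   no_hours_studied = c(5, 4, 2, 3, 4),
                   no_hours_played = c(4, 3, 4, 2, 2),
                   attendance = c(8, 8, 4, 6, 9))

# Full model vs. a reduced model with one predictor removed
model_all <- lm(marks_scored ~ ., data = data)
model_reduced <- lm(marks_scored ~ no_hours_studied + no_hours_played,
                    data = data)

# Compare fit; adjusted R-squared penalizes the extra predictor
summary(model_all)$adj.r.squared
summary(model_reduced)$adj.r.squared
```

If the reduced model fits nearly as well, the simpler model is usually preferable, since the dropped variable added little unique information.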

Step 6 - Check multicollinearity with the inverse correlation matrix

data_x <- data[, 2:4]                  # independent variables
var <- cor(data_x)                     # correlation matrix of the independent variables
var_inv <- ginv(var)                   # inverse correlation matrix (MASS::ginv)
colnames(var_inv) <- colnames(data_x)  # rename the column names
rownames(var_inv) <- colnames(data_x)  # rename the row names
corrplot(var_inv, method = 'number', is.corr = FALSE)  # visualize the multicollinearity
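Step 6 agrees with Step 4 because the diagonal of the inverse correlation matrix of the predictors equals their VIF values. The sketch below checks this in base R, using solve() instead of MASS::ginv() on the assumption that the correlation matrix is invertible, which it is for this data.

```r
# Independent variables from the Step 2 data frame
data_x <- data.frame(no_hours_studied = c(5, 4, 2, 3, 4),
                     no_hours_played = c(4, 3, 4, 2, 2),
                     attendance = c(8, 8, 4, 6, 9))

# Invert the correlation matrix of the predictors
var_inv <- solve(cor(data_x))

# The diagonal entries are the VIF values (always >= 1)
diag(var_inv)
```

Each diagonal entry equals 1 / (1 - R²) from regressing that predictor on the others, which is exactly the VIF definition used in Step 4.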

