How to find VIF for a dataset in R

This recipe helps you find the VIF of each predictor in a dataset in R

Recipe Objective

How to find VIF for a dataset in R.

When a linear regression model is built, some of the predictor variables may be multicollinear. Multicollinearity occurs when two or more independent (predictor) variables are highly correlated with each other, so that they contribute little unique information to the regression model. It inflates the variance of the estimated coefficients and therefore reduces the reliability of statistical inferences. For this reason, such variables are often removed when building a multiple regression model.

The variance inflation factor (VIF) is used to detect multicollinearity in a model. For each predictor, it measures how strongly that predictor is correlated with the other independent variables in the regression model. The smallest possible value is 1. A common rule of thumb is:

- If the value of VIF is equal to 1: no correlation
- If the value of VIF is between 1 and 5: moderate correlation
- If the value of VIF is above 5: severe correlation

This recipe demonstrates an example of how to find VIF for a dataset in R.
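For reference, the standard textbook definition of the VIF for the j-th predictor can be written as follows. The symbol R_j^2 is notation introduced here, not taken from the recipe: it denotes the R-squared obtained when predictor j is regressed on all the other predictors.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Because R_j^2 lies between 0 and 1, the VIF is 1 when a predictor is uncorrelated with the others and grows without bound as R_j^2 approaches 1.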

Step 1 - Install necessary packages

install.packages("caTools") # optional here: lm() ships with base R (stats)
library(caTools)
install.packages("car") # provides the vif() function
library(car)

Step 2 - Define a Dataframe

data <- data.frame(marks_scored = c(35,42,24,27,37),
                   no_hours_studied = c(5,4,2,3,4),
                   no_hours_played = c(4,3,4,2,2),
                   attendance = c(8,8,4,6,9))
print(data)

"Dataframe is:"
  marks_scored no_hours_studied no_hours_played attendance
1           35                5               4          8
2           42                4               3          8
3           24                2               4          4
4           27                3               2          6
5           37                4               2          9
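Before fitting a model, it can help to look at the pairwise correlations between the predictors. The sketch below uses only base R and repeats the data frame so that it runs on its own; the name predictor_cor is introduced here for illustration.

```r
# Same data frame as defined in Step 2, repeated so this sketch is self-contained
data <- data.frame(marks_scored = c(35, 42, 24, 27, 37),
                   no_hours_studied = c(5, 4, 2, 3, 4),
                   no_hours_played = c(4, 3, 4, 2, 2),
                   attendance = c(8, 8, 4, 6, 9))

# Correlation matrix of the three predictors (the response column is excluded)
predictor_cor <- cor(data[, c("no_hours_studied", "no_hours_played", "attendance")])
print(round(predictor_cor, 3))
```

Here no_hours_studied and no_hours_played are uncorrelated with each other, while attendance is correlated with both (about 0.877 and -0.375). Pairwise correlations alone can understate multicollinearity, which is why the VIF is computed from the fitted model in the later steps.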

Step 3 - Create a linear regression model

model_all <- lm(marks_scored ~ ., data=data) # with all the independent variables in the dataframe
summary(model_all)

Step 4 - Use the vif() function

vif(model_all)

"Output of code is:"
no_hours_studied - 9.53333333333337
no_hours_played  - 2.56
attendance       - 11.0933333333334
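To see where these numbers come from, the values can be reproduced without the car package by regressing each predictor on the others and applying 1 / (1 - R squared). This is a minimal sketch in base R; manual_vif and predictors are names introduced here for illustration, and the data frame is repeated so the block runs on its own.

```r
# Same data frame as defined in Step 2, repeated so this sketch is self-contained
data <- data.frame(marks_scored = c(35, 42, 24, 27, 37),
                   no_hours_studied = c(5, 4, 2, 3, 4),
                   no_hours_played = c(4, 3, 4, 2, 2),
                   attendance = c(8, 8, 4, 6, 9))

predictors <- c("no_hours_studied", "no_hours_played", "attendance")

# manual_vif is an illustrative name, not a function from the car package
manual_vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Regress predictor p on the remaining predictors
  fit <- lm(reformulate(others, response = p), data = data)
  # VIF = 1 / (1 - R squared of that auxiliary regression)
  1 / (1 - summary(fit)$r.squared)
})

print(round(manual_vif, 4))
```

The printed values should agree with the vif(model_all) output above: roughly 9.5333, 2.56 and 11.0933.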

Step 5 - Visualize VIF Values

vif_values <- vif(model_all) # create vector of VIF values
barplot(vif_values, main = "VIF Values", horiz = TRUE, col = "steelblue") # create horizontal bar chart to display each VIF value
abline(v = 5, lwd = 3, lty = 2) # add vertical line at 5, as values above 5 indicate severe correlation

After plotting the graph, the user can decide which variable to remove, i.e. not include in model building, and then check how the corresponding adjusted R-squared value changes.
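As a minimal sketch of that last step, the code below drops attendance (the predictor with the largest VIF), refits the model and recomputes the VIFs. It makes two assumptions that are not in the recipe: the VIFs are taken from the diagonal of the inverse correlation matrix of the predictors, which is equivalent to what vif() reports for a model of this form and keeps the block free of extra packages, and adjusted R-squared is used for the comparison, since plain R-squared cannot increase when a predictor is removed. The names model_reduced, vif_reduced, adj_all and adj_reduced are introduced here for illustration.

```r
# Same data frame as defined in Step 2, repeated so this sketch is self-contained
data <- data.frame(marks_scored = c(35, 42, 24, 27, 37),
                   no_hours_studied = c(5, 4, 2, 3, 4),
                   no_hours_played = c(4, 3, 4, 2, 2),
                   attendance = c(8, 8, 4, 6, 9))

# Full model and a reduced model without attendance
model_all <- lm(marks_scored ~ ., data = data)
model_reduced <- lm(marks_scored ~ no_hours_studied + no_hours_played, data = data)

# VIFs of the remaining predictors: diagonal of the inverse correlation matrix
X <- data[, c("no_hours_studied", "no_hours_played")]
vif_reduced <- diag(solve(cor(X)))
print(round(vif_reduced, 4))

# Compare adjusted R-squared before and after dropping the variable
adj_all <- summary(model_all)$adj.r.squared
adj_reduced <- summary(model_reduced)$adj.r.squared
print(c(all_predictors = adj_all, without_attendance = adj_reduced))
```

With attendance removed, both remaining VIFs come out as 1, because no_hours_studied and no_hours_played are uncorrelated in this data. With only five observations, the adjusted R-squared comparison is illustrative rather than conclusive.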
