How to find VIF for a regression model in R

This recipe helps you find the VIF (variance inflation factor) for a regression model in R.
Last Updated: 16 Dec 2022

Recipe Objective

How to find VIF for a regression model in R.

When a linear regression model is built, some of the independent variables may be multicollinear. Multicollinearity occurs when two or more independent (predictor) variables are highly correlated with each other, so they carry little unique information about the response. It inflates the variance of the estimated coefficients, which reduces the reliability of statistical inferences about them. For this reason, one or more of the correlated variables is often removed when building a multiple regression model.

The variance inflation factor (VIF) is used to detect multicollinearity. For each predictor, it measures how much the variance of that predictor's estimated coefficient is inflated by its correlation with the other predictors in the model. VIF is never below 1, and common rules of thumb are:
- VIF equal to 1: no correlation with the other predictors
- VIF between 1 and 5: moderate correlation
- VIF above 5: severe correlation

This recipe demonstrates how to find the VIF for a regression model in R.
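For reference, the VIF of the j-th predictor is usually defined as below, where R_j² is the coefficient of determination obtained by regressing predictor j on all the other predictors (not on the response). The worked values show what the common thresholds mean: a VIF of 5 corresponds to 80% of predictor j's variation being explained by the other predictors.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\qquad
\mathrm{VIF}_j = 5 \iff R_j^2 = 1 - \tfrac{1}{5} = 0.8,
\qquad
\mathrm{VIF}_j = 10 \iff R_j^2 = 0.9
```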

Step 1 - Install necessary packages

install.packages("car") # provides vif(); lm() for linear regression is part of base R
library(car)

Step 2 - Define a Dataframe

data <- data.frame(marks_scored = c(35, 42, 24, 27, 37),
                   no_hours_studied = c(5, 4, 2, 3, 4),
                   no_hours_played = c(4, 3, 4, 2, 2),
                   attendance = c(8, 8, 4, 6, 9))
print(data)

"Dataframe is:"
  marks_scored no_hours_studied no_hours_played attendance
1           35                5               4          8
2           42                4               3          8
3           24                2               4          4
4           27                3               2          6
5           37                4               2          9
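Before fitting the model, a quick look at the pairwise correlations between the predictors can hint at multicollinearity. The optional check below is not part of the original recipe; it uses base R's cor() on the three predictor columns of the same data frame, and the variable names predictors and cor_matrix are chosen just for this illustration.

```r
# Same data frame as in Step 2
data <- data.frame(marks_scored = c(35, 42, 24, 27, 37),
                   no_hours_studied = c(5, 4, 2, 3, 4),
                   no_hours_played = c(4, 3, 4, 2, 2),
                   attendance = c(8, 8, 4, 6, 9))

# Pairwise Pearson correlations between the independent variables only
predictors <- data[, c("no_hours_studied", "no_hours_played", "attendance")]
cor_matrix <- cor(predictors)
print(round(cor_matrix, 3))
```

Here no_hours_studied and attendance are strongly correlated (about 0.877), while no_hours_studied and no_hours_played are uncorrelated. Pairwise correlations can miss multicollinearity that involves three or more variables at once, which is why VIF, computed from a regression on all the other predictors, is the more complete check.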

Step 3 - Create a linear regression model

# with all the independent variables in the dataframe
model_all <- lm(marks_scored ~ ., data = data)
summary(model_all)
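The . on the right-hand side of the formula is shorthand for "all other columns of data". As a sanity check, the sketch below fits the same model with the predictors spelled out (model_explicit is a name chosen for this illustration) and confirms that the coefficients match. It also shows that with only 5 observations and 4 estimated coefficients (an intercept plus three slopes), the model has just 1 residual degree of freedom, so this dataset is purely illustrative.

```r
data <- data.frame(marks_scored = c(35, 42, 24, 27, 37),
                   no_hours_studied = c(5, 4, 2, 3, 4),
                   no_hours_played = c(4, 3, 4, 2, 2),
                   attendance = c(8, 8, 4, 6, 9))

# "." expands to every column except the response
model_all <- lm(marks_scored ~ ., data = data)
model_explicit <- lm(marks_scored ~ no_hours_studied + no_hours_played + attendance,
                     data = data)

# Both formulas should produce identical coefficient estimates
print(all.equal(coef(model_all), coef(model_explicit)))

# 5 observations minus 4 coefficients leaves 1 residual degree of freedom
print(df.residual(model_all))
```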

Step 4 - Use the vif() function

vif(model_all)

"Output of code is:"
no_hours_studied - 9.53333333333337
no_hours_played  - 2.56
attendance       - 11.0933333333334
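To see where these numbers come from, VIF can also be computed without the car package: regress each predictor on the remaining predictors and take 1 / (1 - R²). The helper manual_vif() below is hypothetical, written only for this illustration. For a model with only numeric predictors, it should reproduce the values reported by vif() above, as should the equivalent shortcut of taking the diagonal of the inverse correlation matrix of the predictors.

```r
data <- data.frame(marks_scored = c(35, 42, 24, 27, 37),
                   no_hours_studied = c(5, 4, 2, 3, 4),
                   no_hours_played = c(4, 3, 4, 2, 2),
                   attendance = c(8, 8, 4, 6, 9))
predictor_names <- c("no_hours_studied", "no_hours_played", "attendance")

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all the other predictors
manual_vif <- function(df, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux_formula <- reformulate(others, response = p)
    r_squared <- summary(lm(aux_formula, data = df))$r.squared
    1 / (1 - r_squared)
  })
}

vif_by_regression <- manual_vif(data, predictor_names)

# Equivalent shortcut: VIFs are the diagonal of the inverse correlation matrix
vif_by_inverse <- diag(solve(cor(data[, predictor_names])))

print(round(vif_by_regression, 4))
print(round(vif_by_inverse, 4))
```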

Step 5 - Visualize VIF Values

vif_values <- vif(model_all) # create a vector of VIF values
# create a horizontal bar chart to display each VIF value
barplot(vif_values, main = "VIF Values", horiz = TRUE, col = "steelblue")
# add a vertical line at 5, above which correlation is considered severe
abline(v = 5, lwd = 3, lty = 2)

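After plotting the graph, you can decide which variable to remove (i.e. not include in model building), then refit the model and check how the adjusted R-squared changes. Plain R-squared cannot increase when a predictor is removed, so adjusted R-squared is the fairer comparison. In this example, both no_hours_studied and attendance have VIF values above 5.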
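One common way to act on the plot is to drop the predictor with the highest VIF, recompute the VIFs, and repeat until all remaining values fall at or below a chosen threshold. The sketch below assumes a threshold of 5 (matching the dashed line in the plot) and uses hypothetical helpers vif_from_cor() and drop_high_vif(), built on base R only. It is one possible strategy, not the only reasonable way to handle multicollinearity.

```r
data <- data.frame(marks_scored = c(35, 42, 24, 27, 37),
                   no_hours_studied = c(5, 4, 2, 3, 4),
                   no_hours_played = c(4, 3, 4, 2, 2),
                   attendance = c(8, 8, 4, 6, 9))

# Hypothetical helper: VIFs as the diagonal of the inverse correlation matrix
vif_from_cor <- function(df, predictors) {
  diag(solve(cor(df[, predictors, drop = FALSE])))
}

# Hypothetical helper: repeatedly drop the predictor with the largest VIF
# until every remaining VIF is at or below the threshold
drop_high_vif <- function(df, predictors, threshold = 5) {
  while (length(predictors) > 1) {
    vifs <- vif_from_cor(df, predictors)
    if (max(vifs) <= threshold) break
    worst <- names(which.max(vifs))
    message("Dropping ", worst, " (VIF = ", round(max(vifs), 2), ")")
    predictors <- setdiff(predictors, worst)
  }
  predictors
}

kept <- drop_high_vif(data, c("no_hours_studied", "no_hours_played", "attendance"))
print(kept)

# Refit with the remaining predictors and compare adjusted R-squared
model_all <- lm(marks_scored ~ ., data = data)
model_reduced <- lm(reformulate(kept, response = "marks_scored"), data = data)
cat("Adjusted R-squared (all):    ", summary(model_all)$adj.r.squared, "\n")
cat("Adjusted R-squared (reduced):", summary(model_reduced)$adj.r.squared, "\n")
```

On this dataset, the loop removes attendance first because it has the largest VIF. The two remaining predictors are uncorrelated with each other, so their VIFs drop to 1 and the loop stops.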
