How to check for multicollinearity using Python?
Multicollinearity occurs in a regression model when two or more independent variables are highly correlated with each other.
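To make "highly correlated" concrete, here is a minimal sketch on synthetic data. The column names x1, x2 and x3 are hypothetical. x2 is built as x1 plus a little noise, so their pairwise Pearson correlation is close to 1, while x3 is independent noise.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: x2 is almost a copy of x1, x3 is unrelated noise
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Pairwise Pearson correlations between the independent variables
corr = X.corr()
print(corr.round(2))
```

Pairwise correlations only reveal relationships between two variables at a time. A variable that is nearly a combination of three or more others can slip past them, which is one reason to use a per-variable measure such as VIF.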
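The variance inflation factor (VIF) can be used to check for multicollinearity. It measures how much the variance of a regression coefficient is inflated because its variable is correlated with the other independent variables.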
VIF starts at 1 and has no upper limit. A VIF of 1 means the variable is not correlated with the other independent variables; a common rule of thumb treats a VIF above 10 as a sign of high multicollinearity.
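These bounds follow from the definition. For each independent variable x_j, regress it on all the other independent variables (with an intercept) and take the R_j^2 of that fit; then VIF_j = 1 / (1 - R_j^2). Because R_j^2 lies in [0, 1), VIF_j is at least 1 and grows without bound as R_j^2 approaches 1; a VIF of 10 corresponds to R_j^2 = 0.9. The sketch below computes this from scratch with NumPy on hypothetical synthetic data. The function name vif_from_definition is made up for illustration and is not the statsmodels function used later.

```python
import numpy as np

def vif_from_definition(X: np.ndarray) -> np.ndarray:
    """Return the VIF of every column of X (rows are observations).

    For column j: regress X[:, j] on an intercept plus all other columns
    with ordinary least squares, take the centered R^2 of that fit,
    and return 1 / (1 - R^2).
    """
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Design matrix: an intercept column followed by the other predictors
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Hypothetical synthetic data: x2 is nearly x1, x3 is independent
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = vif_from_definition(X)
print(vifs.round(1))
```

Here x1 and x2 explain each other almost perfectly, so both get a large VIF, while x3 stays close to 1.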
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Load the California housing sample file from Colab's sample_data folder
df = pd.read_csv('/content/sample_data/california_housing_train.csv')
df.head()
Next, we define a function that computes the VIF of each independent variable, that is, how strongly each one is explained by the others.
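def calc_VIF(x):
    # One row per independent variable: its name and its VIF
    vif = pd.DataFrame()
    vif['variables'] = x.columns
    # x.shape[1] is the number of columns; compute the VIF of each column in turn
    vif['VIF'] = [variance_inflation_factor(x.values, i) for i in range(x.shape[1])]
    return vif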