How to create Goodness of Fit Plots in the StatsModels library?


Recipe Objective - How to create Goodness of Fit Plots in the StatsModels library?

There are four plotting tools under the Goodness of Fit category in the StatsModels library: qqplot, qqline, qqplot_2samples, and ProbPlot.


Different goodness of fit plots:

qqplot(data[, dist, distargs, a, loc, ...])
Q-Q plot of the quantiles of x versus the quantiles/ppf of a distribution.

qqline(ax, line[, x, y, dist, fmt])
Plot a reference line for a qqplot.

qqplot_2samples(data1, data2[, xlabel, ...])
Q-Q Plot of two samples' quantiles.

ProbPlot(data[, dist, fit, distargs, a, ...])
Q-Q and P-P Probability Plots

qqplot:

# Importing libraries
import statsmodels.api as sm
from matplotlib import pyplot as plt

# Loading the Longley dataset that ships with statsmodels
longley_data = sm.datasets.longley.load()

# Independent (explanatory) variables, with an intercept column added
indep_vars = sm.add_constant(longley_data.exog)

# Dependent (response) variable
dep_var = longley_data.endog

# Fitting the OLS model (response first, then the design matrix)
model = sm.OLS(dep_var, indep_vars)
results = model.fit()

# Residuals of the fitted model
model_residuals = results.resid

# Plotting the qqplot of the model residuals
plot = sm.qqplot(model_residuals)
plt.show()  # needed when running as a script rather than in a notebook

qqline:

# Importing qqline
from statsmodels.graphics.gofplots import qqline

# Loading the Moore dataset from the carData R package (requires an internet connection)
X = sm.datasets.get_rdataset("Moore", "carData").data

# Scatter plot of fscore against conformity on a new figure
fig, ax = plt.subplots()
ax.scatter(X['conformity'], X['fscore'])
ax.set_xlabel('conformity')
ax.set_ylabel('fscore')

# Adding a regression reference line ("r") through the points
qqline(ax, "r", X['conformity'], X['fscore'])
plt.show()
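
qqplot_2samples and ProbPlot:

The remaining two tools from the list can be sketched in the same way. This is a minimal example under stated assumptions: the arrays `x` and `y` are synthetic normal samples invented for illustration, and the axis labels are arbitrary.

```python
import numpy as np
from statsmodels.graphics.gofplots import ProbPlot, qqplot_2samples

# Synthetic samples, assumed here only for illustration
rng = np.random.default_rng(42)
x = rng.normal(loc=8.5, scale=2.5, size=100)
y = rng.normal(loc=8.0, scale=3.0, size=100)

# ProbPlot compares one sample with a theoretical distribution
# (the normal distribution by default); fit=True estimates loc and
# scale from the data, so a 45-degree line is a sensible reference
pp_x = ProbPlot(x, fit=True)
fig_qq = pp_x.qqplot(line="45")   # quantile-quantile plot
fig_pp = pp_x.ppplot(line="45")   # probability-probability plot

# qqplot_2samples compares the quantiles of two samples directly
fig_2s = qqplot_2samples(
    x, y,
    xlabel="Quantiles of x",
    ylabel="Quantiles of y",
    line="45",
)
```

Each call returns a matplotlib figure, so the plots can be customized further or displayed with `plt.show()`.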

In this way, we can create Goodness of Fit Plots in the StatsModels library.
