How to plot residuals of a linear regression in R

Recipe Objective

How to plot residuals of a linear regression in R.

Linear regression is a supervised learning algorithm used for continuous variables. Simple linear regression describes the relation between two variables, an independent variable (x) and a dependent variable (y). The equation for simple linear regression is **y = mx + c**, where m is the slope and c is the intercept. The model is trained and predictions (y_pred) are made over the test dataset, fitting a line between x and y_pred. The differences between the actual values and the fitted values are known as the residuals or errors; the sum of their squares is the **residual sum of squares (RSS)**, which should be as low as possible.

Residual plots are used to analyze whether the residuals of a regression follow a normal distribution and whether they exhibit heteroscedasticity, i.e. unequal scatter of the residuals or errors. In this recipe, a dataset is considered where the relation between the cost of bags and their width is to be determined using simple linear regression, and the residuals are plotted.
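As a minimal illustration of these definitions (using hypothetical toy data, not the bags dataset used below), residuals and the RSS can be computed directly from a fitted lm model:

```r
# Toy data: y is an exact linear function of x (hypothetical values)
x <- 1:10
y <- 2 * x + 3

# Fit a simple linear regression and extract the residuals
fit <- lm(y ~ x)
res <- resid(fit)   # residuals: actual values minus fitted values

# Residual sum of squares (RSS)
rss <- sum(res^2)
```

Since y is an exact linear function of x here, the residuals, and hence the RSS, are essentially zero.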

Step 1 - Install the necessary libraries

install.packages("caTools") # for sample.split, used in linear regression workflows
install.packages("ggplot2")
install.packages("dplyr")

library(ggplot2)
library(dplyr)
library(caTools)

Step 2 - Read a csv file and do EDA : Exploratory Data Analysis

The dataset attached contains the data of 160 different bags associated with ABC industries. The bags have certain attributes, which are described below:

1. Height – the height of the bag
2. Width – the width of the bag
3. Length – the length of the bag
4. Weight – the weight the bag can carry
5. Weight1 – the weight the bag can carry after expansion

The company now wants to predict the cost it should set for a new variant of these kinds of bags.

data <- read.csv("/content/Data_1.csv")
dim(data)         # returns the shape of the data, i.e. the number of rows and columns
print(head(data)) # head() returns the top 6 rows of the dataframe
summary(data)     # returns the statistical summary of the data columns

Step 3 - Train and Test data

The training data is used for building the model, while the testing data is used for making predictions. After the model is fitted on the training dataset and its errors are minimized, it is used to make predictions on unseen data, the test dataset.

split <- sample.split(data$Cost, SplitRatio = 0.8) # split on the target column; sample.split expects a vector, not a data frame
split

sample.split divides the data into train and test datasets with a ratio of 0.8: 80% of the observations go to the training dataset and 20% to the testing dataset.

train <- subset(data, split == TRUE)
test <- subset(data, split == FALSE)

The train dataset gets all the data points for which split is TRUE, and the test dataset gets all the data points for which it is FALSE.
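caTools is one convenient option; if it is unavailable, an equivalent 80/20 split can be sketched in base R (the data frame below is a hypothetical stand-in for the bags data):

```r
# Hypothetical stand-in for the bags data frame (100 rows)
data <- data.frame(Cost = rnorm(100), Width = rnorm(100))

set.seed(42) # make the split reproducible
n <- nrow(data)
train_idx <- sample(seq_len(n), size = floor(0.8 * n)) # 80% of row indices

train <- data[train_idx, ]  # 80% of the rows
test <- data[-train_idx, ]  # the remaining 20%
```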

dim(train) # dimension/shape of the train dataset
dim(test)  # dimension/shape of the test dataset

Step 4 - Create a linear regression model

Here, a simple linear regression model is created with Cost as the dependent variable (y) and Width as the independent variable (x).

model <- lm(Cost ~ Width, data = train)

summary() gives the summary of the trained model; the R-squared and residual standard error it reports help us check how well the model is performing.

summary(model)
res <- resid(model) # get the list of residuals
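These fit statistics can also be pulled out programmatically. A sketch on hypothetical toy data (RMSE is computed manually from the residuals here, since summary() itself reports the residual standard error rather than RMSE):

```r
# Toy training data for illustration only; in the recipe, use the
# model fitted on the bags data above instead.
set.seed(1)
train <- data.frame(Width = runif(50, 3, 9))
train$Cost <- 100 * train$Width + rnorm(50, sd = 20)
model <- lm(Cost ~ Width, data = train)

r2 <- summary(model)$r.squared     # coefficient of determination
rmse <- sqrt(mean(resid(model)^2)) # root mean squared error of the fit
```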

Step 5 - Plot fitted vs residual plot

# produce a residual vs. fitted plot to visualize heteroscedasticity
plot(fitted(model), res)

# add a horizontal line at 0
abline(0, 0)

Step 6 - Plot a Q-Q plot

A Q-Q plot helps determine whether the residuals follow a normal distribution. For the data to be normally distributed, the points must fall roughly along a straight 45-degree line.

# create a Q-Q plot of the residuals
qqnorm(res)

# add a straight diagonal line to the plot
qqline(res)

Residuals tend to stray away from the plotted line, indicating they are not normally distributed.
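The visual Q-Q check can be backed up with a formal normality test; one common choice (an addition to the original recipe, shown here on stand-in residuals) is the Shapiro-Wilk test:

```r
# Stand-in residuals for illustration; in the recipe, pass the
# res vector extracted from the fitted model instead.
set.seed(7)
res <- rnorm(100)

# Shapiro-Wilk normality test: a small p-value (e.g. < 0.05)
# suggests the residuals are not normally distributed.
p_value <- shapiro.test(res)$p.value
```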

Step 7 - Plot a density plot

A density plot helps visualize whether the residuals are normally distributed. The curve must be approximately bell-shaped for the residuals to follow a normal distribution.

# create a density plot of the residuals
plot(density(res))

The density plot shows a rough bell-shaped symmetry with some values skewed to the right.
