What is weighted least squares regression? How to perform it in R?

This recipe explains what weighted least squares regression is and helps you perform it in R.

Recipe Objective

What is weighted least squares regression? How to perform it in R?

Linear regression is a supervised learning algorithm used for continuous variables. Simple linear regression describes the relation between two variables, an independent variable (x) and a dependent variable (y). The equation for simple linear regression is **y = mx + c**, where m is the slope and c is the intercept. The model is trained, predictions (y_pred) are made over the test dataset, and a line is fitted between x and y_pred. The accuracy of this model is checked using the **performance metrics** R squared and RMSE (root mean squared error).

Weighted least squares regression: the simple linear regression model assumes that the residuals have equal variance at all levels of the predictor variables, i.e. they are homoscedastic; when this does not hold, the residuals are said to be heteroscedastic. To handle heteroscedasticity, weighted least squares regression is used instead, which assigns a weight to each observation. Observations with smaller error variances are given more weight than those with larger error variances, as they contain more information.

In this recipe, the relation between the cost of bags and their width is determined using both simple linear regression and weighted least squares regression.
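As an illustrative aside (not part of the recipe's bag dataset), the sketch below simulates data whose error variance grows with x and fits both an ordinary and a weighted least squares model; here the weights are assumed to be the inverse of the known variance pattern.

set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 3 * x + rnorm(100, sd = 0.5 * x) # error spread grows with x (heteroscedasticity)
ols <- lm(y ~ x) # ordinary least squares ignores the unequal variance
w <- 1 / x^2 # assumed weights: inverse of the known variance pattern
wls <- lm(y ~ x, weights = w) # weighted least squares down-weights the noisier observations
summary(ols)$coefficients
summary(wls)$coefficients # the WLS slope estimate typically has a smaller standard error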

Step 1 - Install the necessary libraries

install.packages("ggplot2") install.packages("dplyr") install.packages("caTools") # For Linear regression install.packages('lmtest') # load lmtest package library(caTools) library(ggplot2) library(dplyr) library(lmtest)

Step 2 - Read a CSV file and do EDA (Exploratory Data Analysis)

The dataset attached contains data on 160 different bags associated with ABC Industries. The bags have the following attributes:

1. Height – the height of the bag
2. Width – the width of the bag
3. Length – the length of the bag
4. Weight – the weight the bag can carry
5. Weight1 – the weight the bag can carry after expansion

The company now wants to predict the cost it should set for a new variant of these bags.

data <- read.csv("R_220_Data_1.csv")
dim(data) # returns the shape of the data, i.e. the total number of rows and columns
print(head(data)) # head() returns the top 6 rows of the dataframe
summary(data) # returns the statistical summary of the data columns

Step 3 - Plot a scatter plot between x and y

plot(data$Width, data$Cost) # plot() gives a visual representation of the relation between Width and Cost
cor(data$Width, data$Cost) # correlation between the two variables
# the output is a positive value, indicating a high correlation between the two variables

Step 4 - Create a linear regression model

Here, a simple linear regression model is created with:

y (dependent variable) - Cost
x (independent variable) - Width

model <- lm(Cost ~ Width, data = data)
summary(model)
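As a small sketch (not part of the original recipe), the performance metrics mentioned in the introduction can be read off the fitted model object created above:

rsq <- summary(model)$r.squared # R squared of the fit
rmse <- sqrt(mean(resid(model)^2)) # root mean squared error of the residuals
print(c(r_squared = rsq, rmse = rmse))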

Step 5 - Test for Heteroscedasticity

#create residual vs. fitted plot
plot(fitted(model), resid(model), xlab='Fitted Values', ylab='Residuals')

#add a horizontal line at 0
abline(0,0)

#perform Breusch-Pagan test - to check for heteroscedasticity
bptest(model)

From the above plot and test, we conclude that heteroscedasticity is present. The Breusch-Pagan test has the following hypotheses:

Null hypothesis - homoscedasticity is present: the residuals are distributed with equal variance.
Alternate hypothesis - heteroscedasticity is present: the residuals are not distributed with equal variance.

We obtain a p-value of 0.04898, hence we reject the null hypothesis at the 5% significance level and conclude that heteroscedasticity is present in the model.
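As an aside (not part of the original recipe), the same decision can be made programmatically by extracting the p-value from the bptest() result and comparing it against the chosen significance level:

bp <- bptest(model) # Breusch-Pagan test from the lmtest package
if (bp$p.value < 0.05) {
  print("Reject the null hypothesis: heteroscedasticity is present")
} else {
  print("Fail to reject the null hypothesis: homoscedasticity is plausible")
}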

Step 6 - Weighted Least Square Regression

#define weights to use: inverse of the squared fitted values from a regression of the absolute residuals on the fitted values
weight <- 1 / lm(abs(model$residuals) ~ model$fitted.values)$fitted.values^2

#perform weighted least squares regression
wls_model <- lm(Cost ~ Width, data = data, weights = weight)

#view summary of model
summary(wls_model)

The above weighted least squares model shows that the coefficient estimates changed and that the fit of the model has indeed improved. The weighted least squares model gives a residual standard error (RSE) of 1.369, which is much better than the 166.2 of the simple linear regression model; this implies that the predicted values are much closer to the actual values when fitted with the weighted least squares model. The R-squared value does not change much: 0.7814 for the weighted least squares model compared with 0.7859 for simple linear regression. Together, these performance metrics indicate that the weighted least squares model is preferable to the simple linear regression model here.
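As an illustrative sketch (assuming the model and wls_model objects created in the steps above), the two figures quoted here can be read off the model summaries directly:

# residual standard error and R squared of each fit
cbind(
  ols = c(RSE = summary(model)$sigma, r_squared = summary(model)$r.squared),
  wls = c(RSE = summary(wls_model)$sigma, r_squared = summary(wls_model)$r.squared)
)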

{"mode":"full","isActive":false}

What Users are saying..

profile image

Savvy Sahai

Data Science Intern, Capgemini
linkedin profile url

As a student looking to break into the field of data engineering and data science, one can get really confused as to which path to take. Very few ways to do it are Google, YouTube, etc. I was one of... Read More
