How to implement Lasso regression in R

In this recipe, we shall learn how to implement lasso regression, a regularization technique used to improve model accuracy, in R.

Recipe Objective: How to implement Lasso regression in R?

Lasso regression is a regularisation technique often preferred over plain least-squares regression because it can give better predictive accuracy. It relies on shrinkage: coefficient estimates are shrunk towards zero, which encourages simple, sparse models, i.e., models with fewer effective parameters. The steps to implement lasso regression in R are as follows -
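The shrinkage behind lasso can be illustrated with the soft-thresholding operator that the lasso penalty induces on coefficient estimates. This small base-R sketch is illustrative only and not part of the recipe:

```r
# Soft-thresholding operator S(z, lambda) = sign(z) * max(|z| - lambda, 0);
# lasso applies this kind of shrinkage to coefficient estimates.
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

# Coefficients smaller than lambda in magnitude become exactly zero,
# which is what makes lasso models sparse:
soft_threshold(c(-2, -0.3, 0.1, 1.5), lambda = 0.5)
# [1] -1.5  0.0  0.0  1.0
```

Note how the two small coefficients are set exactly to zero while the larger ones shrink by lambda; this is the mechanism that drops uninformative predictors from the model.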

Step 1: Load the required packages

#importing required libraries
library(caret)
library(glmnet)
library(MASS)

Step 2: Load the dataset

Boston is an inbuilt dataset in R which contains housing data for 506 census tracts of Boston from the 1970 census. Its columns are:
crim- per capita crime rate by town
zn- the proportion of residential land zoned for lots over 25,000 sq. ft.
indus- the proportion of non-retail business acres per town
chas- Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox- nitric oxides concentration (parts per 10 million)
rm- the average number of rooms per dwelling
age- the proportion of owner-occupied units built before 1940
dis- weighted distances to five Boston employment centers
rad- index of accessibility to radial highways
tax- full-value property-tax rate per USD 10,000
ptratio- pupil-teacher ratio by town
black- 1000(B - 0.63)^2 where B is the proportion of blacks by town
lstat- the percentage of the lower status of the population
medv- median value of owner-occupied homes in USD 1000's

#loading the dataset
data <- Boston
head(data)

     crim zn indus chas   nox    rm  age    dis rad tax
1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296
2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242
3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242
4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222
5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222
6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222
  ptratio  black lstat medv
1    15.3 396.90  4.98 24.0
2    17.8 396.90  9.14 21.6
3    17.8 392.83  4.03 34.7
4    18.7 394.63  2.94 33.4
5    18.7 396.90  5.33 36.2
6    18.7 394.12  5.21 28.7

Step 3: Check the structure of the dataset

#structure of the dataset
str(data)

'data.frame':	506 obs. of  14 variables:
 $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num  6.58 6.42 7.18 7 7.15 ...
 $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ black  : num  397 397 393 395 397 ...
 $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

All the columns are of int or numeric type, so no encoding is required.

Step 4: Train-Test split

#train-test split
set.seed(222)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train <- data[ind == 1, ]
test <- data[ind == 2, ]
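The split above can be sanity-checked with a toy example. This self-contained sketch (illustrative names, not from the recipe) uses the same sample() idiom and confirms that every row lands in exactly one of the two parts:

```r
# Toy data frame standing in for the Boston data (illustrative only)
set.seed(222)
df <- data.frame(x = 1:100)

# Same idiom as the recipe: assign each row to group 1 (train) with
# probability 0.7 or group 2 (test) with probability 0.3
ind <- sample(2, nrow(df), replace = TRUE, prob = c(0.7, 0.3))
train_part <- df[ind == 1, , drop = FALSE]
test_part  <- df[ind == 2, , drop = FALSE]

# The two parts are disjoint and together cover the whole data frame
nrow(train_part) + nrow(test_part) == nrow(df)
# [1] TRUE
```

Note that this split is probabilistic: the train fraction will only be approximately 70%, varying slightly with the seed.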

Step 5: Create custom Control Parameters

#creating custom control parameters
custom <- trainControl(method = "repeatedcv",
                       number = 10,
                       repeats = 5,
                       verboseIter = TRUE)

Step 6: Model Fitting

#fitting lasso regression model
set.seed(1234)
lasso <- train(medv ~ ., train,
               method = "glmnet",
               tuneGrid = expand.grid(alpha = 1,
                                      lambda = seq(0.0001, 1, length = 5)),
               trControl = custom)
lasso

Output:
glmnet 

353 samples
 13 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 316, 318, 318, 319, 317, 318, ... 
Resampling results across tuning parameters:

  lambda    RMSE      Rsquared   MAE     
  0.000100  4.230700  0.7785841  3.025998
  0.250075  4.447615  0.7579974  3.135095
  0.500050  4.611916  0.7438984  3.285522
  0.750025  4.688806  0.7406668  3.362630
  1.000000  4.786658  0.7366188  3.445216

Tuning parameter 'alpha' was held constant at a value of 1
RMSE was used to select the optimal model using the
 smallest value.
The final values used for the model were alpha = 1 and
 lambda = 1e-04.
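As a cross-check, a similar lasso fit can be sketched with glmnet directly, letting cv.glmnet choose lambda by cross-validation. This is an assumption-level alternative to the caret workflow above, not part of the recipe itself:

```r
library(glmnet)
library(MASS)

# glmnet expects a numeric predictor matrix, not a formula
x <- model.matrix(medv ~ ., Boston)[, -1]  # drop the intercept column
y <- Boston$medv

set.seed(1234)
cv_fit <- cv.glmnet(x, y, alpha = 1)  # alpha = 1 selects the lasso penalty
coef(cv_fit, s = "lambda.min")        # coefficients at the CV-chosen lambda
```

Coefficients printed as "." have been shrunk exactly to zero, which is the sparsity the introduction describes.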

Step 7: Check RMSE value

#mean validation score
mean(lasso$resample$RMSE)

	[1] 4.2307
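The recipe stops at the cross-validated RMSE, but the held-out test set from Step 4 can also be scored. This self-contained sketch uses lm() as a stand-in model on simulated data (the recipe's lasso object would be used the same way through predict(lasso, newdata = test)):

```r
# Simulated data standing in for the Boston split (illustrative only)
set.seed(222)
df <- data.frame(x = rnorm(100))
df$y <- 2 * df$x + rnorm(100)
ind <- sample(2, nrow(df), replace = TRUE, prob = c(0.7, 0.3))
tr <- df[ind == 1, ]
te <- df[ind == 2, ]

# Fit on train, predict on test, compute test-set RMSE
fit  <- lm(y ~ x, data = tr)
pred <- predict(fit, newdata = te)
rmse <- sqrt(mean((te$y - pred)^2))
rmse
```

Comparing this test-set RMSE with the cross-validated one is a quick check for overfitting.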

Step 8: Plots

#plotting the model
plot(lasso, main = "Lasso Regression")
#plotting important variables
plot(varImp(lasso, scale = TRUE))

nox, rm, and dis were the three most important variables.

