How to implement Lasso regression in R

In this recipe, we shall learn how to implement lasso regression, a regularization technique used to improve model accuracy, in R.

Recipe Objective: How to implement Lasso regression in R?

Lasso regression is a regularisation technique often preferred over plain least-squares regression because it can give better predictive accuracy. It relies on shrinkage: coefficient estimates are shrunk towards zero, which encourages simple, sparse models, i.e., models with fewer effective parameters. The steps to implement lasso regression in R are as follows -
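The shrinkage behind lasso can be illustrated with the soft-thresholding operator that the lasso penalty induces on coefficient estimates. This small base-R sketch is illustrative only and not part of the recipe:

```r
# Soft-thresholding operator S(z, lambda) = sign(z) * max(|z| - lambda, 0);
# lasso applies this kind of shrinkage to coefficient estimates.
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

# Coefficients smaller than lambda in magnitude become exactly zero,
# which is what makes lasso models sparse:
soft_threshold(c(-2, -0.3, 0.1, 1.5), lambda = 0.5)
# [1] -1.5  0.0  0.0  1.0
```

Note how the two small coefficients are set exactly to zero while the larger ones shrink by lambda; this is the mechanism that drops uninformative predictors from the model.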

Step 1: Load the required packages

#importing required libraries
library(caret)
library(glmnet)
library(MASS)

Step 2: Load the dataset

Boston is an inbuilt dataset in R which contains housing data for 506 census tracts of Boston from the 1970 census. Its columns are:
crim- per capita crime rate by town
zn- the proportion of residential land zoned for lots over 25,000 sq. ft.
indus- the proportion of non-retail business acres per town
chas- Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox- nitric oxides concentration (parts per 10 million)
rm- the average number of rooms per dwelling
age- the proportion of owner-occupied units built before 1940
dis- weighted distances to five Boston employment centers
rad- index of accessibility to radial highways
tax- full-value property-tax rate per USD 10,000
ptratio- pupil-teacher ratio by town
black- 1000(B - 0.63)^2 where B is the proportion of blacks by town
lstat- the percentage of the lower status of the population
medv- median value of owner-occupied homes in USD 1000's

#loading the dataset
data <- Boston
head(data)

     crim zn indus chas   nox    rm  age    dis rad tax
1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296
2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242
3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242
4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222
5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222
6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222
  ptratio  black lstat medv
1    15.3 396.90  4.98 24.0
2    17.8 396.90  9.14 21.6
3    17.8 392.83  4.03 34.7
4    18.7 394.63  2.94 33.4
5    18.7 396.90  5.33 36.2
6    18.7 394.12  5.21 28.7

Step 3: Check the structure of the dataset

#structure of the dataset
str(data)

'data.frame':	506 obs. of  14 variables:
 $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num  6.58 6.42 7.18 7 7.15 ...
 $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ black  : num  397 397 393 395 397 ...
 $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

All the columns are of int or numeric type, so no encoding is required.

Step 4: Train-Test split

#train-test split
set.seed(222)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train <- data[ind == 1, ]
test <- data[ind == 2, ]
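The split above can be sanity-checked with a toy example. This self-contained sketch (illustrative names, not from the recipe) uses the same sample() idiom and confirms that every row lands in exactly one of the two parts:

```r
# Toy data frame standing in for the Boston data (illustrative only)
set.seed(222)
df <- data.frame(x = 1:100)

# Same idiom as the recipe: assign each row to group 1 (train) with
# probability 0.7 or group 2 (test) with probability 0.3
ind <- sample(2, nrow(df), replace = TRUE, prob = c(0.7, 0.3))
train_part <- df[ind == 1, , drop = FALSE]
test_part  <- df[ind == 2, , drop = FALSE]

# The two parts are disjoint and together cover the whole data frame
nrow(train_part) + nrow(test_part) == nrow(df)
# [1] TRUE
```

Note that this split is probabilistic: the train fraction will only be approximately 70%, varying slightly with the seed.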

Step 5: Create custom Control Parameters

#creating custom control parameters
custom <- trainControl(method = "repeatedcv",
                       number = 10,
                       repeats = 5,
                       verboseIter = TRUE)

Step 6: Model Fitting

#fitting lasso regression model
set.seed(1234)
lasso <- train(medv ~ ., train,
               method = "glmnet",
               tuneGrid = expand.grid(alpha = 1,
                                      lambda = seq(0.0001, 1, length = 5)),
               trControl = custom)
lasso

Output:
glmnet 

353 samples
 13 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 316, 318, 318, 319, 317, 318, ... 
Resampling results across tuning parameters:

  lambda    RMSE      Rsquared   MAE     
  0.000100  4.230700  0.7785841  3.025998
  0.250075  4.447615  0.7579974  3.135095
  0.500050  4.611916  0.7438984  3.285522
  0.750025  4.688806  0.7406668  3.362630
  1.000000  4.786658  0.7366188  3.445216

Tuning parameter 'alpha' was held constant at a value of 1
RMSE was used to select the optimal model using the
 smallest value.
The final values used for the model were alpha = 1 and
 lambda = 1e-04.
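As a cross-check, a similar lasso fit can be sketched with glmnet directly, letting cv.glmnet choose lambda by cross-validation. This is an assumption-level alternative to the caret workflow above, not part of the recipe itself:

```r
library(glmnet)
library(MASS)

# glmnet expects a numeric predictor matrix, not a formula
x <- model.matrix(medv ~ ., Boston)[, -1]  # drop the intercept column
y <- Boston$medv

set.seed(1234)
cv_fit <- cv.glmnet(x, y, alpha = 1)  # alpha = 1 selects the lasso penalty
coef(cv_fit, s = "lambda.min")        # coefficients at the CV-chosen lambda
```

Coefficients printed as "." have been shrunk exactly to zero, which is the sparsity the introduction describes.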

Step 7: Check RMSE value

#mean validation score
mean(lasso$resample$RMSE)

	[1] 4.2307
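The recipe stops at the cross-validated RMSE, but the held-out test set from Step 4 can also be scored. This self-contained sketch uses lm() as a stand-in model on simulated data (the recipe's lasso object would be used the same way through predict(lasso, newdata = test)):

```r
# Simulated data standing in for the Boston split (illustrative only)
set.seed(222)
df <- data.frame(x = rnorm(100))
df$y <- 2 * df$x + rnorm(100)
ind <- sample(2, nrow(df), replace = TRUE, prob = c(0.7, 0.3))
tr <- df[ind == 1, ]
te <- df[ind == 2, ]

# Fit on train, predict on test, compute test-set RMSE
fit  <- lm(y ~ x, data = tr)
pred <- predict(fit, newdata = te)
rmse <- sqrt(mean((te$y - pred)^2))
rmse
```

Comparing this test-set RMSE with the cross-validated one is a quick check for overfitting.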

Step 8: Plots

#plotting the model
plot(lasso, main = "Lasso Regression")
#plotting important variables
plot(varImp(lasso, scale = TRUE))

nox, rm, and dis were the three most important variables.

