How to implement Ridge regression in R

In this recipe, we shall learn how to implement ridge regression in R. It is a model tuning technique that can be used to analyze data that suffers from multicollinearity.

Recipe Objective: How to implement Ridge regression in R?

Ridge regression is a model tuning technique that can be used to analyze data that suffers from multicollinearity. It uses the L2 regularization technique. When multicollinearity is present, least-squares estimates are unbiased but their variances are large, so the predicted values can be far from the actual values. The coefficients in a ridge regression model are estimated using the ridge estimator; the resulting model is biased but has lower variance than an OLS estimator. The steps to implement ridge regression in R are as follows:
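For reference, the objective that ridge regression minimizes can be written as follows (a standard formulation; the notation here is ours, not from the recipe):

\hat{\beta}_{ridge} = \arg\min_{\beta} \left\{ \| y - X\beta \|_2^2 + \lambda \| \beta \|_2^2 \right\}

where \lambda \ge 0 controls the strength of the L2 penalty: \lambda = 0 recovers ordinary least squares, while larger values shrink the coefficients toward zero.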


Step 1: Load the required packages

#importing required packages
library(caret)
library(glmnet)
library(MASS)

Step 2: Load the dataset

Boston is an inbuilt dataset in R which contains housing data for 506 census tracts of Boston from the 1970 census.
crim- per capita crime rate by town
zn- the proportion of residential land zoned for lots over 25,000 sq. ft.
indus- the proportion of non-retail business acres per town
chas- Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox- nitric oxides concentration (parts per 10 million)
rm- the average number of rooms per dwelling
age- the proportion of owner-occupied units built before 1940
dis- weighted distances to five Boston employment centers
rad- index of accessibility to radial highways
tax- full-value property-tax rate per USD 10,000
ptratio- pupil-teacher ratio by town
black- 1000(B - 0.63)^2 where B is the proportion of blacks by town
lstat- the percentage of the lower status of the population
medv- median value of owner-occupied homes in USD 1000's

#loading the dataset
data <- Boston
head(data)

     crim zn indus chas   nox    rm  age    dis rad tax
1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296
2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242
3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242
4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222
5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222
6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222
  ptratio  black lstat medv
1    15.3 396.90  4.98 24.0
2    17.8 396.90  9.14 21.6
3    17.8 392.83  4.03 34.7
4    18.7 394.63  2.94 33.4
5    18.7 396.90  5.33 36.2
6    18.7 394.12  5.21 28.7

Step 3: Check the structure of the dataset

#structure of the dataset
str(data)

'data.frame':	506 obs. of  14 variables:
 $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num  6.58 6.42 7.18 7 7.15 ...
 $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ black  : num  397 397 393 395 397 ...
 $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

All the columns are of int or numeric type.
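Since ridge regression is motivated by multicollinearity, it can be worth confirming that the Boston predictors are in fact correlated before fitting. A minimal check (not part of the original recipe) using base R's cor():

#optional: pairwise correlations among the predictors;
#large absolute values (e.g. between rad and tax) indicate multicollinearity
round(cor(subset(data, select = -medv)), 2)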

Step 4: Train-Test split

#train-test split
set.seed(222)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train <- data[ind == 1, ]
test <- data[ind == 2, ]

Step 5: Create custom Control Parameters

#creating custom control parameters
custom <- trainControl(method = "repeatedcv",
                       number = 10,
                       repeats = 5,
                       verboseIter = TRUE)

Step 6: Model Fitting

#fitting the ridge regression model (alpha = 0 gives ridge in glmnet)
set.seed(1234)
ridge <- train(medv ~ ., train,
               method = "glmnet",
               tuneGrid = expand.grid(alpha = 0,
                                      lambda = seq(0.0001, 1, length = 5)),
               trControl = custom)
ridge

Output:
glmnet 

353 samples
13 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 316, 318, 318, 319, 317, 318, ... 
Resampling results across tuning parameters:

  lambda    RMSE      Rsquared   MAE     
  0.000100  4.242204  0.7782278  3.008339
  0.250075  4.242204  0.7782278  3.008339
  0.500050  4.242204  0.7782278  3.008339
  0.750025  4.248536  0.7779462  3.012397
  1.000000  4.265479  0.7770264  3.023091

Tuning parameter 'alpha' was held constant at a value of 0
RMSE was used to select the optimal model using the
 smallest value.
The final values used for the model were alpha = 0 and
 lambda = 0.50005.
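The selected tuning parameters and the corresponding ridge coefficients can be extracted from the fitted object; a short sketch using the standard caret and glmnet accessors:

#best alpha and lambda chosen by cross-validation
ridge$bestTune
#ridge coefficients at the selected lambda
coef(ridge$finalModel, s = ridge$bestTune$lambda)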

Step 7: Check RMSE value

#mean cross-validated RMSE across all resamples
mean(ridge$resample$RMSE)

[1] 4.242204
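The test set created in Step 4 has not been used yet; a minimal sketch (not part of the original recipe) for estimating out-of-sample error with caret's predict() and RMSE() helpers:

#evaluating the tuned model on the held-out test set
pred <- predict(ridge, newdata = test)
RMSE(pred, test$medv)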

Step 8: Plots

#plotting the model
plot(ridge, main = "Ridge Regression")
#plotting important variables
plot(varImp(ridge, scale = TRUE))

nox, rm, and chas were the top three most important variables.
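To see the shrinkage effect directly, the coefficient paths of the underlying glmnet fit can also be plotted (an optional addition, not part of the original recipe):

#coefficient paths: each curve shows one predictor's coefficient
#shrinking toward zero as log(lambda) increases
plot(ridge$finalModel, xvar = "lambda", label = TRUE)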

