How to use SVM Regressor in R?

This recipe helps you use SVM Regressor in R

Recipe Objective

Support Vector Machines is a supervised learning algorithm which can work for both classification and regression problems. The main objective of the SVM is to find the optimum hyperplane (i.e. 2D line and 3D plane) that maximises the margin (i.e. twice the distance between the closest data point and hyperplane) between two classes.

It applies a penalty for misclassification and transforms the data to higher dimension if it is non-linearly separable. ​

Major advantages of using SVM are that: ​

  1. it works well with large number of predictors.
  2. Works well in-case of non-linear separable data.
  3. Works well in-case of image classification and does not suffer multicollinearity problem.

This recipe demonstrates the modelling of a SVM Regressor by using the sales data of bags. ​

Access Retail Price Recommendation ML Project with Source Code

STEP 1: Importing Necessary Libraries

library(caret) library(tidyverse) # for data manipulation

STEP 2: Read a csv file and explore the data

The dataset attached contains the data of 160 different bags associated with ABC industries.

The bags have certain attributes which are described below: ​

  1. Height – The height of the bag
  2. Width – The width of the bag
  3. Length – The length of the bag
  4. Weight – The weight the bag can carry
  5. Weight1 – Weight the bag can carry after expansion

The company now wants to predict the cost they should set for a new variant of these kinds of bags. ​

data <- read.csv("R_366_Data_1.csv") glimpse(data)

Rows: 159
Columns: 6
$ Cost     242, 290, 340, 363, 430, 450, 500, 390, 450, 500, 475, 500,...
$ Weight   23.2, 24.0, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5,...
$ Weight1  25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7,...
$ Length   30.0, 31.2, 31.1, 33.5, 34.0, 34.7, 34.5, 35.0, 35.1, 36.2,...
$ Height   11.5200, 12.4800, 12.3778, 12.7300, 12.4440, 13.6024, 14.17...
$ Width    4.0200, 4.3056, 4.6961, 4.4555, 5.1340, 4.9274, 5.2785, 4.6...

summary(data) # returns the statistical summary of the data columns

Cost            Weight         Weight1          Length     
 Min.   :   0.0   Min.   : 7.50   Min.   : 8.40   Min.   : 8.80  
 1st Qu.: 120.0   1st Qu.:19.05   1st Qu.:21.00   1st Qu.:23.15  
 Median : 273.0   Median :25.20   Median :27.30   Median :29.40  
 Mean   : 398.3   Mean   :26.25   Mean   :28.42   Mean   :31.23  
 3rd Qu.: 650.0   3rd Qu.:32.70   3rd Qu.:35.50   3rd Qu.:39.65  
 Max.   :1650.0   Max.   :59.00   Max.   :63.40   Max.   :68.00  
     Height           Width      
 Min.   : 1.728   Min.   :1.048  
 1st Qu.: 5.945   1st Qu.:3.386  
 Median : 7.786   Median :4.248  
 Mean   : 8.971   Mean   :4.417  
 3rd Qu.:12.366   3rd Qu.:5.585  
 Max.   :18.957   Max.   :8.142   

dim(data)

159 6

STEP 3: Train Test Split

# createDataPartition() function from the caret package to split the original dataset into a training and testing set and split data into training (80%) and testing set (20%) parts = createDataPartition(data$Cost, p = .8, list = F) train = data[parts, ] test = data[-parts, ]

STEP 4: Building SVM Regressor model

We will use caret package to perform SVM regression. First, we will use the trainControl() function to define the method of cross validation to be carried out and search type i.e. "grid". Then train the model using train() function.

Syntax: train(formula, data = , method = , trControl = , tuneGrid = )

where:

  1. formula = y~x1+x2+x3+..., where y is the independent variable and x1,x2,x3 are the dependent variables
  2. data = dataframe
  3. method = Type of the model to be built ("svmLinear" for SVM)
  4. trControl = Takes the control parameters. We will use trainControl function out here where we will specify the Cross validation technique.
  5. tuneGrid = takes the tuning parameters and applies grid search CV on them

# specifying the CV technique which will be passed into the train() function later and number parameter is the "k" in K-fold cross validation train_control = trainControl(method = "cv", number = 5) set.seed(50) # training a Regression model while tuning parameters (Method = "rpart") model = train(Cost~., data = train, method = "svmLinear", trControl = train_control) # summarising the results print(model)

Support Vector Machines with Linear Kernel 

129 samples
  5 predictor

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 103, 104, 103, 103, 103 
Resampling results:

  RMSE      Rsquared   MAE     
  127.5993  0.8777133  88.49929

Tuning parameter 'C' was held constant at a value of 1

STEP 5: Make predictions on the SVM regressor model

We use our final SVM regressor model to make predictions on the testing data (unseen data) and predict the 'Cost' value and generate performance measures.

#use model to make predictions on test data pred_y = predict(model, test) # performance metrics on the test data test_y = test[, 1] mean((test_y - pred_y)^2) #mse - Mean Squared Error caret::RMSE(test_y, pred_y) #rmse - Root Mean Squared Error

24698.8516359459
157.158682979802

What Users are saying..

profile image

Ed Godalle

Director Data Analytics at EY / EY Tech
linkedin profile url

I am the Director of Data Analytics with over 10+ years of IT experience. I have a background in SQL, Python, and Big Data working with Accenture, IBM, and Infosys. I am looking to enhance my skills... Read More

Relevant Projects

Time Series Project to Build a Multiple Linear Regression Model
Learn to build a Multiple linear regression model in Python on Time Series Data

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Build a Customer Churn Prediction Model using Decision Trees
Develop a customer churn prediction model using decision tree machine learning algorithms and data science on streaming service data.

ML Model Deployment on AWS for Customer Churn Prediction
MLOps Project-Deploy Machine Learning Model to Production Python on AWS for Customer Churn Prediction

AWS MLOps Project to Deploy a Classification Model [Banking]
In this AWS MLOps project, you will learn how to deploy a classification model using Flask on AWS.

AWS Project to Build and Deploy LSTM Model with Sagemaker
In this AWS Sagemaker Project, you will learn to build a LSTM model on Sagemaker for sales forecasting while analyzing the impact of weather conditions on Sales.

Avocado Machine Learning Project Python for Price Prediction
In this ML Project, you will use the Avocado dataset to build a machine learning model to predict the average price of avocado which is continuous in nature based on region and varieties of avocado.

Multilabel Classification Project for Predicting Shipment Modes
Multilabel Classification Project to build a machine learning model that predicts the appropriate mode of transport for each shipment, using a transport dataset with 2000 unique products. The project explores and compares four different approaches to multilabel classification, including naive independent models, classifier chains, natively multilabel models, and multilabel to multiclass approaches.

Build an Image Segmentation Model using Amazon SageMaker
In this Machine Learning Project, you will learn to implement the UNet Architecture and build an Image Segmentation Model using Amazon SageMaker

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.