How to save an R model?

This recipe helps you save an R model.

Recipe Objective

Once we have trained a model and found its performance on test data to be satisfactory, we should save it. A trained model is lost as soon as we close the R session. Additionally, with a large dataset, retraining is quite time-consuming because the algorithm has to be run all over again. Hence, it is ideal to train the model once and save it, so that it can be loaded later to predict the outcome on a new dataset.

In this recipe, we will demonstrate how to build and save a Regression Tree model.

A Decision Tree is a supervised machine learning algorithm that can be used to perform both classification and regression on complex datasets. Decision trees are also known as Classification and Regression Trees (CART). Hence, they work for both continuous and categorical target variables.

Important basic tree terminology is as follows:

  1. Root node: represents the entire population or dataset, which gets divided into two or more purer subsets (also known as homogeneous sets). It always contains a single input variable (x).
  2. Leaf or terminal node: these nodes do not split further and contain the output variable.

In this recipe, we will focus only on Regression Trees, where the target variable is continuous in nature. The splits in these trees are chosen to minimise the Residual Sum of Squares (RSS) of the groups formed. The RSS is calculated from the predicted values, where the prediction for each observation is the mean response of the training observations within its (jth) group.
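To make the splitting criterion concrete, here is a minimal sketch in base R. It assumes a made-up toy dataset with a single predictor, and the helper name split_rss is hypothetical; it is not how rpart is implemented internally, only an illustration of choosing the threshold with the smallest total RSS.

```r
# Toy data: one predictor x and a continuous response y (made-up values)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(10, 12, 11, 30, 32, 31)

# RSS of a split at a given threshold: each group is predicted by its own mean
split_rss <- function(threshold) {
  left  <- y[x <  threshold]
  right <- y[x >= threshold]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}

# Candidate thresholds halfway between consecutive x values
thresholds <- (head(x, -1) + tail(x, -1)) / 2
rss <- sapply(thresholds, split_rss)

# The chosen split is the threshold with the smallest total RSS
best_threshold <- thresholds[which.min(rss)]
print(data.frame(threshold = thresholds, rss = rss))
print(best_threshold)
```

In this toy example the responses fall into two clearly separated clusters, so the threshold between them gives by far the lowest RSS.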

STEP 1: Importing Necessary Libraries

# For data manipulation
library(tidyverse)
# For the Decision Tree algorithm
library(rpart)
# For plotting the decision tree
install.packages("rpart.plot")
library(rpart.plot)
# Install the readxl R package for reading Excel sheets
install.packages("readxl")
library("readxl")

STEP 2: Loading the Train and Test Dataset

We load the train and test datasets separately. Here, the data has been split into train and test sets in an 80/20 proportion.
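The recipe starts from files that are already split. If you need to create such a split yourself, one possible approach is sketched below. It uses the built-in mtcars dataset as a stand-in, and the names df, train_demo and test_demo are hypothetical; the recipe's own files may have been produced differently.

```r
# Stand-in data: the built-in mtcars dataset (not the recipe's bag data)
df <- mtcars

set.seed(42)  # makes the random split reproducible
n <- nrow(df)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

train_demo <- df[train_idx, ]   # about 80% of the rows
test_demo  <- df[-train_idx, ]  # the remaining rows

print(nrow(train_demo))
print(nrow(test_demo))
```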

Dataset description: The company wants to predict the cost it should set for a new variant of its bags, using the following attributes:

  1. Height – The height of the bag
  2. Width – The width of the bag
  3. Length – The length of the bag
  4. Weight – The weight the bag can carry
  5. Weight1 – Weight the bag can carry after expansion

# calling the function read_excel from the readxl library
train = read_excel('R_253_df_train_regression.xlsx')
test = read_excel('R_253_df_test_regression.xlsx')
# gives the number of observations and variables, with a brief description of each
glimpse(train)

Rows: 127
Columns: 6
$ Cost     242, 290, 340, 363, 430, 450, 500, 390, 450, 500, 475, 500,...
$ Weight   23.2, 24.0, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5,...
$ Weight1  25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7,...
$ Length   30.0, 31.2, 31.1, 33.5, 34.0, 34.7, 34.5, 35.0, 35.1, 36.2,...
$ Height   11.5200, 12.4800, 12.3778, 12.7300, 12.4440, 13.6024, 14.17...
$ Width    4.0200, 4.3056, 4.6961, 4.4555, 5.1340, 4.9274, 5.2785, 4.6...

# gives the number of observations and variables, with a brief description of each
glimpse(test)

Rows: 32
Columns: 6
$ Cost     1000.0, 200.0, 300.0, 300.0, 300.0, 430.0, 345.0, 456.0, 51...
$ Weight   41.1, 30.0, 31.7, 32.7, 34.8, 35.5, 36.0, 40.0, 40.0, 40.1,...
$ Weight1  44.0, 32.3, 34.0, 35.0, 37.3, 38.0, 38.5, 42.5, 42.5, 43.0,...
$ Length   46.6, 34.8, 37.8, 38.8, 39.8, 40.5, 41.0, 45.5, 45.5, 45.8,...
$ Height   12.4888, 5.5680, 5.7078, 5.9364, 6.2884, 7.2900, 6.3960, 7....
$ Width    7.5958, 3.3756, 4.1580, 4.3844, 4.0198, 4.5765, 3.9770, 4.3...

STEP 3: Data Preprocessing (Scaling)

This is a pre-modelling step. In this step, the data is scaled or standardised so that the different attributes are comparable. Standardised data has mean zero and standard deviation one. We do this using the scale() function.

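Note: Scaling is an important pre-modelling step for many algorithms. Tree-based models such as rpart do not strictly require it, because their splits depend only on the ordering of the values, but we apply it here as part of the standard workflow.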
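As a quick check of what scale() does, the sketch below standardises three columns of the built-in mtcars dataset (a stand-in, not the recipe's data) and confirms that each column ends up with mean zero and standard deviation one. It also shows that scale() stores the centre and scale it used as attributes of the result.

```r
# Stand-in numeric data: three columns of the built-in mtcars dataset
x <- mtcars[, c("mpg", "hp", "wt")]
x_scaled <- scale(x)

# Each scaled column should have mean ~0 and standard deviation 1
col_means <- colMeans(x_scaled)
col_sds <- apply(x_scaled, 2, sd)
print(round(col_means, 10))
print(col_sds)

# scale() keeps the column means and standard deviations it used
print(attr(x_scaled, "scaled:center"))
print(attr(x_scaled, "scaled:scale"))
```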

# scaling the independent variables in the train dataset
train_scaled = scale(train[2:6])
# using cbind() to add a new column, Outcome, to the scaled independent values
train_scaled = data.frame(cbind(train_scaled, Outcome = train$Cost))
train_scaled %>% head()

Weight		Weight1		Length		Height		Width		Outcome
-0.33379271	-0.3132781	-0.08858827	0.4095324	-0.42466337	242
-0.22300101	-0.1970948	0.04945726	0.6459374	-0.22972408	290
-0.23684997	-0.1712763	0.03795346	0.6207701	0.03681581	340
0.09552513	0.1514550	0.31404453	0.7075012	-0.12740825	363
0.12322305	0.1514550	0.37156350	0.6370722	0.33570907	430
0.16476994	0.2418198	0.45209006	0.9223343	0.19469206	450

# scaling the independent variables in the test dataset
test_scaled = scale(test[2:6])
# using cbind() to add a new column, Outcome, to the scaled independent values
test_scaled = data.frame(cbind(test_scaled, Outcome = test$Cost))
test_scaled %>% head()

Weight		Weight1		Length		Height		Width		Outcome
0.72483012	0.72445274	0.69959684	2.15715925	1.87080937	1000
0.07204194	0.08459639	0.09077507	0.03471101	-0.06904068	200
0.17201851	0.17756697	0.24556027	0.07758442	0.29059599	300
0.23082825	0.23225555	0.29715533	0.14769072	0.39466263	300
0.35432872	0.35803927	0.34875040	0.25564092	0.22707121	300
0.39549554	0.39632128	0.38486694	0.56280832	0.48296300	430
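The code above scales the test set with its own mean and standard deviation. A common alternative is to reuse the centre and scale computed on the training set, so that both sets are on exactly the same scale. The sketch below illustrates that idea on a stand-in split of mtcars; the names train_x, test_x, center and spread are hypothetical and not part of the recipe.

```r
# Stand-in split (hypothetical; the recipe reads its data from Excel files)
train_x <- mtcars[1:25, c("hp", "wt")]
test_x  <- mtcars[26:32, c("hp", "wt")]

# Scale the training predictors and keep the parameters that were used
train_x_scaled <- scale(train_x)
center <- attr(train_x_scaled, "scaled:center")
spread <- attr(train_x_scaled, "scaled:scale")

# Apply the training centre and scale to the test predictors
test_x_scaled <- scale(test_x, center = center, scale = spread)
print(head(test_x_scaled))
```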

STEP 4: Creation of Decision Tree Regressor model using training set

We use the rpart() function to fit the model.

Syntax: rpart(formula, data = , method = '')

Where:

  1. formula: Outcome ~ ., where Outcome is the dependent variable and . represents all the other (independent) variables
  2. data = train_scaled
  3. method = 'anova' (to fit a regression model)

# creation of an object 'model' using the rpart function
model = rpart(Outcome~., data = train_scaled, method = 'anova')

We use the rpart.plot() function to plot the decision tree model:

rpart.plot(model)
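Before saving a model, it is worth checking how it performs on unseen data. The sketch below shows one possible way to do this with predict() and the root mean squared error (RMSE). It fits a tree on a stand-in split of the built-in mtcars dataset, so the names fit, train_demo and test_demo are hypothetical; with the recipe's objects, the equivalent call would be predict(model, newdata = test_scaled).

```r
library(rpart)

# Stand-in data (hypothetical): predict mpg from the other mtcars columns
train_demo <- mtcars[1:25, ]
test_demo  <- mtcars[26:32, ]

fit <- rpart(mpg ~ ., data = train_demo, method = "anova")

# Predictions for unseen rows, and the root mean squared error
pred <- predict(fit, newdata = test_demo)
rmse <- sqrt(mean((test_demo$mpg - pred)^2))
print(rmse)
```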

STEP 5: Saving the model

There are two ways to save and load the model:

  1. using save(), load(): When we use save(), load() restores the model under the same name it was saved with.
  2. using saveRDS(), readRDS(): saveRDS() does not save the object's name, so we have the flexibility to load the model under any other name. But saveRDS() can only save one object at a time, as it is a lower-level function.

Most people prefer saveRDS() over save(), as it serialises a single object and readRDS() returns it for assignment to any name.
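The difference between the two approaches can be seen in a small sketch. It uses a simple linear model as a stand-in object and temporary files instead of fixed paths; the names model_demo and model_copy are hypothetical.

```r
# A small stand-in object; a fitted tree model is saved the same way
model_demo <- lm(mpg ~ wt, data = mtcars)

# save()/load(): the object comes back under its original name
rda_path <- tempfile(fileext = ".rda")
save(model_demo, file = rda_path)
rm(model_demo)
loaded_names <- load(rda_path)  # load() returns the names it restored
print(loaded_names)

# saveRDS()/readRDS(): the caller chooses the name when reading
rds_path <- tempfile(fileext = ".rds")
saveRDS(model_demo, file = rds_path)
model_copy <- readRDS(rds_path)
print(class(model_copy))
```

Note that load() silently overwrites any existing object with the same name, which is one reason readRDS() is often the safer choice.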

Syntax: saveRDS(model, file =)

where:

  1. model = model that you want to save
  2. file = path to the output file (.rds is the conventional extension for saveRDS(); this recipe uses .rda, which also works)

# saving the model
saveRDS(model, file = "C:/Users/Divit/Desktop/Internship/R-recipes_Jan/R_160 onwards/Decision Tree_classifier/model.rda")
# loading the model
model_old = readRDS("C:/Users/Divit/Desktop/Internship/R-recipes_Jan/R_160 onwards/Decision Tree_classifier/model.rda")
# checking whether the model has been loaded with a different name
ls()

'model' 'model_old' 'test' 'test_scaled' 'train' 'train_scaled' 'var_dic_list'  
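Listing the workspace only shows that an object with the new name exists. To confirm that the restored model behaves like the original, one option is to compare their predictions. The sketch below does this for a stand-in regression tree fitted on mtcars and saved to a temporary file; the names fit, fit_restored and path are hypothetical.

```r
library(rpart)

# Stand-in model (hypothetical data): a regression tree on mtcars
fit <- rpart(mpg ~ ., data = mtcars, method = "anova")

# Save to a temporary file instead of a fixed path, then read it back
path <- tempfile(fileext = ".rds")
saveRDS(fit, file = path)
fit_restored <- readRDS(path)

# The restored model should give the same predictions as the original
same_predictions <- identical(predict(fit, mtcars), predict(fit_restored, mtcars))
print(same_predictions)
```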
​
