# How to save an R model?

This recipe helps you save an R model.

Once we have trained a model and found its performance satisfactory, we should save it. A trained model is lost as soon as the R session is closed. Moreover, with a large dataset, training is time-consuming, and without a saved model the algorithm has to be rerun every time. It is therefore ideal to train the model once and save it, so that it can be loaded later to predict outcomes on new data.

In this recipe, we will demonstrate how to build and save a Regression Tree model.

A Decision Tree is a supervised machine learning algorithm that can perform both classification and regression on complex datasets. Decision trees are also known as Classification and Regression Trees (CART), and they work with both continuous and categorical variables.

Basic tree terminology is as follows:

- Root node: represents the entire population or dataset, which gets divided into two or more purer (more homogeneous) subsets. It always contains a single input variable (x).
- Leaf or terminal node: these nodes do not split further and contain the output variable.

In this recipe, we focus only on Regression Trees, where the target variable is continuous. The splits in these trees are chosen to minimise the residual sum of squares (RSS) across the groups formed. The RSS is calculated by taking, as the predicted value for each observation, the mean response of the training observations within the jth group.
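To make the RSS criterion concrete, here is a minimal sketch in base R that scores every candidate split of a single predictor and picks the one with the lowest total RSS. The toy vectors `x` and `y` and the helper names `group_rss` and `split_rss` are made up for illustration; rpart's own split search is more elaborate than this.

```r
# Toy data: one predictor x (sorted) and a continuous response y (made-up values)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(10, 12, 11, 30, 32, 31)

# RSS of a group = sum of squared deviations from the group's mean response
group_rss <- function(values) sum((values - mean(values))^2)

# Total RSS when the data is split into x < threshold and x >= threshold
split_rss <- function(threshold) {
  left  <- y[x <  threshold]
  right <- y[x >= threshold]
  group_rss(left) + group_rss(right)
}

# Candidate thresholds: midpoints between consecutive x values
candidates <- (head(x, -1) + tail(x, -1)) / 2
rss_values <- sapply(candidates, split_rss)

# The chosen split is the candidate with the smallest total RSS
best_split <- candidates[which.min(rss_values)]
best_split
```

Here the split at 3.5 separates the low responses from the high ones, so each group sits close to its own mean and the total RSS is smallest there.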

```
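# For data manipulation
library(tidyverse)
# For the decision tree algorithm
library(rpart)
# For plotting the decision tree (installed only if missing)
if (!requireNamespace("rpart.plot", quietly = TRUE)) install.packages("rpart.plot")
library(rpart.plot)
# For reading Excel sheets (installed only if missing)
if (!requireNamespace("readxl", quietly = TRUE)) install.packages("readxl")
library(readxl)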
```

We load the train and test datasets separately. Here, the data is split between train and test in an 80/20 proportion.

Dataset description: The company wants to predict the cost it should set for a new variant of its bags, based on the following attributes:

- Height – The height of the bag
- Width – The width of the bag
- Length – The length of the bag
- Weight – The weight the bag can carry
- Weight1 – Weight the bag can carry after expansion

```
# calling the function read_excel from the readxl library
train = read_excel('R_253_df_train_regression.xlsx')
test = read_excel('R_253_df_test_regression.xlsx')
# gives the number of observations and variables involved with its brief description
glimpse(train)
```

Rows: 127
Columns: 6
$ Cost    242, 290, 340, 363, 430, 450, 500, 390, 450, 500, 475, 500,...
$ Weight  23.2, 24.0, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5,...
$ Weight1 25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7,...
$ Length  30.0, 31.2, 31.1, 33.5, 34.0, 34.7, 34.5, 35.0, 35.1, 36.2,...
$ Height  11.5200, 12.4800, 12.3778, 12.7300, 12.4440, 13.6024, 14.17...
$ Width   4.0200, 4.3056, 4.6961, 4.4555, 5.1340, 4.9274, 5.2785, 4.6...

```
# gives the number of observations and variables involved with its brief description
glimpse(test)
```

Rows: 32
Columns: 6
$ Cost    1000.0, 200.0, 300.0, 300.0, 300.0, 430.0, 345.0, 456.0, 51...
$ Weight  41.1, 30.0, 31.7, 32.7, 34.8, 35.5, 36.0, 40.0, 40.0, 40.1,...
$ Weight1 44.0, 32.3, 34.0, 35.0, 37.3, 38.0, 38.5, 42.5, 42.5, 43.0,...
$ Length  46.6, 34.8, 37.8, 38.8, 39.8, 40.5, 41.0, 45.5, 45.5, 45.8,...
$ Height  12.4888, 5.5680, 5.7078, 5.9364, 6.2884, 7.2900, 6.3960, 7....
$ Width   7.5958, 3.3756, 4.1580, 4.3844, 4.0198, 4.5765, 3.9770, 4.3...

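This is a pre-modelling step in which the data is scaled (standardised) so that different attributes are comparable. Standardised data has mean zero and standard deviation one. We do this using the scale() function.

Note: Scaling is an important pre-modelling step for many algorithms. Tree-based models such as rpart do not strictly require it, since their splits depend only on the ordering of each predictor's values, but it is applied here as a general pre-modelling step.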
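As a quick check of what scale() does, the sketch below standardises a tiny made-up data frame (the `toy` values are assumptions, not the recipe's data) and confirms that each column ends up with mean zero and standard deviation one.

```r
# Toy data frame (made-up values, not the recipe's dataset)
toy <- data.frame(Height = c(10, 12, 14, 16), Width = c(4, 5, 6, 9))

# scale() centres each column on its mean and divides by its standard deviation;
# the result is a numeric matrix
toy_scaled <- scale(toy)

# Each column now has mean 0 and standard deviation 1
col_means <- colMeans(toy_scaled)
col_sds   <- apply(toy_scaled, 2, sd)

# The original means and standard deviations are kept as attributes
centers <- attr(toy_scaled, "scaled:center")
scales  <- attr(toy_scaled, "scaled:scale")
```

Because scale() returns a matrix, the recipe wraps the result in data.frame() before passing it to the model.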

```
# scaling the independent variables in train dataset
train_scaled = scale(train[2:6])
# using cbind() function to add a new column Outcome to the scaled independent values
train_scaled = data.frame(cbind(train_scaled, Outcome = train$Cost))
train_scaled %>% head()
```

| Weight | Weight1 | Length | Height | Width | Outcome |
|---|---|---|---|---|---|
| -0.33379271 | -0.3132781 | -0.08858827 | 0.4095324 | -0.42466337 | 242 |
| -0.22300101 | -0.1970948 | 0.04945726 | 0.6459374 | -0.22972408 | 290 |
| -0.23684997 | -0.1712763 | 0.03795346 | 0.6207701 | 0.03681581 | 340 |
| 0.09552513 | 0.1514550 | 0.31404453 | 0.7075012 | -0.12740825 | 363 |
| 0.12322305 | 0.1514550 | 0.37156350 | 0.6370722 | 0.33570907 | 430 |
| 0.16476994 | 0.2418198 | 0.45209006 | 0.9223343 | 0.19469206 | 450 |

```
# scaling the independent variables in test dataset
test_scaled = scale(test[2:6])
# using cbind() function to add a new column Outcome to the scaled independent values
test_scaled = data.frame(cbind(test_scaled, Outcome = test$Cost))
test_scaled %>% head()
```

| Weight | Weight1 | Length | Height | Width | Outcome |
|---|---|---|---|---|---|
| 0.72483012 | 0.72445274 | 0.69959684 | 2.15715925 | 1.87080937 | 1000 |
| 0.07204194 | 0.08459639 | 0.09077507 | 0.03471101 | -0.06904068 | 200 |
| 0.17201851 | 0.17756697 | 0.24556027 | 0.07758442 | 0.29059599 | 300 |
| 0.23082825 | 0.23225555 | 0.29715533 | 0.14769072 | 0.39466263 | 300 |
| 0.35432872 | 0.35803927 | 0.34875040 | 0.25564092 | 0.22707121 | 300 |
| 0.39549554 | 0.39632128 | 0.38486694 | 0.56280832 | 0.48296300 | 430 |
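The recipe scales the test set with its own means and standard deviations. A common alternative, sketched below under the assumption that you want both sets on exactly the same scale, is to reuse the statistics computed on the training data. The data frames `train_x` and `test_x` are made-up stand-ins, not the recipe's files.

```r
# Toy stand-ins for the train and test predictors (made-up values)
train_x <- data.frame(Height = c(10, 12, 14, 16), Width = c(4, 5, 6, 9))
test_x  <- data.frame(Height = c(11, 15), Width = c(5, 7))

# Compute the scaling on the training data only
train_scaled_x <- scale(train_x)
train_center <- attr(train_scaled_x, "scaled:center")
train_scale  <- attr(train_scaled_x, "scaled:scale")

# Apply the training means and standard deviations to the test data
test_scaled_x <- scale(test_x, center = train_center, scale = train_scale)
test_scaled_x
```

With this approach a given raw value maps to the same scaled value in both sets, which matters for models that are sensitive to the scale of their inputs.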

We use the rpart() function to fit the model.

Syntax: rpart(formula, data = , method = '')

Where:

- formula = Outcome ~ . where Outcome is the dependent variable and . represents all other (independent) variables
- data = train_scaled
- method = 'anova' (to fit a regression model)

```
# creation of an object 'model' using rpart function
model = rpart(Outcome~., data = train_scaled, method = 'anova')
```
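The recipe's Excel files are not bundled here, so the following self-contained sketch repeats the same fitting step on R's built-in `mtcars` data, used purely as a stand-in, and then checks the fit on held-out rows. The split size, the seed, and the variable names are assumptions chosen for illustration.

```r
library(rpart)

# mtcars ships with R; it stands in for the recipe's data here
set.seed(42)
train_idx  <- sample(nrow(mtcars), size = 24)   # 24 of 32 rows for training
cars_train <- mtcars[train_idx, ]
cars_test  <- mtcars[-train_idx, ]

# Regression tree: mpg is the continuous outcome, all other columns are predictors
cars_model <- rpart(mpg ~ ., data = cars_train, method = "anova")

# Predict on the held-out rows and compute the root mean squared error
cars_pred <- predict(cars_model, newdata = cars_test)
rmse <- sqrt(mean((cars_test$mpg - cars_pred)^2))
rmse
```

A check like this is one way to decide whether a model performs well enough to be worth saving.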

We use the rpart.plot() function to plot the decision tree model.

```
rpart.plot(model)
```

There are two ways to save and load the model:

- using save(), load(): save() stores the object together with its name, so load() restores it under that same name.
- using saveRDS(), readRDS(): saveRDS() does not store the object's name, so we have the flexibility to assign the loaded model to any name. However, saveRDS() can only save one object at a time, as it is a lower-level function.

saveRDS() is often preferred over save() because it serialises a single object without its name, and readRDS() simply returns that object.
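The difference between the two approaches can be seen in a minimal sketch. It uses a small lm() fit on `mtcars` as a stand-in for a trained model and writes to temporary files; the names `fit`, `fit_copy` and `restored_env` are illustrative only.

```r
# A small object standing in for a trained model
fit <- lm(mpg ~ wt, data = mtcars)

rda_path <- tempfile(fileext = ".rda")
rds_path <- tempfile(fileext = ".rds")

# save() stores the object together with its name
save(fit, file = rda_path)
# saveRDS() stores only the object itself
saveRDS(fit, file = rds_path)

# load() restores the object under its original name in the given environment
# and (invisibly) returns the names it restored
restored_env   <- new.env()
restored_names <- load(rda_path, envir = restored_env)

# readRDS() returns the object, so it can be bound to any name
fit_copy <- readRDS(rds_path)
```

Loading into a separate environment, as done here with `restored_env`, avoids silently overwriting an existing object that happens to share the saved name.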

Syntax: saveRDS(model, file =)

where:

- model = model that you want to save
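- file = path to the output file; this recipe uses the extension .rda, although .rds is the more common convention for files written by saveRDS()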

```
# saving the model
saveRDS(model, file = "C:/Users/Divit/Desktop/Internship/R-recipes_Jan/R_160 onwards/Decision Tree_classifier/model.rda")
# loading the model under a new name
model_old = readRDS("C:/Users/Divit/Desktop/Internship/R-recipes_Jan/R_160 onwards/Decision Tree_classifier/model.rda")
# checking whether the model has been loaded under a different name
ls()
```

'model' 'model_old' 'test' 'test_scaled' 'train' 'train_scaled' 'var_dic_list'
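Listing the workspace shows that the object exists, but not that it still behaves like the original. A minimal round-trip check, sketched here on the built-in `mtcars` data with a temporary file (both are assumptions made so the example runs anywhere), compares predictions from the original and reloaded models.

```r
library(rpart)

# Fit a small regression tree on a built-in dataset
tree_model <- rpart(mpg ~ ., data = mtcars, method = "anova")

# Save to a temporary file and read it back under a new name
model_path <- tempfile(fileext = ".rds")
saveRDS(tree_model, file = model_path)
tree_loaded <- readRDS(model_path)

# The reloaded model should give the same predictions as the original
new_rows <- mtcars[1:5, ]
same_predictions <- all.equal(predict(tree_model, new_rows),
                              predict(tree_loaded, new_rows))
same_predictions
```

In a new session, remember to call library(rpart) before predicting, so that predict() dispatches to the rpart method for the reloaded object.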
