How to tune hyperparameters using Random Search in R?

This recipe helps you tune hyperparameters using Random Search in R.
Last Updated: 27 Jun 2022

Recipe Objective

When we train a model, its parameters are learned from the data. For example, in linear regression modelling, the coefficient of each independent variable is a parameter, i.e. it is found during the training process.
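As a small illustration of parameters being learned during training, the sketch below fits a linear regression on R's built-in mtcars data. The dataset and the choice of predictors are assumptions made for this example only; they are not part of the recipe's bag data.

```r
# Illustration only: mtcars ships with base R and stands in for real data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The coefficients (intercept, wt, hp) are parameters found by the fit,
# not values chosen by the user beforehand
learned_params <- coef(fit)
print(learned_params)
```

Nothing in the call sets the coefficient values; they come out of the fitting procedure, which is what separates them from hyperparameters.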

On the other hand, hyperparameters are set by the user before training and are not learned during the training process. For example, the depth of a decision tree. These hyperparameters affect the performance as well as the parameters of the model. Hence, they need to be optimised. There are two common ways to carry out hyperparameter tuning:

Grid Search: This technique generates evenly spaced values for each hyperparameter and then uses cross-validation to find the optimum values.
Random Search: This technique generates random values for each hyperparameter being tested and then uses cross-validation to find the optimum values.
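The difference between the two techniques lies in how the candidate values are generated. The sketch below builds both kinds of candidate sets in base R. The ranges chosen for max_depth and eta are hypothetical and only serve to show the contrast.

```r
set.seed(42)

# Grid search: every combination of a few evenly spaced values
grid_candidates <- expand.grid(
  max_depth = c(2, 6, 10),
  eta       = c(0.1, 0.2, 0.3)
)

# Random search: the same number of candidates, drawn at random from the ranges
n_random <- nrow(grid_candidates)
random_candidates <- data.frame(
  max_depth = sample(1:10, n_random, replace = TRUE),
  eta       = runif(n_random, min = 0.01, max = 0.3)
)

print(grid_candidates)
print(random_candidates)
```

Each row of either table is one hyperparameter combination that would then be scored with cross-validation.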

In this recipe, we will discuss how to build an XGBoost model and optimise its hyperparameters, such as the number and depth of the trees, using Random Search.

Recently, researchers and enthusiasts have started using ensemble techniques like XGBoost to win data science competitions and hackathons. On structured data, it often outperforms algorithms such as Random Forest and conventional Gradient Boosting implementations in terms of speed as well as accuracy.

XGBoost uses an ensemble model based on decision trees. A simple decision tree is considered to be a weak learner. The algorithm builds decision trees sequentially, where each tree corrects the errors of the previous ones, until a stopping condition is met.
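The idea of sequential trees correcting earlier errors can be sketched with a bare-bones boosting loop. This is a simplified illustration, not XGBoost's actual algorithm, which adds regularisation and other refinements. It assumes the rpart package (shipped with standard R installations) as the weak learner, and the toy data, eta and n_trees values are hypothetical.

```r
library(rpart)  # assumption: rpart provides the shallow trees used as weak learners

set.seed(1)
# Toy data (hypothetical): a noisy sine curve
toy <- data.frame(x = seq(0, 10, length.out = 200))
toy$y <- sin(toy$x) + rnorm(200, sd = 0.2)

eta     <- 0.3  # learning rate: share of each tree's correction that is applied
n_trees <- 50   # number of boosting rounds

pred      <- rep(mean(toy$y), nrow(toy))  # start from a constant prediction
rmse_path <- numeric(n_trees)

for (m in seq_len(n_trees)) {
  toy$resid <- toy$y - pred                     # errors left by the trees so far
  tree <- rpart(resid ~ x, data = toy,
                control = rpart.control(maxdepth = 2))  # shallow tree = weak learner
  pred <- pred + eta * predict(tree, toy)       # add a damped correction
  rmse_path[m] <- sqrt(mean((toy$y - pred)^2))  # training RMSE after m trees
}

print(round(rmse_path[c(1, 10, 50)], 3))
```

Each tree is fitted to the residuals of the current ensemble, so the training error shrinks as trees are added. Here the stopping condition is simply a fixed number of rounds.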

STEP 1: Importing Necessary Libraries

install.packages('xgboost')   # for fitting the xgboost model
install.packages('caret')     # for general data preparation and model fitting
install.packages('tidyverse') # for data manipulation
library(xgboost)
library(caret)
library(tidyverse)

STEP 2: Read a csv file and explore the data

The attached dataset contains data on 159 different bags associated with ABC industries.

The bags have certain attributes which are described below:

Height – The height of the bag
Width – The width of the bag
Length – The length of the bag
Weight – The weight the bag can carry
Weight1 – Weight the bag can carry after expansion

The company now wants to predict the cost they should set for a new variant of these kinds of bags.

data <- read.csv("R_338_Data_1.csv")
glimpse(data)

Rows: 159
Columns: 6
$ Cost     242, 290, 340, 363, 430, 450, 500, 390, 450, 500, 475, 500,...
$ Weight   23.2, 24.0, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5,...
$ Weight1  25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7,...
$ Length   30.0, 31.2, 31.1, 33.5, 34.0, 34.7, 34.5, 35.0, 35.1, 36.2,...
$ Height   11.5200, 12.4800, 12.3778, 12.7300, 12.4440, 13.6024, 14.17...
$ Width    4.0200, 4.3056, 4.6961, 4.4555, 5.1340, 4.9274, 5.2785, 4.6...

summary(data) # returns the statistical summary of the data columns

Cost            Weight         Weight1          Length     
 Min.   :   0.0   Min.   : 7.50   Min.   : 8.40   Min.   : 8.80  
 1st Qu.: 120.0   1st Qu.:19.05   1st Qu.:21.00   1st Qu.:23.15  
 Median : 273.0   Median :25.20   Median :27.30   Median :29.40  
 Mean   : 398.3   Mean   :26.25   Mean   :28.42   Mean   :31.23  
 3rd Qu.: 650.0   3rd Qu.:32.70   3rd Qu.:35.50   3rd Qu.:39.65  
 Max.   :1650.0   Max.   :59.00   Max.   :63.40   Max.   :68.00  
     Height           Width      
 Min.   : 1.728   Min.   :1.048  
 1st Qu.: 5.945   1st Qu.:3.386  
 Median : 7.786   Median :4.248  
 Mean   : 8.971   Mean   :4.417  
 3rd Qu.:12.366   3rd Qu.:5.585  
 Max.   :18.957   Max.   :8.142

dim(data)

159 6

STEP 3: Train Test Split

# use createDataPartition() from the caret package to split the original dataset
# into a training set (80%) and a testing set (20%)
parts = createDataPartition(data$Cost, p = .8, list = FALSE)
train = data[parts, ]
test = data[-parts, ]
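For a numeric outcome, createDataPartition() samples within groups formed from the quantiles of the outcome, so that the training and testing sets have similar distributions. The base R sketch below imitates that idea. It is an approximation for illustration rather than caret's exact procedure, and it uses the built-in mtcars data and five quantile groups as assumptions.

```r
set.seed(1)
y <- mtcars$mpg  # illustration only: stands in for data$Cost

# Bin the outcome into five quantile-based groups
groups <- cut(y, breaks = quantile(y, probs = seq(0, 1, by = 0.2)),
              include.lowest = TRUE)

# Draw about 80% of the row indices from within each group
train_idx <- unlist(lapply(split(seq_along(y), groups), function(idx) {
  idx[sample.int(length(idx), size = ceiling(0.8 * length(idx)))]
}), use.names = FALSE)

train_rows <- mtcars[train_idx, ]
test_rows  <- mtcars[-train_idx, ]
print(c(train = nrow(train_rows), test = nrow(test_rows)))
```

Sampling within groups keeps both low and high outcome values in each set, which matters more on small datasets like this one.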

STEP 4: Building and optimising xgboost model using Hyperparameter tuning (Random Search)

We will use the caret package to perform cross-validation and hyperparameter tuning (including nrounds, the number of trees, and max_depth) using the random search technique. First, we will use the trainControl() function to define the method of cross-validation to be carried out and the search type, i.e. "grid" or "random". Then we train the model using the train() function. With search = "random", no tuneGrid is needed: caret samples the hyperparameter combinations itself, and the tuneLength argument sets how many combinations are tried (it defaults to 3).

Syntax: train(formula, data = , method = , trControl = , tuneLength = )

where:

formula = y~x1+x2+x3+..., where y is the dependent variable and x1,x2,x3 are the independent variables
data = dataframe
method = Type of the model to be built
trControl = Takes the control parameters. We will use the trainControl function here to specify the cross-validation technique and the search type.
tuneLength = number of random hyperparameter combinations to evaluate when search = "random"
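To make the mechanics concrete, the sketch below performs a random search with 5-fold cross-validation by hand. It is a simplified stand-in for what train() automates, not caret's implementation. It assumes the rpart package and the built-in mtcars data in place of xgboost and the bag data, and names such as n_candidates and cv_rmse are hypothetical.

```r
library(rpart)  # assumption: a single decision tree stands in for xgboost

set.seed(50)
k            <- 5  # number of cross-validation folds
n_candidates <- 3  # number of random combinations to try

# Step 1: draw random hyperparameter combinations
candidates <- data.frame(
  maxdepth = sample(1:10, n_candidates, replace = TRUE),
  cp       = runif(n_candidates, min = 0.0001, max = 0.05)
)

# Step 2: assign every row to one of k folds
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# Step 3: score one combination by its mean RMSE over the held-out folds
cv_rmse <- function(maxdepth, cp) {
  fold_rmse <- sapply(seq_len(k), function(i) {
    fit <- rpart(mpg ~ ., data = mtcars[folds != i, ],
                 control = rpart.control(maxdepth = maxdepth, cp = cp,
                                         minsplit = 5))
    pred <- predict(fit, mtcars[folds == i, ])
    sqrt(mean((mtcars$mpg[folds == i] - pred)^2))
  })
  mean(fold_rmse)
}

# Step 4: evaluate all combinations and keep the one with the smallest RMSE
candidates$RMSE <- mapply(cv_rmse, candidates$maxdepth, candidates$cp)
best <- candidates[which.min(candidates$RMSE), ]
print(candidates)
print(best)
```

Raising n_candidates explores more of the search space at the cost of more model fits, which is the same trade-off as increasing the number of random combinations in caret.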

# specifying the CV technique which will be passed into the train() function later;
# the number parameter is the "k" in K-fold cross validation
train_control = trainControl(method = "cv", number = 5, search = "random")

set.seed(50)

# training a XGboost Regression tree model while tuning parameters
model = train(Cost~., data = train, method = "xgbTree", trControl = train_control)

# summarising the results
print(model)

129 samples
  5 predictor

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 103, 104, 103, 103, 103 
Resampling results across tuning parameters:

  eta        max_depth  gamma      colsample_bytree  min_child_weight
  0.0654172  2          6.4088615  0.5704402         4               
  0.1625897  3          2.7728954  0.4458249         6               
  0.2349444  7          0.7755953  0.6341465         1               
  subsample  nrounds  RMSE      Rsquared   MAE     
  0.4817636  820      74.39177  0.9624757  45.26924
  0.6831931  863      93.51286  0.9422944  59.56662
  0.7376186   95      59.34363  0.9738879  39.24296

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 95, max_depth = 7, eta
 = 0.2349444, gamma = 0.7755953, colsample_bytree = 0.6341465,
 min_child_weight = 1 and subsample = 0.7376186.

Note: RMSE was used to select the optimal model, choosing the combination with the smallest value. The final model consists of 95 trees (nrounds = 95) with a maximum depth of 7.
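The selection rule can be reproduced by hand from the results printed above. The sketch below copies four columns of that table into a data frame and picks the row with the smallest RMSE. Only the data frame name results_tbl is introduced here; the numbers come from the output shown.

```r
# Values copied from the resampling results printed above
results_tbl <- data.frame(
  eta       = c(0.0654172, 0.1625897, 0.2349444),
  max_depth = c(2, 3, 7),
  nrounds   = c(820, 863, 95),
  RMSE      = c(74.39177, 93.51286, 59.34363)
)

# Keep the combination with the smallest cross-validated RMSE
best_row <- results_tbl[which.min(results_tbl$RMSE), ]
print(best_row)
```

On a fitted caret object, the same information is available as model$results (all combinations) and model$bestTune (the selected one).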

STEP 5: Make predictions on the final xgboost model

We use our final xgboost model to make predictions on the testing data (unseen data) and predict the 'Cost' value and generate performance measures.

# use model to make predictions on test data
pred_y = predict(model, test)

# performance metrics on the test data
test_y = test[, 1]               # first column is Cost
mean((test_y - pred_y)^2)        # mse - Mean Squared Error
caret::RMSE(test_y, pred_y)      # rmse - Root Mean Squared Error

3089.03165508245
55.5790577023617
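The two metrics are related: RMSE is the square root of MSE, so it is expressed in the same units as Cost. The base R sketch below defines both and checks that the two figures reported above agree with each other. The helper names mse and rmse and the three toy values are hypothetical, added for illustration.

```r
mse  <- function(actual, predicted) mean((actual - predicted)^2)
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))

# Hypothetical values, for illustration only
actual    <- c(100, 200, 300)
predicted <- c(110, 190, 330)

mse_value  <- mse(actual, predicted)
rmse_value <- rmse(actual, predicted)
print(c(MSE = mse_value, RMSE = rmse_value))

# The two test-set figures reported above are consistent with each other
reported_mse  <- 3089.03165508245
reported_rmse <- 55.5790577023617
consistent <- isTRUE(all.equal(sqrt(reported_mse), reported_rmse,
                               tolerance = 1e-6))
print(consistent)
```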
