How to find optimal parameters using RandomizedSearchCV for Regression?

This recipe helps you find optimal parameters using RandomizedSearchCV for Regression.


Recipe Objective

While training a model, we need to pass in a few hyperparameters that affect the model's predictions. But how do we find which set of hyperparameters gives the best result? This can be done with RandomizedSearchCV. Instead of trying every possible combination, RandomizedSearchCV samples a fixed number of hyperparameter settings from the distributions (or lists of values) we specify, scores each setting with cross-validation, and returns the setting that achieved the best score.
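To make the idea concrete, here is a minimal hand-rolled random search. It is only an illustration of the concept, not how scikit-learn implements RandomizedSearchCV, and it assumes a DecisionTreeRegressor with two illustrative hyperparameters (max_depth, min_samples_leaf) so that it runs quickly.

```python
# Hand-rolled random search: an illustration of the idea, NOT scikit-learn's implementation.
import numpy as np
from scipy.stats import randint
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Illustrative search space: each hyperparameter gets a distribution to sample from.
param_distributions = {
    "max_depth": randint(2, 8),          # integers 2..7
    "min_samples_leaf": randint(1, 20),  # integers 1..19
}

rng = np.random.RandomState(0)
n_iter = 5
results = []
for _ in range(n_iter):
    # 1. Draw one random value for every hyperparameter.
    params = {name: dist.rvs(random_state=rng)
              for name, dist in param_distributions.items()}
    # 2. Score this setting with 3-fold cross-validation (R² is the default for regressors).
    model = DecisionTreeRegressor(random_state=0, **params)
    score = cross_val_score(model, X, y, cv=3).mean()
    results.append((score, params))

# 3. Keep the setting with the highest mean cross-validated score.
best_score, best_params = max(results, key=lambda r: r[0])
print(best_params, round(best_score, 3))
```

RandomizedSearchCV wraps these same three steps (sample, cross-validate, keep the best) and adds parallelism, refitting of the best model, and a detailed cv_results_ report.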

This python source code does the following:
1. Imports the necessary libraries
2. Loads the dataset and performs train_test_split
3. Defines a GradientBoostingRegressor and the distributions of hyperparameters to search
4. Tunes the regressor's hyperparameters using RandomizedSearchCV and prints the best results

So this is the recipe on how we can find optimal parameters using RandomizedSearchCV for regression.

Step 1 - Import the library

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import uniform as sp_randFloat
from scipy.stats import randint as sp_randInt

We have imported the modules we need from different libraries: datasets, train_test_split, RandomizedSearchCV and GradientBoostingRegressor from scikit-learn, plus SciPy's uniform and randint distributions under the aliases sp_randFloat and sp_randInt.
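The two SciPy aliases are what RandomizedSearchCV samples from. A quick sketch of the values they produce; the shifted learning-rate range at the end is an optional tweak assumed here for illustration, not part of the original recipe.

```python
from scipy.stats import randint as sp_randInt
from scipy.stats import uniform as sp_randFloat

# uniform() with the default loc=0, scale=1 draws floats from [0, 1]
floats = sp_randFloat().rvs(size=1000, random_state=42)

# randint(low, high) draws integers from low up to high - 1 (high is exclusive)
ints = sp_randInt(100, 1000).rvs(size=1000, random_state=42)

# Optional tweak (assumption): uniform(loc, scale) covers [loc, loc + scale],
# e.g. [0.01, 0.30] to avoid learning rates very close to zero
lr = sp_randFloat(loc=0.01, scale=0.29).rvs(size=1000, random_state=42)

print(floats.min(), floats.max())
print(ints.min(), ints.max())
print(lr.min(), lr.max())
```

Because randint excludes its upper bound, sp_randInt(100, 1000) can never return 1000.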

Step 2 - Setting up the Data

We are using the inbuilt diabetes dataset to train the model, and we use train_test_split to split the data into two parts: train and test.

dataset = datasets.load_diabetes()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
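A quick sanity check of the split. The random_state=42 argument is an addition assumed here so that the split is the same on every run; the original code leaves it unset.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset = datasets.load_diabetes()
X = dataset.data      # 442 samples x 10 numeric features
y = dataset.target    # continuous target: disease progression one year after baseline

# test_size=0.25 puts a quarter of the rows (rounded up) in the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)
```

With 442 rows, scikit-learn rounds the test size up, so the test set gets 111 rows and the training set the remaining 331.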

Step 3 - Model and its parameters

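Here we are using GradientBoostingRegressor as the model to train on the data, and we define the parameters (learning_rate, subsample, n_estimators and max_depth) for which RandomizedSearchCV should find the best values. Each parameter gets a distribution to sample from: sp_randFloat() draws floats uniformly from [0, 1], while sp_randInt(low, high) draws integers from low up to high - 1, so n_estimators ranges from 100 to 999 and max_depth from 4 to 9.

model = GradientBoostingRegressor()
parameters = {'learning_rate': sp_randFloat(),
              'subsample'    : sp_randFloat(),
              'n_estimators' : sp_randInt(100, 1000),
              'max_depth'    : sp_randInt(4, 10)
             }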
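To preview which settings the search will try, we can draw candidates from the same parameters dictionary with scikit-learn's ParameterSampler, which is what RandomizedSearchCV uses internally to generate its candidates. The n_iter=3 and random_state=0 values below are arbitrary choices for the preview.

```python
from scipy.stats import randint as sp_randInt
from scipy.stats import uniform as sp_randFloat
from sklearn.model_selection import ParameterSampler

parameters = {'learning_rate': sp_randFloat(),
              'subsample'    : sp_randFloat(),
              'n_estimators' : sp_randInt(100, 1000),
              'max_depth'    : sp_randInt(4, 10)}

# Each candidate is a dict with one sampled value per hyperparameter
candidates = list(ParameterSampler(parameters, n_iter=3, random_state=0))
for candidate in candidates:
    print(candidate)
```

Each printed dictionary is one setting that would be fitted and cross-validated; with n_iter=10 the search in Step 4 evaluates ten such settings.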

Step 4 - Using RandomizedSearchCV and Printing the results

Before using RandomizedSearchCV, let's look at its main parameters:

  • estimator : The model whose hyperparameters we want to optimize.
  • param_distributions : A dictionary mapping each parameter name to a distribution (or a list of values) to sample from.
  • cv : An integer giving the number of folds for cross-validation. By default it is 5 (in scikit-learn 0.22 and later).
  • n_iter : The number of parameter settings that are sampled. By default it is 10.
  • n_jobs : The number of jobs to run in parallel; -1 means use all processors.
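The effect of cv and n_iter is visible in the search's cv_results_ report. A small sketch, assuming a DecisionTreeRegressor as a fast stand-in for the gradient boosting model and n_jobs=1 to keep it simple:

```python
from scipy.stats import randint
from sklearn.datasets import load_diabetes
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

search = RandomizedSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),   # stand-in model (assumption)
    param_distributions={"max_depth": randint(2, 10)},
    cv=3,           # 3 folds -> per-fold columns split0 .. split2
    n_iter=4,       # 4 sampled settings -> 4 candidates in cv_results_
    n_jobs=1,       # set to -1 to use all processors
    random_state=0,
)
search.fit(X, y)

print(len(search.cv_results_["params"]))
print(sorted(k for k in search.cv_results_ if k.startswith("split")))
```

Raising n_iter explores more settings, while raising cv makes each setting's score more reliable; both multiply the number of fits (n_iter × cv).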
So we have defined an object to use RandomizedSearchCV with the important parameters. Then we fit it on the training data, and finally we print the optimized hyperparameter values. Since no scoring is passed, best_score_ is the mean cross-validated R² of the best setting (the default score for regressors). Because no random_state is set, the sampled settings, and therefore the results, will differ from run to run.

randm_src = RandomizedSearchCV(estimator=model, param_distributions=parameters,
                               cv=2, n_iter=10, n_jobs=-1)
randm_src.fit(X_train, y_train)
print(" Results from Random Search ")
print("\n The best estimator across ALL searched params:\n", randm_src.best_estimator_)
print("\n The best score across ALL searched params:\n", randm_src.best_score_)
print("\n The best parameters across ALL searched params:\n", randm_src.best_params_)

Output of this snippet is given below:

Results from Random Search 

 The best estimator across ALL searched params:
 GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.17889450760287762, loss='ls', max_depth=7,
             max_features=None, max_leaf_nodes=None,
             min_impurity_decrease=0.0, min_impurity_split=None,
             min_samples_leaf=1, min_samples_split=2,
             min_weight_fraction_leaf=0.0, n_estimators=737,
             n_iter_no_change=None, presort='auto', random_state=None,
             subsample=0.40247913722860207, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)

 The best score across ALL searched params:

 The best parameters across ALL searched params:
 {'learning_rate': 0.17889450760287762, 'max_depth': 7, 'n_estimators': 737, 'subsample': 0.40247913722860207}
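Putting the steps together, here is a sketch of the full recipe as one script. Two things are assumptions added here rather than part of the original steps: random_state values for reproducibility, and a final check on the held-out test set with randm_src.score, which uses the best estimator refitted on all of X_train.

```python
from scipy.stats import randint as sp_randInt
from scipy.stats import uniform as sp_randFloat
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

dataset = datasets.load_diabetes()
X, y = dataset.data, dataset.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

parameters = {'learning_rate': sp_randFloat(),
              'subsample'    : sp_randFloat(),
              'n_estimators' : sp_randInt(100, 1000),
              'max_depth'    : sp_randInt(4, 10)}

randm_src = RandomizedSearchCV(estimator=GradientBoostingRegressor(random_state=0),
                               param_distributions=parameters,
                               cv=2, n_iter=10, n_jobs=-1,
                               random_state=0)   # fixes which settings are sampled
randm_src.fit(X_train, y_train)

# best_score_: mean cross-validated R² on the training folds.
# score(): R² of the refitted best estimator on data it has never seen.
test_r2 = randm_src.score(X_test, y_test)
print("Best parameters:", randm_src.best_params_)
print("Mean CV R²:", randm_src.best_score_)
print("Test R²:", test_r2)
```

Comparing the cross-validated score with the test score gives a rough idea of whether the chosen setting generalizes beyond the data used to select it.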
