RandomizedSearchCV to Find Optimal Parameters in Python

In this recipe we will learn how to find optimal parameters using RandomizedSearchCV, and how to apply GradientBoostingRegressor and evaluate its results in Python. Hyperparameter tuning is also explained along the way.

Recipe Objective

While training a model we need to pass a few hyperparameters that affect the model's predictions. But how do we find which set of hyperparameters gives the best result? This can be done with RandomizedSearchCV. RandomizedSearchCV randomly samples sets of hyperparameters, calculates the score for each set, and returns the set of hyperparameters that gives the best score as output.
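Under the hood, RandomizedSearchCV uses scikit-learn's ParameterSampler to draw candidate settings. To see just the sampling step in isolation, here is a minimal sketch (the distributions and ranges below are illustrative placeholders, not the ones used later in this recipe):

from sklearn.model_selection import ParameterSampler
from scipy.stats import uniform, randint

# Any scipy distribution exposing an rvs() method can be sampled from
param_distributions = {"learning_rate": uniform(0.01, 0.3),  # floats in [0.01, 0.31)
                       "n_estimators": randint(100, 1000)}   # integers in [100, 1000)

# Draw 5 random candidate settings, just to see what a search would try
for candidate in ParameterSampler(param_distributions, n_iter=5, random_state=0):
    print(candidate)

Each candidate is a plain dictionary of parameter values; RandomizedSearchCV simply cross-validates the model once per candidate and keeps the best one.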

This Python source code does the following:
1. Imports the necessary libraries
2. Loads the dataset and performs train_test_split
3. Applies GradientBoostingRegressor and evaluates the result
4. Tunes the hyperparameters of the GradientBoostingRegressor model using RandomizedSearchCV

So this is the recipe on how we can find optimal parameters using RandomizedSearchCV for regression.


Step 1 - Import the library

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import uniform as sp_randFloat
from scipy.stats import randint as sp_randInt

We have imported various modules from different libraries, such as datasets, train_test_split, RandomizedSearchCV, GradientBoostingRegressor, sp_randFloat and sp_randInt.
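Here sp_randFloat is scipy's continuous uniform distribution and sp_randInt its discrete uniform distribution; RandomizedSearchCV draws values from them via their rvs() method. A quick sanity check of what they produce (the arguments below are just for illustration):

print(sp_randFloat().rvs(3))         # three random floats in [0, 1)
print(sp_randInt(100, 1000).rvs(3))  # three random integers in [100, 1000)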

Step 2 - Setting up the Data

We are using the inbuilt diabetes dataset to train the model, and we use train_test_split to split the data into two parts, train and test.

dataset = datasets.load_diabetes()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
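Note that train_test_split (and the randomized search later on) is stochastic, so repeated runs will produce different splits and scores. If you want a reproducible split, one option is to fix the seed (the value 42 here is an arbitrary choice):

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)  # fixed seed for repeatability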

Step 3 - Model and its parameters

Here we are using GradientBoostingRegressor as the model to train on the data, and we define the distributions for its parameters (i.e. learning_rate, subsample, n_estimators and max_depth), from which RandomizedSearchCV will find the best set of values.

model = GradientBoostingRegressor()
parameters = {'learning_rate': sp_randFloat(),
              'subsample'    : sp_randFloat(),
              'n_estimators' : sp_randInt(100, 1000),
              'max_depth'    : sp_randInt(4, 10)}
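With no arguments, sp_randFloat() samples from [0, 1), so learning_rate and subsample can each take any value in that interval. If you prefer to constrain the search, scipy's uniform(loc, scale) samples from [loc, loc + scale); a sketch with arbitrarily chosen bounds (not the ones used in this recipe):

parameters = {'learning_rate': sp_randFloat(0.01, 0.29),  # floats in [0.01, 0.30)
              'subsample'    : sp_randFloat(0.5, 0.5),    # floats in [0.50, 1.00)
              'n_estimators' : sp_randInt(100, 1000),     # integers in [100, 1000)
              'max_depth'    : sp_randInt(4, 10)}         # integers in [4, 10)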


Step 4 - Using RandomizedSearchCV and Printing the results

Before using RandomizedSearchCV, let us first look at its important parameters:

  • estimator : The model (or pipeline) whose parameters we want to optimize.
  • param_distributions : The dictionary of parameters and the distributions to sample them from.
  • cv : An integer giving the number of splits used for cross validation. By default it is set to 5.
  • n_iter : The number of parameter settings that are sampled. By default it is set to 10.
  • n_jobs : The number of jobs to run in parallel; -1 means use all processors.

So we have defined a RandomizedSearchCV object with the important parameters. Then we fit the training data to it, and finally the print statements output the optimized hyperparameter values.

randm_src = RandomizedSearchCV(estimator=model, param_distributions=parameters,
                               cv=2, n_iter=10, n_jobs=-1)
randm_src.fit(X_train, y_train)

print(" Results from Random Search " )
print("\n The best estimator across ALL searched params:\n", randm_src.best_estimator_)
print("\n The best score across ALL searched params:\n", randm_src.best_score_)
print("\n The best parameters across ALL searched params:\n", randm_src.best_params_)

The output of this snippet is given below:

Results from Random Search 

 The best estimator across ALL searched params:
 GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.17889450760287762, loss='ls', max_depth=7,
             max_features=None, max_leaf_nodes=None,
             min_impurity_decrease=0.0, min_impurity_split=None,
             min_samples_leaf=1, min_samples_split=2,
             min_weight_fraction_leaf=0.0, n_estimators=737,
             n_iter_no_change=None, presort='auto', random_state=None,
             subsample=0.40247913722860207, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)

 The best score across ALL searched params:
 0.23754616011566576

 The best parameters across ALL searched params:
 {'learning_rate': 0.17889450760287762, 'max_depth': 7, 'n_estimators': 737, 'subsample': 0.40247913722860207}
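The best_score_ reported above is the mean cross-validated R² over the training folds. Because refit=True by default, RandomizedSearchCV also retrains the best estimator on the whole training set, so we can evaluate it on the held-out test data (a step the recipe itself does not show):

best_model = randm_src.best_estimator_  # already refit on all of X_train
print("R^2 on the test set:", best_model.score(X_test, y_test))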


