How to find optimal parameters using GridSearchCV for Regression in ML in Python

This recipe helps you find optimal parameters using GridSearchCV for regression in ML in Python.

Recipe Objective

Often, while working on a dataset with a Machine Learning model, we don't know which set of hyperparameters will give the best result. Passing every set of hyperparameters through the model by hand and checking the result is tedious and quickly becomes impractical.

To get the best set of hyperparameters we can use Grid Search. Grid Search passes every combination of the hyperparameter values we list into the model, one by one, and scores each with cross-validation. Finally it returns the combination of hyperparameters that gave the best score.
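What "all combinations" means can be sketched with scikit-learn's ParameterGrid helper, which expands a dictionary of value lists into individual settings. The grid below (toy_grid) is a small hypothetical one chosen only for illustration; it is not the grid used in this recipe.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical toy grid, for illustration only
toy_grid = {"learning_rate": [0.01, 0.1], "max_depth": [2, 4, 6]}

# ParameterGrid yields one dict per combination of the listed values
combinations = list(ParameterGrid(toy_grid))

print(len(combinations))  # 2 learning rates * 3 depths = 6
for params in combinations:
    print(params)
```

Each printed dictionary is one candidate that a grid search would fit and score.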

This Python source code does the following:
1. Imports the necessary libraries
2. Loads the dataset and performs train_test_split
3. Sets up a GradientBoostingRegressor model and its parameter grid
4. Tunes the hyperparameters of the GradientBoostingRegressor model using GridSearchCV

So this recipe is a short example of how we can find optimal parameters using GridSearchCV for regression.


Step 1 - Import the library - GridSearchCV

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

Here we have imported datasets, train_test_split, GridSearchCV and GradientBoostingRegressor from different scikit-learn modules. We will see what each one does when it is used in the code snippets below.

Step 2 - Setup the Data

Here we use datasets to load the built-in diabetes dataset and create objects X and y to store the features and the target values respectively. We then hold out 30% of the data as a test set.

dataset = datasets.load_diabetes()
X = dataset.data; y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
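Because train_test_split shuffles the rows randomly, each run of the recipe can produce a different split and therefore different results. A minimal sketch of a reproducible split, assuming you are free to fix the seed (the value 42 for random_state is an arbitrary choice, not part of the original recipe):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset = datasets.load_diabetes()
X, y = dataset.data, dataset.target

# random_state=42 is an assumed, arbitrary seed that makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Shapes of the full data and of the two parts
print(X.shape, X_train.shape, X_test.shape)
```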

Step 3 - Model and its Parameter

Here, we are using GradientBoostingRegressor as the Machine Learning model to tune with GridSearchCV, so we create an object GBR.

GBR = GradientBoostingRegressor()

Now we define the parameter values that we want to pass through GridSearchCV to find the best combination. We make a dictionary called parameters with four hyperparameters: learning_rate, subsample, n_estimators and max_depth.

parameters = {'learning_rate': [0.01, 0.02, 0.03, 0.04],
              'subsample': [0.9, 0.5, 0.2, 0.1],
              'n_estimators': [100, 500, 1000, 1500],
              'max_depth': [4, 6, 8, 10]
             }
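It helps to know how much work this grid implies before running the search. The arithmetic can be sketched as below; cv_folds = 2 matches the cv value this recipe passes to GridSearchCV in Step 4, and the variable names are only illustrative.

```python
from math import prod

parameters = {'learning_rate': [0.01, 0.02, 0.03, 0.04],
              'subsample': [0.9, 0.5, 0.2, 0.1],
              'n_estimators': [100, 500, 1000, 1500],
              'max_depth': [4, 6, 8, 10]}

# Number of candidates = product of the number of values per hyperparameter
n_candidates = prod(len(values) for values in parameters.values())

cv_folds = 2  # the cv value used in Step 4 of this recipe
n_fits = n_candidates * cv_folds  # one fit per candidate per fold

print(n_candidates, n_fits)  # 256 512
```

With the default refit behaviour, GridSearchCV then fits the best candidate once more on the whole training set, so adding values to any list grows the run time multiplicatively.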

 


Step 4 - Using GridSearchCV and Printing Results

Before using GridSearchCV, let's have a look at its important parameters.

  • estimator: The model on which we want to use GridSearchCV.
  • param_grid: Dictionary (or list of dictionaries) mapping parameter names to the lists of values from which GridSearchCV has to select the best.
  • scoring: The evaluation metric used to compare hyperparameter combinations; if not specified, the estimator's own score method is used.
  • cv: The cross-validation splitting strategy. Passing an integer sets the number of folds; if not specified, 5-fold cross-validation is used.
  • n_jobs: The number of jobs to run in parallel; -1 means use all processors.
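The parameters listed above can be seen together in a minimal sketch. It assumes a deliberately small grid (small_grid), a reduced n_estimators and a fixed random_state so that it runs quickly; those values are illustrative choices, not the settings of this recipe. The scoring string "neg_mean_squared_error" is one of scikit-learn's built-in regression scorers.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_diabetes(return_X_y=True)

# Hypothetical small grid so the example finishes quickly
small_grid = {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}

search = GridSearchCV(
    estimator=GradientBoostingRegressor(n_estimators=50, random_state=0),
    param_grid=small_grid,
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    cv=3,                              # 3-fold cross-validation
    n_jobs=1,                          # set to -1 to use all processors
)
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # mean cross-validated MSE of the best candidate
```

scikit-learn scorers follow a "greater is better" convention, which is why error metrics appear with a neg_ prefix and best_score_ is negative here.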

We make an object grid_GBR for GridSearchCV and fit it on the training data, i.e. X_train and y_train.

grid_GBR = GridSearchCV(estimator=GBR, param_grid=parameters, cv=2, n_jobs=-1)
grid_GBR.fit(X_train, y_train)

Now we use print statements to show the results: the best estimator, its score and its hyperparameter values. Since no scoring was specified, best_score_ is the mean cross-validated R² score of the best candidate.

print(" Results from Grid Search ")
print("\n The best estimator across ALL searched params:\n", grid_GBR.best_estimator_)
print("\n The best score across ALL searched params:\n", grid_GBR.best_score_)
print("\n The best parameters across ALL searched params:\n", grid_GBR.best_params_)

As an output we get the following (the exact values vary from run to run because the split is random, and the printed estimator fields depend on the installed scikit-learn version):

Results from Grid Search 

 The best estimator across ALL searched params:
 GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.03, loss='ls', max_depth=10,
             max_features=None, max_leaf_nodes=None,
             min_impurity_decrease=0.0, min_impurity_split=None,
             min_samples_leaf=1, min_samples_split=2,
             min_weight_fraction_leaf=0.0, n_estimators=100,
             n_iter_no_change=None, presort='auto', random_state=None,
             subsample=0.1, tol=0.0001, validation_fraction=0.1, verbose=0,
             warm_start=False)

 The best score across ALL searched params:
 0.41652696934146743

 The best parameters across ALL searched params:
 {'learning_rate': 0.03, 'max_depth': 10, 'n_estimators': 100, 'subsample': 0.1}
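The best score above is a cross-validation score computed on the training data only; the test set created in Step 2 has not been used yet. A minimal sketch of checking the tuned model on held-out data follows. To keep it quick it assumes a reduced grid (reduced_grid) and fixed random_state values, which are illustrative choices and not part of the original recipe.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0  # assumed seed for repeatability
)

# Hypothetical reduced grid so the example runs in a few seconds
reduced_grid = {"learning_rate": [0.01, 0.03], "n_estimators": [100, 200]}

grid_GBR = GridSearchCV(
    GradientBoostingRegressor(random_state=0), reduced_grid, cv=2
)
grid_GBR.fit(X_train, y_train)

# With the default refit=True, predict() uses the best estimator,
# refitted on the whole training set
y_pred = grid_GBR.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
test_mse = mean_squared_error(y_test, y_pred)

print("Test R2:", test_r2)
print("Test MSE:", test_mse)
```

Comparing the test R² with best_score_ gives a rough indication of whether the selected hyperparameters generalise beyond the cross-validation folds.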
