How to find optimal parameters using GridSearchCV in ML in python

This recipe helps you find optimal parameters using GridSearchCV in ML in python

Recipe Objective

Many a times while working on a dataset and using a Machine Learning model we don't know which set of hyperparameters will give us the best result. Passing all sets of hyperparameters manually through the model and checking the result might be a hectic work and may not be possible to do.

To get the best set of hyperparameters we can use Grid Search. Grid Search passes all combinations of hyperparameters one by one into the model and check the result. Finally it gives us the set of hyperparemeters which gives the best result after passing in the model.

This python source code does the following:
1. Imports the necessary libraries
2. Loads the dataset and performs train_test_split
3. Applies GradientBoostingClassifier and evaluates the result
4. Hyperparameter tunes the GBR Classifier model using GridSearchCV

So this recipe is a short example of how we can find optimal parameters using GridSearchCV.

Get Closer To Your Dream of Becoming a Data Scientist with 70+ Solved End-to-End ML Projects

Step 1 - Import the library - GridSearchCv

from sklearn import datasets from sklearn.model_selection import train_test_split from sklearn.model_selection import GridSearchCV from sklearn.ensemble import GradientBoostingClassifier

Here we have imported various modules like datasets, GradientBoostingClassifier and GridSearchCV from differnt libraries. We will understand the use of these later while using it in the in the code snipet.
For now just have a look on these imports.

Step 2 - Setup the Data

Here we have used datasets to load the inbuilt wine dataset and we have created objects X and y to store the data and the target value respectively. dataset = datasets.load_wine() X = dataset.data; y = dataset.target X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

Step 3 - Model and its Parameter

Here, we are using GradientBoostingClassifier as a Machine Learning model to use GridSearchCV. So we have created an object GBC. GBC = GradientBoostingClassifier() Now we have defined the parameters of the model which we want to pass to through GridSearchCV to get the best parameters. So we are making an dictionary called parameters in which we have four parameters learning_rate, subsample, n_estimators and max_depth. parameters = {'learning_rate': [0.01,0.02,0.03], 'subsample' : [0.9, 0.5, 0.2], 'n_estimators' : [100,500,1000], 'max_depth' : [4,6,8] }

 

Explore More Data Science and Machine Learning Projects for Practice. Fast-Track Your Career Transition with ProjectPro

Step 4 - Using GridSearchCV and Printing Results

Before using GridSearchCV, lets have a look on the important parameters.

  • estimator: In this we have to pass the models or functions on which we want to use GridSearchCV
  • param_grid: Dictionary or list of parameters of models or function in which GridSearchCV have to select the best.
  • Scoring: It is used as a evaluating metric for the model performance to decide the best hyperparameters, if not especified then it uses estimator score.
  • cv : In this we have to pass a interger value, as it signifies the number of splits that is needed for cross validation. By default is set as five.
  • n_jobs : This signifies the number of jobs to be run in parallel, -1 signifies to use all processor.

Making an object grid_GBC for GridSearchCV and fitting the dataset i.e X and y grid_GBC = GridSearchCV(estimator=GBR, param_grid = parameters, cv = 2, n_jobs=-1) grid_GBC.fit(X_train, y_train) Now we are using print statements to print the results. It will give the values of hyperparameters as a result. print(" Results from Grid Search " ) print("\n The best estimator across ALL searched params:\n",grid_GBC.best_estimator_) print("\n The best score across ALL searched params:\n",grid_GBC.best_score_) print("\n The best parameters across ALL searched params:\n",grid_GBC.best_params_) As an output we get:

The best estimator across ALL searched params:
 GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.01, loss='deviance', max_depth=8,
              max_features=None, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=2,
              min_weight_fraction_leaf=0.0, n_estimators=500,
              n_iter_no_change=None, presort='auto', random_state=None,
              subsample=0.2, tol=0.0001, validation_fraction=0.1,
              verbose=0, warm_start=False)

 The best score across ALL searched params:
 0.9758064516129032

 The best parameters across ALL searched params:
 {'learning_rate': 0.01, 'max_depth': 8, 'n_estimators': 500, 'subsample': 0.2}

Join Millions of Satisfied Developers and Enterprises to Maximize Your Productivity and ROI with ProjectPro - Read ProjectPro Reviews Now!

Download Materials

What Users are saying..

profile image

Ameeruddin Mohammed

ETL (Abintio) developer at IBM
linkedin profile url

I come from a background in Marketing and Analytics and when I developed an interest in Machine Learning algorithms, I did multiple in-class courses from reputed institutions though I got good... Read More

Relevant Projects

Ola Bike Rides Request Demand Forecast
Given big data at taxi service (ride-hailing) i.e. OLA, you will learn multi-step time series forecasting and clustering with Mini-Batch K-means Algorithm on geospatial data to predict future ride requests for a particular region at a given time.

Build ARCH and GARCH Models in Time Series using Python
In this Project we will build an ARCH and a GARCH model using Python

Insurance Pricing Forecast Using XGBoost Regressor
In this project, we are going to talk about insurance forecast by using linear and xgboost regression techniques.

BigMart Sales Prediction ML Project in Python
The goal of the BigMart Sales Prediction ML project is to build and evaluate different predictive models and determine the sales of each product at a store.

Build a Text Generator Model using Amazon SageMaker
In this Deep Learning Project, you will train a Text Generator Model on Amazon Reviews Dataset using LSTM Algorithm in PyTorch and deploy it on Amazon SageMaker.

Build a Multi-Class Classification Model in Python on Saturn Cloud
In this machine learning classification project, you will build a multi-class classification model in Python on Saturn Cloud to predict the license status of a business.

End-to-End Snowflake Healthcare Analytics Project on AWS-2
In this AWS Snowflake project, you will build an end to end retraining pipeline by checking Data and Model Drift and learn how to redeploy the model if needed

Learn to Build an End-to-End Machine Learning Pipeline - Part 1
In this Machine Learning Project, you will learn how to build an end-to-end machine learning pipeline for predicting truck delays, addressing a major challenge in the logistics industry.

NLP Project on LDA Topic Modelling Python using RACE Dataset
Use the RACE dataset to extract a dominant topic from each document and perform LDA topic modeling in python.

Build a Hybrid Recommender System in Python using LightFM
In this Recommender System project, you will build a hybrid recommender system in Python using LightFM .