How to find optimal parameters using RandomizedSearchCV in ML in Python

This recipe helps you find optimal parameters using RandomizedSearchCV in ML in Python
Last Updated: 26 Dec 2022


Recipe Objective

When training a model, we need to set several hyperparameters that affect its predictions. But how do we find which set of hyperparameters gives the best result? One way is RandomizedSearchCV. It randomly samples a fixed number of hyperparameter settings from the distributions we specify, scores each setting with cross-validation, and returns the setting with the best score.

So this is the recipe on how we can find optimal parameters using RandomizedSearchCV.
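
To make the idea concrete, here is a minimal hand-written sketch of what a random search does: sample a few settings, score each with cross-validation, and keep the best. It is only an illustration, not how RandomizedSearchCV is implemented internally; the search space, the 5 candidates and the 3 folds are assumptions chosen so the loop runs quickly.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import ParameterSampler, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Illustrative search space (small on purpose so the loop runs quickly).
space = {
    "learning_rate": uniform(0.01, 0.5),  # floats in [0.01, 0.51]
    "n_estimators": randint(20, 60),      # integers in [20, 60)
    "max_depth": randint(2, 5),           # integers in [2, 5)
}

best_score, best_params = -1.0, None
# Draw 5 random candidates, score each with 3-fold cross-validation,
# and remember the candidate with the highest mean accuracy.
for params in ParameterSampler(space, n_iter=5, random_state=0):
    model = GradientBoostingClassifier(random_state=0, **params)
    score = cross_val_score(model, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

RandomizedSearchCV wraps this loop and additionally refits the best setting on the full training data.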

Step 1 - Import the library

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from scipy.stats import uniform as sp_randFloat
from scipy.stats import randint as sp_randInt

We have imported various modules from different libraries: datasets, train_test_split, RandomizedSearchCV and GradientBoostingClassifier from scikit-learn, and the uniform and randint distributions from scipy.stats, aliased as sp_randFloat and sp_randInt.
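
As a quick check of what the two scipy.stats distributions produce, the sketch below draws a few values from each. The sample size of 5 and the fixed random_state are assumptions made only to keep the demonstration small and repeatable.

```python
from scipy.stats import randint as sp_randInt
from scipy.stats import uniform as sp_randFloat

# uniform(loc, scale) draws floats from [loc, loc + scale]; the defaults give [0, 1].
floats = sp_randFloat().rvs(size=5, random_state=0)

# randint(low, high) draws integers from [low, high): high itself is excluded.
ints = sp_randInt(100, 1000).rvs(size=5, random_state=0)

print(floats)
print(ints)
```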

Step 2 - Setting up the Data

We are using the built-in breast cancer dataset to train the model, and we use train_test_split to split the data into two parts, train and test, with 30% of the samples held out for testing.

dataset = datasets.load_breast_cancer()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
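
The split above is reshuffled on every run. If repeatable results are wanted, one option is sketched below; the random_state value of 42 and the use of stratify are assumptions added for illustration and are not part of the original recipe.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset = datasets.load_breast_cancer()
X, y = dataset.data, dataset.target

# random_state fixes the shuffle; stratify=y keeps the class ratio
# roughly equal in the train and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```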

Step 3 - Model and its parameters

Here we are using GradientBoostingClassifier as the model and defining the distributions of the hyperparameters (learning_rate, subsample, n_estimators and max_depth) from which RandomizedSearchCV will sample to find the best set. sp_randFloat() samples floats uniformly from [0, 1], and sp_randInt(low, high) samples integers from low up to, but not including, high.

model = GradientBoostingClassifier()
parameters = {"learning_rate": sp_randFloat(),
              "subsample": sp_randFloat(),
              "n_estimators": sp_randInt(100, 1000),
              "max_depth": sp_randInt(4, 10)}
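
Because sp_randFloat() with no arguments covers the whole range from 0 to 1, the search can propose very small learning_rate or subsample values. A possible variation is to narrow the float ranges, as sketched below. The bounds used here (0.01 to 0.30 and 0.5 to 1.0) are illustrative assumptions, not recommendations from the recipe; ParameterSampler is used only to preview what would be drawn.

```python
from scipy.stats import randint as sp_randInt
from scipy.stats import uniform as sp_randFloat
from sklearn.model_selection import ParameterSampler

# Hypothetical narrower search space: uniform(loc, scale) covers [loc, loc + scale].
narrow_parameters = {
    "learning_rate": sp_randFloat(0.01, 0.29),  # floats in [0.01, 0.30]
    "subsample": sp_randFloat(0.5, 0.5),        # floats in [0.5, 1.0]
    "n_estimators": sp_randInt(100, 1000),      # integers in [100, 1000)
    "max_depth": sp_randInt(4, 10),             # integers in [4, 10)
}

# Draw three candidate settings without fitting any model.
candidates = list(ParameterSampler(narrow_parameters, n_iter=3, random_state=0))
for candidate in candidates:
    print(candidate)
```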

Step 4 - Using RandomizedSearchCV and Printing the results

Before using RandomizedSearchCV, first look at its parameters:

estimator : The model (estimator object) whose hyperparameters we want to optimize.
param_distributions : A dictionary that maps each parameter name to a distribution or a list of values to sample from.
cv : The cross-validation strategy. An integer sets the number of folds; by default, 5-fold cross-validation is used.
n_iter : The number of parameter settings that are sampled. By default it is set to 10.
n_jobs : The number of jobs to run in parallel; -1 means use all processors.
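
The sketch below shows how these parameters fit together on a deliberately small search. The scoring and random_state arguments are additional RandomizedSearchCV parameters not listed above, and the tiny search space is an assumption chosen so the example finishes quickly.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),
    param_distributions={"n_estimators": randint(10, 50), "max_depth": randint(2, 5)},
    n_iter=4,            # four sampled settings
    cv=3,                # three folds per setting: 4 * 3 = 12 fits, plus one final refit
    scoring="accuracy",  # metric used to rank the settings
    n_jobs=1,            # single process here; -1 would use all processors
    random_state=0,      # makes the sampled settings repeatable
)
search.fit(X, y)

# cv_results_ holds one entry per sampled setting.
n_settings = len(search.cv_results_["params"])
print(n_settings)
print(search.best_params_)
```

The total cost grows with n_iter times the number of folds, so both values are worth choosing with the available compute time in mind.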

So we have created a RandomizedSearchCV object with the important parameters. Then we fit it on the training data, and finally the print statements show the optimized hyperparameter values.

randm = RandomizedSearchCV(estimator=model, param_distributions=parameters,
                           cv=2, n_iter=10, n_jobs=-1)
randm.fit(X_train, y_train)

print(" Results from Random Search ")
print(" The best estimator across ALL searched params: ", randm.best_estimator_)
print(" The best score across ALL searched params: ", randm.best_score_)
print(" The best parameters across ALL searched params: ", randm.best_params_)

The output of this snippet is given below. Because the train/test split and the parameter sampling are random, the exact values will differ from run to run.

Results from Random Search 

 The best estimator across ALL searched params:
 GradientBoostingClassifier(criterion="friedman_mse", init=None,
              learning_rate=0.5475829777592278, loss="deviance",
              max_depth=4, max_features=None, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=2,
              min_weight_fraction_leaf=0.0, n_estimators=201,
              n_iter_no_change=None, presort="auto", random_state=None,
              subsample=0.9940317823281449, tol=0.0001,
              validation_fraction=0.1, verbose=0, warm_start=False)

 The best score across ALL searched params:
 0.9447236180904522

 The best parameters across ALL searched params:
 {"learning_rate": 0.5475829777592278, "max_depth": 4, "n_estimators": 201, "subsample": 0.9940317823281449}
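
The best score reported above is a cross-validated score computed on the training data only. A natural follow-up, sketched below, is to check the refitted best model on the held-out test set. The smaller search space, cv=3 and the fixed random_state values are assumptions made to keep this example short and repeatable; they are not the settings used in the recipe.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

randm = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"learning_rate": uniform(0.01, 0.3), "n_estimators": randint(20, 80)},
    n_iter=5, cv=3, random_state=0,
)
randm.fit(X_train, y_train)

# best_score_ is the mean cross-validated score on the training data.
# score() evaluates the refitted best estimator on data the search never saw.
test_accuracy = randm.score(X_test, y_test)
print("CV score:  ", randm.best_score_)
print("Test score:", test_accuracy)
```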
