How to find optimal parameters using RandomizedSearchCV?

This recipe helps you find optimal parameters using RandomizedSearchCV

Recipe Objective

While training a model, we need to set several hyperparameters that affect its predictions. But how do we find which set of hyperparameters gives the best result? One way is RandomizedSearchCV. It samples a fixed number of hyperparameter combinations at random from the distributions we specify, scores each combination with cross-validation, and returns the combination that gives the best score.

So this is the recipe on how we can find optimal parameters using RandomizedSearchCV.
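To make the idea concrete before the recipe itself, here is a minimal hand-rolled sketch of a random search: draw a few random hyperparameter combinations, score each with cross-validation, and keep the best one. It only illustrates the concept and is not how RandomizedSearchCV is implemented internally. The seed, the number of candidates and the parameter ranges are assumptions chosen so that the sketch runs quickly.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)  # fixed seed (assumption) for repeatable draws

best_score, best_params = -np.inf, None
for _ in range(5):  # 5 random candidates; kept small on purpose
    # Draw one random combination; the ranges are illustrative assumptions.
    params = {
        "learning_rate": float(rng.uniform(0.01, 0.3)),
        "max_depth": int(rng.integers(2, 5)),
        "n_estimators": int(rng.integers(20, 60)),
    }
    model = GradientBoostingClassifier(random_state=0, **params)
    # Mean accuracy over 3 cross-validation folds for this combination.
    score = cross_val_score(model, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params

print("Best parameters found:", best_params)
print("Best cross-validated score:", best_score)
```

RandomizedSearchCV wraps this loop, adds parallel execution, and refits the best model on the full training data.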

Step 1 - Import the library

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from scipy.stats import uniform as sp_randFloat
from scipy.stats import randint as sp_randInt

We have imported various modules from different libraries, such as datasets, train_test_split, RandomizedSearchCV, GradientBoostingClassifier, sp_randFloat and sp_randInt. Here sp_randFloat and sp_randInt are aliases for scipy.stats.uniform and scipy.stats.randint.
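As a quick check of what the two scipy distributions produce, the sketch below draws a few values from each. With default arguments, uniform() samples floats between 0 and 1, and randint(low, high) samples integers from low up to, but not including, high. The sample size and the random_state value are arbitrary assumptions.

```python
from scipy.stats import uniform as sp_randFloat
from scipy.stats import randint as sp_randInt

# Draw five floats from the default uniform distribution on [0, 1].
float_draws = sp_randFloat().rvs(size=5, random_state=0)

# Draw five integers from 100 (inclusive) to 1000 (exclusive).
int_draws = sp_randInt(100, 1000).rvs(size=5, random_state=0)

print("Float draws:", float_draws)
print("Integer draws:", int_draws)
```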

Step 2 - Setting up the Data

We are using the inbuilt breast cancer dataset to train the model, and we use train_test_split to split the data into two parts, train and test.

dataset = datasets.load_breast_cancer()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
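The split above is different on every run because no seed is given. If repeatable results are wanted, one possible variation is to pass random_state and stratify, as sketched below. The seed value 42 and the use of stratify=y are assumptions, not part of the original recipe.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset = datasets.load_breast_cancer()
X, y = dataset.data, dataset.target

# random_state fixes the shuffle; stratify=y keeps the class ratio
# roughly equal in the train and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

print("Full data:", X.shape)
print("Train:", X_train.shape, "Test:", X_test.shape)
print("Positive class share (train, test):", y_train.mean(), y_test.mean())
```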

Step 3 - Model and its parameters

Here we are using GradientBoostingClassifier as the model and defining a distribution for each hyperparameter (learning_rate, subsample, n_estimators and max_depth) from which RandomizedSearchCV will sample to find the best set of parameters.

model = GradientBoostingClassifier()
parameters = {"learning_rate": sp_randFloat(),
              "subsample": sp_randFloat(),
              "n_estimators": sp_randInt(100, 1000),
              "max_depth": sp_randInt(4, 10)
             }
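To preview the kind of candidates that will be tried, the same dictionary can be passed to scikit-learn's ParameterSampler, which draws parameter settings from the given distributions. This is only a sketch for inspection; the n_iter=3 and random_state=0 values are assumptions.

```python
from scipy.stats import randint as sp_randInt
from scipy.stats import uniform as sp_randFloat
from sklearn.model_selection import ParameterSampler

parameters = {"learning_rate": sp_randFloat(),
              "subsample": sp_randFloat(),
              "n_estimators": sp_randInt(100, 1000),
              "max_depth": sp_randInt(4, 10)
             }

# Draw three candidate settings, one value per parameter in each.
candidates = list(ParameterSampler(parameters, n_iter=3, random_state=0))
for candidate in candidates:
    print(candidate)
```

Since subsample must be greater than 0 and at most 1, and very small learning rates or subsample values usually train poorly, it may be worth narrowing the ranges, for example with sp_randFloat(loc=0.5, scale=0.5) for subsample. That particular range is a suggestion, not part of the recipe.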

Step 4 - Using RandomizedSearchCV and Printing the results

Before using RandomizedSearchCV, let us first look at its parameters:

  • estimator : The model whose hyperparameters we want to optimize.
  • param_distributions : A dictionary that maps each parameter name to a distribution (or a list of values) to sample from.
  • cv : An integer giving the number of folds used for cross-validation. By default it is set to 5.
  • n_iter : The number of parameter settings that are sampled. By default it is set to 10.
  • n_jobs : The number of jobs to run in parallel; -1 means use all processors.
So we have defined a RandomizedSearchCV object with the important parameters. Then we fit it on the training data, and finally we print the optimized values of the hyperparameters.

randm = RandomizedSearchCV(estimator=model, param_distributions=parameters,
                           cv=2, n_iter=10, n_jobs=-1)
randm.fit(X_train, y_train)

print(" Results from Random Search ")
print(" The best estimator across ALL searched params: ", randm.best_estimator_)
print(" The best score across ALL searched params: ", randm.best_score_)
print(" The best parameters across ALL searched params: ", randm.best_params_)

Output of this snippet is given below:

Results from Random Search 

 The best estimator across ALL searched params:
 GradientBoostingClassifier(criterion="friedman_mse", init=None,
              learning_rate=0.5475829777592278, loss="deviance",
              max_depth=4, max_features=None, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=2,
              min_weight_fraction_leaf=0.0, n_estimators=201,
              n_iter_no_change=None, presort="auto", random_state=None,
              subsample=0.9940317823281449, tol=0.0001,
              validation_fraction=0.1, verbose=0, warm_start=False)

 The best score across ALL searched params:
 0.9447236180904522

 The best parameters across ALL searched params:
 {"learning_rate": 0.5475829777592278, "max_depth": 4, "n_estimators": 201, "subsample": 0.9940317823281449}
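The best score reported above is a cross-validation score on the training data. A natural follow-up is to look at every sampled candidate through the cv_results_ attribute and to score the refitted best model on the held-out test set. The sketch below is self-contained and uses deliberately small parameter ranges, a fixed seed and n_iter=4 so that it runs quickly; all of these values are assumptions and differ from the recipe's settings.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Small ranges (assumption) so the sketch finishes in a few seconds.
param_distributions = {
    "learning_rate": uniform(0.01, 0.3),  # floats in [0.01, 0.31]
    "n_estimators": randint(20, 60),      # integers in [20, 60)
    "max_depth": randint(2, 5),           # integers in [2, 5)
}

search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=4,
    cv=2,
    random_state=0,
)
search.fit(X_train, y_train)

# One entry per sampled candidate: its parameters and mean CV score.
for params, mean_score in zip(search.cv_results_["params"],
                              search.cv_results_["mean_test_score"]):
    print(round(mean_score, 4), params)

# By default the best candidate is refitted on all training data,
# so the search object can be scored directly on the test set.
test_accuracy = search.score(X_test, y_test)
print("Test accuracy of the best model:", test_accuracy)
```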
