# How to find optimal parameters using RandomizedSearchCV for Regression?

This recipe shows how to find optimal hyperparameters for a regression model using RandomizedSearchCV.

When training a model, we need to set hyperparameters that affect its predictions. But how do we find which set of hyperparameters gives the best result? This can be done with RandomizedSearchCV. RandomizedSearchCV samples hyperparameter settings at random from the distributions we provide, scores each candidate using cross-validation, and returns the set of hyperparameters that achieves the best score.

This Python source code does the following:

1. Imports the necessary libraries

2. Loads the dataset and performs train_test_split

3. Applies GradientBoostingRegressor and evaluates the result

4. Tunes the hyperparameters of the GradientBoostingRegressor model using RandomizedSearchCV

So this is the recipe for finding optimal parameters using RandomizedSearchCV for regression.

```
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import uniform as sp_randFloat
from scipy.stats import randint as sp_randInt
```

We have imported various modules from different libraries: datasets, train_test_split, RandomizedSearchCV, GradientBoostingRegressor, sp_randFloat and sp_randInt.

We are using the built-in diabetes dataset to train the model, and we use train_test_split to split the data into two parts, train and test.
```
dataset = datasets.load_diabetes()
X = dataset.data; y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
```

Here we use GradientBoostingRegressor as the model and define distributions for its hyperparameters (learning_rate, subsample, n_estimators and max_depth), from which RandomizedSearchCV will sample to find the best set of parameters.
```
model = GradientBoostingRegressor()
parameters = {'learning_rate': sp_randFloat(),
              'subsample': sp_randFloat(),
              'n_estimators': sp_randInt(100, 1000),
              'max_depth': sp_randInt(4, 10)}
```
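To see what these distributions actually produce when RandomizedSearchCV samples from them, you can draw from them directly. This is a quick illustrative sketch, not part of the recipe itself:

```python
from scipy.stats import uniform as sp_randFloat
from scipy.stats import randint as sp_randInt

# uniform() with no arguments is a frozen distribution over [0, 1),
# so learning_rate and subsample are sampled as floats in that range
lr_sample = sp_randFloat().rvs(random_state=0)

# randint(low, high) samples integers from [low, high),
# so n_estimators is sampled as an integer in [100, 1000)
n_est_sample = sp_randInt(100, 1000).rvs(random_state=0)

print(lr_sample)     # a float between 0 and 1
print(n_est_sample)  # an integer between 100 and 999
```

Each of the n_iter candidate settings in the search is built by drawing one value from every entry of the parameters dictionary in this way.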

Before using RandomizedSearchCV, first look at its parameters:

- estimator : the model whose hyperparameters we want to optimize.
- param_distributions : a dictionary mapping parameter names to the distributions (or lists of values) to sample from.
- cv : an integer giving the number of cross-validation splits. By default it is set to 5.
- n_iter : the number of parameter settings that are sampled. By default it is set to 10.
- n_jobs : the number of jobs to run in parallel; -1 means use all processors.

```
randm_src = RandomizedSearchCV(estimator=model, param_distributions=parameters,
                               cv=2, n_iter=10, n_jobs=-1)
randm_src.fit(X_train, y_train)
print(" Results from Random Search " )
print("\n The best estimator across ALL searched params:\n", randm_src.best_estimator_)
print("\n The best score across ALL searched params:\n", randm_src.best_score_)
print("\n The best parameters across ALL searched params:\n", randm_src.best_params_)
```

Output of this snippet is given below (exact values will vary between runs, since the search samples parameters at random):

```
 Results from Random Search

 The best estimator across ALL searched params:
 GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                           learning_rate=0.17889450760287762, loss='ls',
                           max_depth=7, max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=737,
                           n_iter_no_change=None, presort='auto',
                           random_state=None, subsample=0.40247913722860207,
                           tol=0.0001, validation_fraction=0.1, verbose=0,
                           warm_start=False)

 The best score across ALL searched params:
 0.23754616011566576

 The best parameters across ALL searched params:
 {'learning_rate': 0.17889450760287762, 'max_depth': 7, 'n_estimators': 737, 'subsample': 0.40247913722860207}
```
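After the search, the refitted best model is available via best_estimator_ and can be scored on the held-out test split we created earlier. The sketch below repeats the pipeline end to end so it is self-contained; the narrower distributions and fixed random_state are assumptions made only so the example runs quickly and reproducibly, not part of the recipe:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import randint as sp_randInt

dataset = datasets.load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.25, random_state=42)

# A deliberately small search space so the example finishes fast;
# the recipe's wider distributions work exactly the same way.
search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=42),
    param_distributions={'n_estimators': sp_randInt(50, 200),
                         'max_depth': sp_randInt(2, 5)},
    cv=2, n_iter=3, random_state=42)
search.fit(X_train, y_train)

# best_estimator_ is already refitted on the whole training split,
# so it can score the test data directly (R^2 for a regressor).
test_r2 = search.best_estimator_.score(X_test, y_test)
print("Test-set R^2:", test_r2)
```

Evaluating on data the search never saw gives a more honest estimate of performance than best_score_, which is computed on the cross-validation folds used to pick the parameters.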
