How to select a model using Grid Search in Python?

This recipe helps you select a model using Grid Search in Python.

Recipe Objective

Often, while working on a dataset, we don't know which Machine Learning model, or which settings of its parameters, will give the best result. Fitting every candidate manually and checking its result is tedious and may not be feasible.

To get the best model we can use Grid Search. Grid Search fits each candidate that we specify, one by one, and checks its result with cross-validation. Finally, it returns the model that gives the best result.

So this recipe is a short example of how we can select a model using Grid Search in Python.
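Before turning to GridSearchCV itself, the idea can be sketched by hand. The snippet below is only an illustration under assumed settings: the two candidates, their parameters and the dictionary name candidates are hypothetical and are not part of the recipe. It scores each candidate with 5-fold cross-validation and keeps the one with the highest mean accuracy, which is roughly what Grid Search automates.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load the iris data as feature matrix X and target vector y
X, y = datasets.load_iris(return_X_y=True)

# Hypothetical shortlist of candidate models (illustrative settings only)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Score every candidate with 5-fold cross-validation and store the mean accuracy
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

# Keep the candidate with the highest mean cross-validated accuracy
best_name = max(scores, key=scores.get)
print(scores)
print("Best candidate:", best_name)
```

Doing this by hand becomes unwieldy as soon as each model also has several parameter values to try, which is the case GridSearchCV is meant for.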

Step 1 - Import the library - GridSearchCV

import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
np.random.seed(0)

Here we have imported various modules such as datasets, LogisticRegression, RandomForestClassifier and GridSearchCV from different libraries. We will understand the use of each of these later, when we use them in the code snippets.
For now, just have a look at these imports.

Step 2 - Setup the Data

Here we have used datasets to load the inbuilt iris dataset and we have created objects X and y to store the data and the target value respectively.

iris = datasets.load_iris()
X = iris.data
y = iris.target

Step 3 - Model and its Parameter

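Here, we are using a pipeline and defining the search space from which grid search will select the model that gives the best result. The solver for LogisticRegression is set explicitly because the default solver in recent scikit-learn releases (lbfgs) does not support the l1 penalty.

pipe = Pipeline([("classifier", RandomForestClassifier())])
search_space = [
    # saga supports both the l1 and l2 penalties; max_iter is raised to help it converge
    {"classifier": [LogisticRegression(solver="saga", max_iter=5000)],
     "classifier__penalty": ["l1", "l2"],
     "classifier__C": np.logspace(0, 4, 10)},
    {"classifier": [RandomForestClassifier()],
     "classifier__n_estimators": [10, 100, 1000],
     "classifier__max_features": [1, 2, 3]},
]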
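To get a feel for how much work this search space implies, the candidates can be counted before anything is fitted. The sketch below rebuilds the same search space and uses scikit-learn's ParameterGrid helper, which the recipe itself does not use, to enumerate the combinations; the variable name n_candidates is illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid

# Same search space as in the recipe; nothing is fitted here
search_space = [
    {"classifier": [LogisticRegression()],
     "classifier__penalty": ["l1", "l2"],
     "classifier__C": np.logspace(0, 4, 10)},
    {"classifier": [RandomForestClassifier()],
     "classifier__n_estimators": [10, 100, 1000],
     "classifier__max_features": [1, 2, 3]},
]

# Each dictionary expands to the product of its value lists:
# 1 * 2 * 10 = 20 logistic regression settings, 1 * 3 * 3 = 9 random forest settings
n_candidates = len(ParameterGrid(search_space))
print("Candidates:", n_candidates)               # 29
print("Fits with 5-fold CV:", n_candidates * 5)  # 145, plus one final refit
```

Since every candidate is fitted once per fold, the cost of the search grows quickly with the number of values listed for each parameter.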

Step 4 - Using GridSearchCV and Printing Results

Before using GridSearchCV, let's have a look at the important parameters.

  • estimator: Here we pass the model (or pipeline) on which we want to run GridSearchCV.
  • param_grid: A dictionary, or list of dictionaries, of parameter names and the values to try, from which GridSearchCV selects the best combination.
  • scoring: The evaluation metric used to judge model performance and decide the best model; if not specified, the estimator's own score method is used.
  • cv: Here we pass an integer value, which sets the number of folds used for cross-validation. By default it is set to five.
  • n_jobs: The number of jobs to run in parallel; -1 means use all processors.

Making an object clf for GridSearchCV and fitting it to the dataset, i.e. X and y:

clf = GridSearchCV(pipe, search_space, cv=5, verbose=0, n_jobs=-1)
best_model = clf.fit(X, y)

Now we use a print statement to print the result. It will give the best model.

print(best_model.best_estimator_.get_params()["classifier"])

As an output we get (this output comes from an older scikit-learn release; newer releases print only the non-default parameters):

LogisticRegression(C=7.742636826811269, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class="warn", n_jobs=None, penalty="l1", random_state=None,
          solver="warn", tol=0.0001, verbose=0, warm_start=False)
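Besides the winning classifier, the fitted search object also exposes the best cross-validated score, the best parameter combination and a per-candidate results table. The sketch below shows one way to read them. To keep it quick it uses a deliberately small, hypothetical search space (small_space) rather than the recipe's full one, so the model it selects is not meant to match the output above.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = datasets.load_iris(return_X_y=True)

pipe = Pipeline([("classifier", RandomForestClassifier())])

# Hypothetical, reduced search space: 2 + 2 = 4 candidates
small_space = [
    {"classifier": [LogisticRegression(max_iter=1000)],
     "classifier__C": [1.0, 10.0]},
    {"classifier": [RandomForestClassifier(random_state=0)],
     "classifier__n_estimators": [10, 50]},
]

search = GridSearchCV(pipe, small_space, cv=5, scoring="accuracy", n_jobs=1)
search.fit(X, y)

# Mean cross-validated accuracy and parameters of the winning candidate
print("Best mean CV accuracy:", search.best_score_)
print("Best parameters:", search.best_params_)

# cv_results_ holds one entry per candidate; list them from best to worst rank
results = search.cv_results_
for i in np.argsort(results["rank_test_score"]):
    model_name = type(results["params"][i]["classifier"]).__name__
    print(results["rank_test_score"][i],
          round(results["mean_test_score"][i], 4),
          model_name)
```

Looking at the full ranking rather than only the winner helps show whether the best candidate is clearly ahead or only marginally better than the alternatives.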
