How to parallelise execution of XGBoost and cross validation in Python?

This recipe helps you parallelise execution of XGBoost and cross validation in Python.

Recipe Objective

Have you ever tried to parallelise a model and measure how its running time changes?

This recipe is a short example of how to parallelise the execution of XGBoost and cross validation in Python, and how to time each configuration.

Step 1 - Import the library

    import time
    from sklearn import datasets
    from sklearn.model_selection import train_test_split, cross_val_score
    from xgboost import XGBClassifier

Here we have imported time, datasets, train_test_split, cross_val_score and XGBClassifier from different libraries. Each of them is used in the code snippets that follow.

Step 2 - Setup the Data

Here we have used datasets to load the inbuilt wine dataset, and we have created objects X and y to store the data and the target values respectively. We also split the data into train and test sets, although the cross validation steps below work on the full X and y.

    dataset = datasets.load_wine()
    X = dataset.data
    y = dataset.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

Step 3 - Single Thread XGBoost and Parallel Thread CV

Here we use XGBClassifier as the machine learning model and score it with cross_val_score. We pass nthread=1 to the model, so XGBoost itself trains on a single thread, while n_jobs=-1 in cross_val_score lets the folds run in parallel on all available cores. (Recent XGBoost releases document this model parameter as n_jobs; nthread is the older spelling.) We use the time library to measure the elapsed time.

    start = time.time()
    model = XGBClassifier(nthread=1)
    results = cross_val_score(model, X, y, cv=10, scoring="neg_log_loss", n_jobs=-1)
    elapsed = time.time() - start
    print("Single Thread XGBoost, Parallel Thread CV: %f" % (elapsed))
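
The start/elapsed pattern above repeats in every step, so it can be wrapped in a small helper. The following is only a sketch: the names timed and timings are hypothetical and not part of the original recipe, and time.sleep stands in for the cross validation call so that the snippet runs without any extra libraries. It uses time.perf_counter, which is intended for measuring short durations.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent in the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the enclosed block raises an exception
        results[label] = time.perf_counter() - start


timings = {}  # hypothetical container for the measurements

# time.sleep is a placeholder for a call such as cross_val_score(...)
with timed("placeholder workload", timings):
    time.sleep(0.1)

for label, seconds in timings.items():
    print("%s: %f" % (label, seconds))
```

Collecting the measurements in one dictionary makes it easier to compare the configurations side by side after all of them have run.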

Step 4 - Parallel Thread XGBoost and Single Thread CV

Here we again use XGBClassifier with cross_val_score, but with the roles reversed. We pass nthread=-1 to the model, so XGBoost uses all available threads for training, while n_jobs=1 in cross_val_score evaluates the folds one after another. We use the time library to measure the elapsed time.

    start = time.time()
    model = XGBClassifier(nthread=-1)
    results = cross_val_score(model, X, y, cv=10, scoring="neg_log_loss", n_jobs=1)
    elapsed = time.time() - start
    print("Parallel Thread XGBoost, Single Thread CV: %f" % (elapsed))

Step 5 - Parallel Thread XGBoost and CV

Here both levels are parallel. We pass nthread=-1 to the model and n_jobs=-1 to cross_val_score, so XGBoost trains with all available threads and the folds are also evaluated in parallel. We use the time library to measure the elapsed time.

    start = time.time()
    model = XGBClassifier(nthread=-1)
    results = cross_val_score(model, X, y, cv=10, scoring="neg_log_loss", n_jobs=-1)
    elapsed = time.time() - start
    print("Parallel Thread XGBoost and CV: %f" % (elapsed))

As an output we get the following (the exact timings depend on the machine and on the library versions):

Single Thread XGBoost, Parallel Thread CV: 3.380478
Parallel Thread XGBoost, Single Thread CV: 2.431405
Parallel Thread XGBoost and CV: 0.197474
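
When reading these timings it helps to know how many threads each configuration asks for. The sketch below computes a nominal upper bound, assuming that negative values follow the joblib convention (-1 means all CPUs, -2 all but one) and that the same rule is a fair approximation for the XGBoost thread setting. The function names are hypothetical, and the number of threads actually running may be lower, because the libraries involved can cap nested parallelism.

```python
import os


def resolve_workers(n_jobs, n_cpus):
    """Translate an n_jobs-style value into a worker count (joblib convention)."""
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 means all CPUs, -2 means all but one, and so on
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


def nominal_threads(xgb_threads, cv_jobs, n_cpus):
    """Upper bound on requested threads: CV workers times threads per model."""
    return resolve_workers(cv_jobs, n_cpus) * resolve_workers(xgb_threads, n_cpus)


n_cpus = os.cpu_count() or 1
configs = [
    ("Single Thread XGBoost, Parallel Thread CV", 1, -1),
    ("Parallel Thread XGBoost, Single Thread CV", -1, 1),
    ("Parallel Thread XGBoost and CV", -1, -1),
]
for label, xgb_threads, cv_jobs in configs:
    total = nominal_threads(xgb_threads, cv_jobs, n_cpus)
    print("%s: up to %d threads on %d CPUs" % (label, total, n_cpus))
```

Under these assumptions the first two configurations request about one thread per CPU, while the third can request the square of the CPU count, which is one reason to time the combinations on your own hardware instead of assuming that more parallelism is always faster.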
