How to parallelise execution of XGBoost and cross validation in Python?

This recipe helps you parallelise execution of XGBoost and cross validation in Python.

Recipe Objective

Have you ever tried to parallelise a function and measure the running time of a model?

This recipe is a short example of how we can parallelise execution of XGBoost and cross validation in Python, and how the running time changes with each setting.

Step 1 - Import the library

import time
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from xgboost import XGBClassifier

Here we have imported time, datasets, train_test_split, cross_val_score and XGBClassifier from different libraries. We will see what each of them is for when it is used in the code snippets of the following steps.

Step 2 - Setup the Data

Here we have used datasets to load the built-in wine dataset and created objects X and y to store the data and the target values respectively. We have also split the data into train and test sets, holding out 25% for testing.

dataset = datasets.load_wine()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

Step 3 - Single Thread XGBoost and Parallel Thread CV

Here we use XGBClassifier as the machine learning model and cross_val_score to compute a 10-fold cross validation score. We pass nthread=1 to the model so that XGBoost trains on a single thread, and n_jobs=-1 to cross_val_score so that the folds are evaluated in parallel on all available cores. We use the time library to measure the elapsed time.

start = time.time()
model = XGBClassifier(nthread=1)
results = cross_val_score(model, X, y, cv=10, scoring="neg_log_loss", n_jobs=-1)
elapsed = time.time() - start
print("Single Thread XGBoost, Parallel Thread CV: %f" % (elapsed))
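The two settings in this step multiply: each cross validation worker trains its own model, and each model may use several threads. The following is a minimal sketch of that arithmetic, not part of the original recipe. The helper names resolve and max_requested_threads are hypothetical, the 8-core machine is an assumed example, and the result is only an upper bound under this simple model, since the libraries involved may cap the real number of threads.

```python
import os


def resolve(setting, cores):
    """Turn a thread/job setting into a concrete count; -1 is read as 'all cores'."""
    return cores if setting == -1 else setting


def max_requested_threads(xgb_threads, cv_jobs, cores=None):
    """Upper bound on requested threads: CV workers times threads per model."""
    cores = cores or os.cpu_count() or 1
    return resolve(cv_jobs, cores) * resolve(xgb_threads, cores)


# Assumed example: a machine with 8 cores.
print(max_requested_threads(1, -1, cores=8))   # single-thread model, parallel CV -> 8
print(max_requested_threads(-1, 1, cores=8))   # parallel model, single-thread CV -> 8
print(max_requested_threads(-1, -1, cores=8))  # both set to -1 -> 64
```

Under this model, setting both values to -1 can request far more threads than there are cores, which is one reason the measured timings are worth comparing rather than assuming.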

Step 4 - Parallel Thread XGBoost and Single Thread CV

Here we again use XGBClassifier with cross_val_score, but with the settings reversed. We pass nthread=-1 to the model so that XGBoost uses all available threads, and n_jobs=1 to cross_val_score so that the folds are evaluated one after another. We use the time library to measure the elapsed time.

start = time.time()
model = XGBClassifier(nthread=-1)
results = cross_val_score(model, X, y, cv=10, scoring="neg_log_loss", n_jobs=1)
elapsed = time.time() - start
print("Parallel Thread XGBoost, Single Thread CV: %f" % (elapsed))

Step 5 - Parallel Thread XGBoost and CV

Here we parallelise both parts. We pass nthread=-1 to the model so that XGBoost uses all available threads, and n_jobs=-1 to cross_val_score so that the folds are also evaluated in parallel. We use the time library to measure the elapsed time.

start = time.time()
model = XGBClassifier(nthread=-1)
results = cross_val_score(model, X, y, cv=10, scoring="neg_log_loss", n_jobs=-1)
elapsed = time.time() - start
print("Parallel Thread XGBoost and CV: %f" % (elapsed))
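Steps 3 to 5 differ only in two numbers, so they can be driven from a single table of settings. The sketch below is one possible way to do that and is not the recipe's own code. The names CONFIGS and time_configs are hypothetical, time.perf_counter is swapped in for time.time, and the model is configured through n_jobs (the thread setting documented for XGBoost's scikit-learn wrapper) instead of nthread. Check which name your installed XGBoost version accepts.

```python
import time

# One entry per step: (label, XGBoost threads, cross validation jobs).
CONFIGS = [
    ("Single Thread XGBoost, Parallel Thread CV", 1, -1),
    ("Parallel Thread XGBoost, Single Thread CV", -1, 1),
    ("Parallel Thread XGBoost and CV", -1, -1),
]


def time_configs(X, y, configs=CONFIGS, cv=10):
    """Time cross validation for each configuration and return {label: seconds}."""
    # Imported here so scikit-learn and XGBoost are only needed when this is called.
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBClassifier

    timings = {}
    for label, xgb_threads, cv_jobs in configs:
        start = time.perf_counter()
        # Assumption: n_jobs sets the number of threads used by the model.
        model = XGBClassifier(n_jobs=xgb_threads)
        cross_val_score(model, X, y, cv=cv, scoring="neg_log_loss", n_jobs=cv_jobs)
        timings[label] = time.perf_counter() - start
    return timings
```

With X and y from Step 2, calling time_configs(X, y) returns a dictionary of elapsed seconds keyed by the same labels the recipe prints.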

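As an output we get the following timings in seconds (the exact values will vary from machine to machine and from run to run):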

Single Thread XGBoost, Parallel Thread CV: 3.380478
Parallel Thread XGBoost, Single Thread CV: 2.431405
Parallel Thread XGBoost and CV: 0.197474
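Each figure above comes from a single run, so part of the difference may be noise or one-off start-up cost. A common way to reduce that effect is to repeat the measurement and keep the shortest time. The following is a minimal sketch of that idea using only the standard library. The helper name best_of and the stand-in workload are hypothetical, and in the recipe the workload would be the call to cross_val_score.

```python
import time


def best_of(fn, repeats=3):
    """Run fn several times and return the shortest wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return min(durations)


# Stand-in workload so the sketch runs without any third-party library.
elapsed = best_of(lambda: sum(range(100_000)))
print("Best of 3 runs: %f" % elapsed)
```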
