How to parallelise execution of XGBoost and cross-validation in Python?

This recipe helps you parallelise the execution of XGBoost and cross-validation in Python.

Recipe Objective

Have you ever tried to parallelise a function and measure the computational (running) time of a model?

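This recipe is a short example of how to parallelise XGBoost training and cross-validation in Python. It compares the running time of three set-ups: a single-threaded XGBoost model with parallel cross-validation, a multi-threaded XGBoost model with sequential cross-validation, and parallelism at both levels.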

Step 1 - Import the libraries

import time
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from xgboost import XGBClassifier

Here we import several modules from different libraries: time, datasets, train_test_split, cross_val_score and XGBClassifier. We will see how each one is used later in the code snippets; for now, just have a look at these imports.

Step 2 - Setup the Data

Here we use datasets to load the built-in wine dataset and store the features and the target in X and y respectively. We also split the data into training and test sets, although the timing steps below run cross-validation on the full X and y, so X_train and X_test are not used further.
dataset = datasets.load_wine()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
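To see what this setup produces, the sketch below loads the same data and prints the array shapes. It uses load_wine(return_X_y=True), a shortcut that returns X and y directly, and adds random_state=0 to the split so the result is reproducible; the seed is an addition, not part of the original recipe.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# return_X_y=True returns the feature matrix and target vector directly.
X, y = datasets.load_wine(return_X_y=True)

# random_state fixes the shuffle so repeated runs give the same split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

print("Full data:", X.shape)                       # (178, 13)
print("Classes:", np.unique(y))                    # [0 1 2]
print("Train/test:", X_train.shape, X_test.shape)  # (133, 13) (45, 13)
```

With 178 samples and 25% held out, scikit-learn rounds the test set up to 45 rows.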

Step 3 - Single Thread XGBoost and Parallel Thread CV

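Here we use XGBClassifier as the model and evaluate it with 10-fold cross-validation via cross_val_score, scored by negative log loss. We pass nthread=1 so that each XGBoost model trains on a single thread, and n_jobs=-1 so that cross_val_score runs the folds in parallel on all available cores. (In recent XGBoost releases the scikit-learn wrapper calls this parameter n_jobs; nthread is the older, deprecated name.) We use the time library to measure the elapsed time.
start = time.time()
model = XGBClassifier(nthread=1)
results = cross_val_score(model, X, y, cv=10, scoring="neg_log_loss", n_jobs=-1)
elapsed = time.time() - start
print("Single Thread XGBoost, Parallel Thread CV: %f" % (elapsed))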

Step 4 - Parallel Thread XGBoost and Single Thread CV

Here we use the same model and cross-validation but swap the parallelism: nthread=-1 lets XGBoost use multiple threads for each model, while n_jobs=1 makes cross_val_score evaluate the folds one after another. Again, the time library measures the elapsed time.
start = time.time()
model = XGBClassifier(nthread=-1)
results = cross_val_score(model, X, y, cv=10, scoring="neg_log_loss", n_jobs=1)
elapsed = time.time() - start
print("Parallel Thread XGBoost, Single Thread CV: %f" % (elapsed))

Step 5 - Parallel Thread XGBoost and CV

Finally, we parallelise both levels: nthread=-1 for XGBoost and n_jobs=-1 for cross_val_score. The time library again measures the elapsed time.
start = time.time()
model = XGBClassifier(nthread=-1)
results = cross_val_score(model, X, y, cv=10, scoring="neg_log_loss", n_jobs=-1)
elapsed = time.time() - start
print("Parallel Thread XGBoost and CV: %f" % (elapsed))
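When both levels are parallel, the total number of threads requested is roughly the number of cross-validation workers times the threads each XGBoost model uses, which can exceed the number of cores (oversubscription). Recent joblib versions, which scikit-learn uses for n_jobs, may already cap threads inside worker processes, but splitting the budget explicitly makes the behaviour predictable. The function below is a hypothetical heuristic, not part of either library:

```python
import os


def split_thread_budget(n_cpus, n_folds):
    """Hypothetical heuristic: divide n_cpus between CV workers and XGBoost threads.

    At most one CV worker per fold (extra workers would sit idle), and each
    model gets the cores left over, with a floor of one thread.
    """
    cv_jobs = max(1, min(n_folds, n_cpus))
    xgb_threads = max(1, n_cpus // cv_jobs)
    return cv_jobs, xgb_threads


n_cpus = os.cpu_count() or 1
cv_jobs, xgb_threads = split_thread_budget(n_cpus, n_folds=10)
print(f"{n_cpus} CPUs -> {cv_jobs} CV workers x {xgb_threads} XGBoost threads each")
```

The two numbers would then be passed as n_jobs=cv_jobs to cross_val_score and n_jobs=xgb_threads to XGBClassifier. Whether this beats n_jobs=-1 everywhere is worth measuring on your own machine.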

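As output, we get the following elapsed times in seconds. Exact numbers depend on the number of CPU cores, the library versions and the machine's load, so your results will differ: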

Single Thread XGBoost, Parallel Thread CV: 3.380478
Parallel Thread XGBoost, Single Thread CV: 2.431405
Parallel Thread XGBoost and CV: 0.197474
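Single measurements like these are noisy, and the first parallel run may also pay a one-off cost for starting worker processes. A common remedy is to repeat each configuration several times and compare the best and median times. The helper best_of below is a hypothetical sketch using only the standard library; the workload passed to it here is a stand-in so the snippet runs on its own.

```python
import statistics
import time


def best_of(fn, repeats=5):
    """Run fn() `repeats` times and return (best, median) elapsed seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)


# Stand-in workload; in the recipe this would be a lambda calling cross_val_score.
best, median = best_of(lambda: sum(i * i for i in range(200_000)))
print(f"best {best:.4f} s, median {median:.4f} s")
```

For example, best_of(lambda: cross_val_score(model, X, y, cv=10, scoring="neg_log_loss", n_jobs=-1)) would time one of the configurations, with model, X and y defined as in the steps above.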
