How to evaluate XGBoost model with learning curves example 1 in python

This recipe helps you evaluate XGBoost model with learning curves example 1 in python

Recipe Objective

 

While training a dataset sometimes we need to know how model is training with each row of data passed through it. Sometimes while training a very large dataset it takes a lots of time and for that we want to know that after passing speicific percentage of dataset what is the score of the model. So this can be done by learning curve. So here we are evaluating XGBoost with learning curves.

So this recipe is a short example of how we can evaluate XGBoost model with learning curves.

Step 1 - Import the library

import numpy as np from xgboost import XGBClassifier import matplotlib.pyplot as plt plt.style.use('ggplot') from sklearn import datasets import matplotlib.pyplot as plt from sklearn.model_selection import learning_curve

Here we have imported various modules like datasets, XGBClassifier and learning_curve from differnt libraries. We will understand the use of these later while using it in the in the code snippet.
For now just have a look on these imports.

Step 2 - Setup the Data

Here we have used datasets to load the inbuilt wine dataset and we have created objects X and y to store the data and the target value respectively. dataset = datasets.load_wine() X = dataset.data; y = dataset.target

Step 3 - Learning Curve and Scores

Here, we are using Learning curve to get train_sizes, train_score and test_score. Before using Learning Curve let us have a look on its parameters.

  • estimator: In this we have to pass the models or functions on which we want to use Learning
  • train_sizes: Relative or absolute numbers of training examples that will be used to generate the learning curve.
  • Scoring: It is used as a evaluating metric for the model performance to decide the best hyperparameters, if not especified then it uses estimator score.
  • cv : In this we have to pass a interger value, as it signifies the number of splits that is needed for cross validation. By default is set as five.
  • n_jobs : This signifies the number of jobs to be run in parallel, -1 signifies to use all processor.

train_sizes, train_scores, test_scores = learning_curve(XGBClassifier(), X, y, cv=10, scoring='accuracy', n_jobs=-1, train_sizes=np.linspace(0.01, 1.0, 50)) Now we have calculated the mean and standard deviation of the train and test scores. train_mean = np.mean(train_scores, axis=1) train_std = np.std(train_scores, axis=1) test_mean = np.mean(test_scores, axis=1) test_std = np.std(test_scores, axis=1)

Step 4 - Ploting the Learning Curve

Finally, its time to plot the learning curve. We have used matplotlib to plot lines and band of the learning curve. plt.subplots(1, figsize=(7,7)) plt.plot(train_sizes, train_mean, '--', color="#111111", label="Training score") plt.plot(train_sizes, test_mean, color="#111111", label="Cross-validation score") plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, color="#DDDDDD") plt.fill_between(train_sizes, test_mean - test_std, test_mean + test_std, color="#DDDDDD") plt.title("Learning Curve") plt.xlabel("Training Set Size"), plt.ylabel("Accuracy Score"), plt.legend(loc="best") plt.tight_layout(); plt.show() The output can be seen below in the code execution

Download Materials

What Users are saying..

profile image

Jingwei Li

Graduate Research assistance at Stony Brook University
linkedin profile url

ProjectPro is an awesome platform that helps me learn much hands-on industrial experience with a step-by-step walkthrough of projects. There are two primary paths to learn: Data Science and Big Data.... Read More

Relevant Projects

Census Income Data Set Project-Predict Adult Census Income
Use the Adult Income dataset to predict whether income exceeds 50K yr based oncensus data.

Time Series Forecasting Project-Building ARIMA Model in Python
Build a time series ARIMA model in Python to forecast the use of arrival rate density to support staffing decisions at call centres.

OpenCV Project for Beginners to Learn Computer Vision Basics
In this OpenCV project, you will learn computer vision basics and the fundamentals of OpenCV library using Python.

Mastering A/B Testing: A Practical Guide for Production
In this A/B Testing for Machine Learning Project, you will gain hands-on experience in conducting A/B tests, analyzing statistical significance, and understanding the challenges of building a solution for A/B testing in a production environment.

Deploy Transformer BART Model for Text summarization on GCP
Learn to Deploy a Machine Learning Model for the Abstractive Text Summarization on Google Cloud Platform (GCP)

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Learn to Build a Siamese Neural Network for Image Similarity
In this Deep Learning Project, you will learn how to build a siamese neural network with Keras and Tensorflow for Image Similarity.

Stock Price Prediction Project using LSTM and RNN
Learn how to predict stock prices using RNN and LSTM models. Understand deep learning concepts and apply them to real-world financial data for accurate forecasting.

Hands-On Approach to Regression Discontinuity Design Python
In this machine learning project, you will learn to implement Regression Discontinuity Design Example in Python to determine the effect of age on Mortality Rate in Python.

Build Regression (Linear,Ridge,Lasso) Models in NumPy Python
In this machine learning regression project, you will learn to build NumPy Regression Models (Linear Regression, Ridge Regression, Lasso Regression) from Scratch.