How to plot a learning Curve in Python?

How to plot a learning Curve in Python?

How to plot a learning Curve in Python?

This recipe helps you plot a learning Curve in Python


Recipe Objective

While training a dataset sometimes we need to know how model is training with each row of data passed through it. Sometimes while training a very large dataset it takes a lots of time and for that we want to know that after passing speicific percentage of dataset what is the score of the model. So this can be done by learning curve.

This data science python source code does the following:
1. Imports Digit dataset and necessary libraries
2. Imports Learning curve function for visualization
3. Splits dataset into train and test
4. Plots graphs using matplotlib to analyze the learning curve

So this recipe is a short example of how we can plot a learning Curve in Python.

Step 1 - Import the library

import numpy as np import matplotlib.pyplot as plt from sklearn.ensemble import RandomForestClassifier from sklearn import datasets from sklearn.model_selection import learning_curve

Here we have imported various modules like datasets, RandomForestClassifier and learning_curve from differnt libraries. We will understand the use of these later while using it in the in the code snippet.
For now just have a look on these imports.

Step 2 - Setup the Data

Here we have used datasets to load the inbuilt breast cancer dataset and we have created objects X and y to store the data and the target value respectively. cancer = datasets.load_breast_cancer() X, y =,

Step 3 - Learning Curve and Scores

Here, we are using Learning curve to get train_sizes, train_score and test_score. Before using Learning Curve let us have a look on its parameters.

  • estimator: In this we have to pass the models or functions on which we want to use GridSearchCV
  • train_sizes: Relative or absolute numbers of training examples that will be used to generate the learning curve.
  • Scoring: It is used as a evaluating metric for the model performance to decide the best hyperparameters, if not especified then it uses estimator score.
  • cv : In this we have to pass a interger value, as it signifies the number of splits that is needed for cross validation. By default is set as five.
  • n_jobs : This signifies the number of jobs to be run in parallel, -1 signifies to use all processor.
train_sizes, train_scores, test_scores = learning_curve(RandomForestClassifier(), X, y, cv=10, scoring='accuracy', n_jobs=-1, train_sizes=np.linspace(0.01, 1.0, 50)) Now we have calculated the mean and standard deviation of the train and test scores. train_mean = np.mean(train_scores, axis=1) train_std = np.std(train_scores, axis=1) test_mean = np.mean(test_scores, axis=1) test_std = np.std(test_scores, axis=1)

Step 4 - Ploting the Learning Curve

Finally, its time to plot the learning curve. We have used matplotlib to plot lines and band of the learning curve. plt.subplots(1, figsize=(10,10)) plt.plot(train_sizes, train_mean, '--', color="#111111", label="Training score") plt.plot(train_sizes, test_mean, color="#111111", label="Cross-validation score") plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, color="#DDDDDD") plt.fill_between(train_sizes, test_mean - test_std, test_mean + test_std, color="#DDDDDD") plt.title("Learning Curve") plt.xlabel("Training Set Size"), plt.ylabel("Accuracy Score"), plt.legend(loc="best") plt.tight_layout() As an output we get:


Relevant Projects

Topic modelling using Kmeans clustering to group customer reviews
In this Kmeans clustering machine learning project, you will perform topic modelling in order to group customer reviews based on recurring patterns.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Data Science Project - Instacart Market Basket Analysis
Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.