How to plot a learning Curve in Python?
DATA VISUALIZATION DATA CLEANING PYTHON DATA MUNGING MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to plot a learning Curve in Python?

How to plot a learning Curve in Python?

This recipe helps you plot a learning Curve in Python

Recipe Objective

While training a dataset sometimes we need to know how model is training with each row of data passed through it. Sometimes while training a very large dataset it takes a lots of time and for that we want to know that after passing speicific percentage of dataset what is the score of the model. So this can be done by learning curve.

This data science python source code does the following:
1. Imports Digit dataset and necessary libraries
2. Imports Learning curve function for visualization
3. Splits dataset into train and test
4. Plots graphs using matplotlib to analyze the learning curve

So this recipe is a short example of how we can plot a learning Curve in Python.

Step 1 - Import the library

import numpy as np import matplotlib.pyplot as plt from sklearn.ensemble import RandomForestClassifier from sklearn import datasets from sklearn.model_selection import learning_curve

Here we have imported various modules like datasets, RandomForestClassifier and learning_curve from differnt libraries. We will understand the use of these later while using it in the in the code snippet.
For now just have a look on these imports.

Step 2 - Setup the Data

Here we have used datasets to load the inbuilt breast cancer dataset and we have created objects X and y to store the data and the target value respectively. cancer = datasets.load_breast_cancer() X, y = cancer.data, cancer.target

Step 3 - Learning Curve and Scores

Here, we are using Learning curve to get train_sizes, train_score and test_score. Before using Learning Curve let us have a look on its parameters.

  • estimator: In this we have to pass the models or functions on which we want to use GridSearchCV
  • train_sizes: Relative or absolute numbers of training examples that will be used to generate the learning curve.
  • Scoring: It is used as a evaluating metric for the model performance to decide the best hyperparameters, if not especified then it uses estimator score.
  • cv : In this we have to pass a interger value, as it signifies the number of splits that is needed for cross validation. By default is set as five.
  • n_jobs : This signifies the number of jobs to be run in parallel, -1 signifies to use all processor.
train_sizes, train_scores, test_scores = learning_curve(RandomForestClassifier(), X, y, cv=10, scoring='accuracy', n_jobs=-1, train_sizes=np.linspace(0.01, 1.0, 50)) Now we have calculated the mean and standard deviation of the train and test scores. train_mean = np.mean(train_scores, axis=1) train_std = np.std(train_scores, axis=1) test_mean = np.mean(test_scores, axis=1) test_std = np.std(test_scores, axis=1)

Step 4 - Ploting the Learning Curve

Finally, its time to plot the learning curve. We have used matplotlib to plot lines and band of the learning curve. plt.subplots(1, figsize=(10,10)) plt.plot(train_sizes, train_mean, '--', color="#111111", label="Training score") plt.plot(train_sizes, test_mean, color="#111111", label="Cross-validation score") plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, color="#DDDDDD") plt.fill_between(train_sizes, test_mean - test_std, test_mean + test_std, color="#DDDDDD") plt.title("Learning Curve") plt.xlabel("Training Set Size"), plt.ylabel("Accuracy Score"), plt.legend(loc="best") plt.tight_layout() plt.show() As an output we get:

							
														

Download Materials

Relevant Projects

House Price Prediction Project using Machine Learning
Use the Zillow dataset to follow a test-driven approach and build a regression machine learning model to predict the price of the house based on other variables.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

Build a Collaborative Filtering Recommender System in Python
Use the Amazon Reviews/Ratings dataset of 2 Million records to build a recommender system using memory-based collaborative filtering in Python.

Churn Prediction in Telecom using Machine Learning in R
Estimating churners before they discontinue using a product or service is extremely important. In this ML project, you will develop a churn prediction model in telecom to predict customers who are most likely subject to churn.

Digit Recognition using CNN for MNIST Dataset in Python
In this deep learning project, you will build a convolutional neural network using MNIST dataset for handwritten digit recognition.

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.

Build a Similar Images Finder with Python, Keras, and Tensorflow
Build your own image similarity application using Python to search and find images of products that are similar to any given product. You will implement the K-Nearest Neighbor algorithm to find products with maximum similarity.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.