How to plot Validation Curve in Python?

This recipe helps you plot a validation curve in Python.

Recipe Objective

When we train a model on a dataset and then check its accuracy on that same training data, the accuracy comes out very high because the model has already seen the data. For a realistic estimate we have to check the accuracy on unseen data. A validation curve does this for a range of values of one model parameter, so we can see how that parameter affects both the training score and the score on unseen data.
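To make the point about seen versus unseen data concrete, here is a minimal sketch (not part of the original recipe) that compares accuracy on the training data with accuracy on a held-out split. The DecisionTreeClassifier, the 70/30 split and random_state=0 are assumptions chosen for illustration.

```python
# Sketch: accuracy on training data vs. accuracy on held-out data.
# Assumptions (not from the recipe): DecisionTreeClassifier, 70/30 split, random_state=0.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# A fully grown tree can memorise the training data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on data the model has seen
test_acc = model.score(X_test, y_test)     # accuracy on data the model has not seen

print(f"training accuracy: {train_acc:.3f}")
print(f"held-out accuracy: {test_acc:.3f}")
```

The training accuracy typically comes out higher than the held-out accuracy; a validation curve tracks that gap across many values of a single parameter.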

This data science Python source code does the following:
1. Imports the Iris dataset and the necessary libraries
2. Imports the validation_curve function for visualization
3. Uses cross-validation to split the dataset into training and validation folds
4. Plots graphs using matplotlib to analyze the validation of the model

Below, we compute the validation curve step by step and then plot it.

Step 1 - Import the libraries

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

We have imported all the modules we need: matplotlib for plotting, numpy, datasets, RandomForestClassifier and validation_curve. We will see the use of each module step by step.

Step 2 - Setting up the Data

We load the built-in Iris dataset from the datasets module and store the features in X and the target in y.

iris = datasets.load_iris()
X, y = iris.data, iris.target
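Before computing the curve, it can help to check what was loaded. This is a minimal sketch that reuses the recipe's load_iris call; counting samples per class with collections.Counter is our own addition.

```python
# Sketch: inspect the Iris data loaded in Step 2.
from collections import Counter

from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)                  # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)       # names of the 4 features
print(list(iris.target_names))  # the 3 species labels
print(Counter(y.tolist()))      # number of samples in each class
```

The three classes are balanced, which is why plain accuracy is a reasonable scoring metric for this recipe.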

Step 3 - Using validation_curve and calculating the scores

Here we are using RandomForestClassifier, so first we define param_range: the values of the parameter (here n_estimators, the number of trees in the forest) at which the validation curve is computed.
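To see what param_range actually contains, and how much work the curve implies, here is a small sketch. The total-tree count assumes cv=4 as in this recipe's validation_curve call, since each value of n_estimators is fitted once per fold.

```python
# Sketch: what np.arange(1, 250, 2) produces and the resulting training cost.
import numpy as np

param_range = np.arange(1, 250, 2)  # start=1, stop=250 (excluded), step=2

print(param_range[:5])   # first values: 1, 3, 5, 7, 9
print(param_range[-1])   # last value: 249
print(len(param_range))  # 125 candidate values of n_estimators

# Each candidate is fitted once per fold, so with cv=4 the forests contain
# 4 * (1 + 3 + ... + 249) trees in total.
cv_folds = 4
total_trees = cv_folds * int(param_range.sum())
print(total_trees)  # 62500
```

If this is too slow, a coarser grid such as np.arange(1, 250, 20) gives a rougher curve with far fewer fits.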

Before using validation_curve, let us first look at its parameters:

  • estimator : the model whose parameter we want to study, here a RandomForestClassifier.
  • param_name : the name of the single parameter to vary, here "n_estimators".
  • param_range : the values of that parameter at which the validation curve is computed.
  • cv : the cross-validation strategy. An integer gives the number of folds (the default is 5 in recent scikit-learn versions); a cross-validation splitter object can also be passed.
  • scoring : the metric used to calculate the score, here "accuracy".
  • n_jobs : the number of jobs to run in parallel; -1 means use all processors.
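As the cv entry notes, cv also accepts a splitter object. The sketch below is an illustration under our own assumptions (StratifiedKFold with shuffling, the max_depth parameter instead of n_estimators, and a 20-tree forest so it runs quickly). It also shows the shape of the arrays validation_curve returns: one row per parameter value and one column per fold.

```python
# Sketch: validation_curve with an explicit CV splitter instead of an integer.
# Assumptions (not from the recipe): StratifiedKFold, max_depth, n_estimators=20.
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, validation_curve

X, y = datasets.load_iris(return_X_y=True)

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
depths = [1, 2, 3, 5, 8]  # candidate values of max_depth

train_scores, test_scores = validation_curve(
    RandomForestClassifier(n_estimators=20, random_state=0),
    X,
    y,
    param_name="max_depth",  # a single hyperparameter of the estimator
    param_range=depths,
    cv=cv,
    scoring="accuracy",
)

# One row per value in param_range, one column per CV split.
print(train_scores.shape)
print(test_scores.shape)
```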

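# Values of n_estimators to try: 1, 3, 5, ..., 249
param_range = np.arange(1, 250, 2)
# 4-fold cross-validated accuracy for every value in param_range
train_scores, test_scores = validation_curve(
    RandomForestClassifier(), X, y,
    param_name="n_estimators", param_range=param_range,
    cv=4, scoring="accuracy", n_jobs=-1)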

Now we calculate the mean and standard deviation of the training and validation scores across the cross-validation folds (axis=1), giving one value per parameter setting.

train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)
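To see what axis=1 does, here is a sketch on a small made-up score matrix laid out like validation_curve's output (rows are parameter values, columns are folds). Picking the value with the highest mean cross-validation score via np.argmax is our own addition, one common way to read the curve. Note that np.std uses ddof=0 by default.

```python
# Sketch: aggregating validation_curve-style scores with axis=1.
# The score matrix is made-up illustration data, not real results.
import numpy as np

param_range = np.array([1, 3, 5])
# Rows = parameter values, columns = CV folds.
test_scores = np.array([
    [0.75, 0.75, 0.50, 1.00],
    [1.00, 1.00, 0.75, 1.00],
    [1.00, 0.75, 0.75, 1.00],
])

test_mean = np.mean(test_scores, axis=1)  # average over folds, one value per row
test_std = np.std(test_scores, axis=1)    # spread over folds (ddof=0), one value per row

best = int(np.argmax(test_mean))  # first row with the highest mean score
print(test_mean)           # one mean per parameter value: 0.75, 0.9375, 0.875
print(param_range[best])   # 3
```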

Step 4 - Plotting the validation curve

First we plot the mean accuracy scores for both the training and the cross-validation sets. Then we draw a band of one standard deviation around each mean. The last few lines set the title, axis labels, layout and legend of the plot.

plt.subplots(1, figsize=(7, 7))
plt.plot(param_range, train_mean, label="Training score", color="black")
plt.plot(param_range, test_mean, label="Cross-validation score", color="dimgrey")
plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color="gray")
plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color="gainsboro")
plt.title("Validation Curve With Random Forest")
plt.xlabel("Number Of Trees")
plt.ylabel("Accuracy Score")
plt.tight_layout()
plt.legend(loc="best")
plt.show()
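If you plot validation curves often, the Step 4 code can be wrapped in a function. This sketch uses matplotlib's object-oriented API; the helper name plot_validation_curve, the alpha values and the randomly generated scores are assumptions for illustration only, and the Agg backend lets it run without a display.

```python
# Sketch: Step 4 as a reusable helper (hypothetical name: plot_validation_curve).
# The scores below are randomly generated illustration data, not real results.
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_validation_curve(param_range, train_scores, test_scores, xlabel):
    """Plot mean +/- one standard deviation of training and CV scores."""
    train_mean, train_std = train_scores.mean(axis=1), train_scores.std(axis=1)
    test_mean, test_std = test_scores.mean(axis=1), test_scores.std(axis=1)

    fig, ax = plt.subplots(figsize=(7, 7))
    ax.plot(param_range, train_mean, label="Training score", color="black")
    ax.plot(param_range, test_mean, label="Cross-validation score", color="dimgrey")
    # Semi-transparent bands keep the mean lines visible.
    ax.fill_between(param_range, train_mean - train_std, train_mean + train_std,
                    color="gray", alpha=0.3)
    ax.fill_between(param_range, test_mean - test_std, test_mean + test_std,
                    color="gainsboro", alpha=0.6)
    ax.set_title("Validation Curve With Random Forest")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Accuracy Score")
    ax.legend(loc="best")
    fig.tight_layout()
    return fig, ax


param_range = np.array([1, 3, 5, 7])
rng = np.random.default_rng(0)
train_scores = 1.0 - rng.uniform(0.0, 0.02, size=(4, 4))  # 4 values x 4 folds
test_scores = 0.9 + rng.uniform(-0.05, 0.05, size=(4, 4))

fig, ax = plot_validation_curve(param_range, train_scores, test_scores, "Number Of Trees")
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show(), or save the figure with fig.savefig(...).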
