# How to plot Validation Curve in Python?

This recipe helps you plot a validation curve in Python.

While working on a dataset, we train a model and check its accuracy. If we measure the accuracy on the same data that was used for training, it comes out very high because the model has already seen that data. For a realistic test, we have to check the accuracy on unseen data, and doing so for different values of a model parameter gives a better view of how the model behaves.
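The gap between accuracy on seen and unseen data can be checked directly. The following is a minimal sketch, assuming the Iris dataset and a `RandomForestClassifier` with 50 trees and a fixed `random_state`; these settings are illustrative choices, not part of the recipe itself.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = datasets.load_iris(return_X_y=True)

# Illustrative settings (assumption): 50 trees, fixed seed for repeatability.
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Accuracy on the same rows the model was fitted on (optimistic estimate).
train_accuracy = model.fit(X, y).score(X, y)

# Mean accuracy over 4 held-out folds (closer to performance on unseen data).
cv_accuracy = cross_val_score(model, X, y, cv=4, scoring="accuracy").mean()

print(f"training accuracy:        {train_accuracy:.3f}")
print(f"cross-validated accuracy: {cv_accuracy:.3f}")
```

The training accuracy is typically the higher of the two, which is why the validation curve reports both.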

This data science Python source code does the following:

1. Imports the Iris dataset and the necessary libraries

2. Imports the validation_curve function

3. Scores the model on training and held-out folds using cross-validation

4. Plots the scores using matplotlib to analyze how the model validates

So this is the recipe on how to use validation_curve and plot the resulting validation curve.

```
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve
```

We have imported all the modules that we need: matplotlib, numpy, datasets, RandomForestClassifier and validation_curve. We will see the use of each module step by step.

We load the built-in Iris dataset from the datasets module and store the features in X and the target in y.
```
# Load the Iris dataset; the variable is named after the data it holds.
iris = datasets.load_iris()
X, y = iris.data, iris.target
```

Here we are using RandomForestClassifier, so we first have to define the range of parameter values over which the validation curve will be computed. We store that range in an object called param_range.

Before using validation_curve, let us first look at its parameters:

- estimator : The model whose parameter we want to tune.
- param_name : The name of the parameter to vary while computing the validation curve.
- param_range : The values of that parameter to evaluate.
- cv : An integer giving the number of folds to use for cross-validation. By default it is set to five.
- scoring : The metric used to calculate the score.
- n_jobs : The number of jobs to run in parallel; -1 means use all processors.

```
# Odd numbers of trees from 1 to 249
param_range = np.arange(1, 250, 2)
train_scores, test_scores = validation_curve(
    RandomForestClassifier(),
    X, y, param_name="n_estimators", param_range=param_range,
    cv=4, scoring="accuracy", n_jobs=-1)
```
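It can help to look at the shape of the arrays that validation_curve returns: one row per parameter value and one column per cross-validation fold. The sketch below assumes a deliberately short range of three values so that it runs quickly; the names `small_range`, `demo_train` and `demo_test` are hypothetical and used only for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

X, y = datasets.load_iris(return_X_y=True)

# Short illustrative range (assumption), not the range used in the recipe.
small_range = np.array([1, 5, 10])

demo_train, demo_test = validation_curve(
    RandomForestClassifier(random_state=0),
    X, y, param_name="n_estimators", param_range=small_range,
    cv=4, scoring="accuracy")

# Rows follow small_range, columns follow the 4 folds.
print(demo_train.shape)  # (3, 4)
print(demo_test.shape)   # (3, 4)
```

Because the folds run along the second axis, summarising each parameter value means reducing over `axis=1`.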

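Now we calculate the mean and standard deviation of the training and cross-validation scores. Each row of the score arrays holds the scores of one parameter value across the folds, so we reduce along axis=1.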
```
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)
```

First we plot the mean accuracy scores for both the training and the cross-validation sets. Then we draw the accuracy bands (mean plus and minus one standard deviation) for both. The last few lines set the remaining options for the plot, such as the title, axis labels and legend.
```
plt.subplots(1, figsize=(7,7))
plt.plot(param_range, train_mean, label="Training score", color="black")
plt.plot(param_range, test_mean, label="Cross-validation score", color="dimgrey")
plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color="gray")
plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color="gainsboro")
plt.title("Validation Curve With Random Forest")
plt.xlabel("Number Of Trees")
plt.ylabel("Accuracy Score")
plt.tight_layout()
plt.legend(loc="best")
plt.show()
```
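Besides reading the plot, the parameter value with the highest mean cross-validation score can be picked out numerically. This is a minimal sketch under assumed settings: the three candidate values and the names `candidate_trees`, `mean_cv` and `best_n_trees` are hypothetical, chosen only to keep the example fast.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

X, y = datasets.load_iris(return_X_y=True)

# Illustrative candidates (assumption), far fewer than the recipe's range.
candidate_trees = np.array([1, 10, 50])

_, cv_scores = validation_curve(
    RandomForestClassifier(random_state=0),
    X, y, param_name="n_estimators", param_range=candidate_trees,
    cv=4, scoring="accuracy")

# Average over folds, then take the first candidate with the highest mean.
mean_cv = cv_scores.mean(axis=1)
best_index = int(np.argmax(mean_cv))
best_n_trees = int(candidate_trees[best_index])

print(f"best n_estimators: {best_n_trees} "
      f"(mean CV accuracy {mean_cv[best_index]:.3f})")
```

The highest mean is only a starting point: when several values score within one standard deviation of each other, the smaller and cheaper model may be the more sensible choice.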
