Generally, for any classification problem, we predict the class value that has the highest probability of being the true class label. Sometimes, however, we care about the predicted probabilities themselves: among all instances to which the model assigns a probability of around 0.8, roughly 80% should actually turn out positive. A calibration curve lets us check exactly this. Note that calibration_curve supports binary targets (0 and 1) only.
So this recipe is a short example of what calibration means. Let's get started.
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt
Let's pause and look at these imports. We have imported train_test_split, which helps in randomly splitting the dataset into two parts. From sklearn.datasets we have imported load_breast_cancer, a binary classification dataset, which is exactly what calibration_curve requires. Also, we have imported calibration_curve to assess how well our model's predicted probabilities are calibrated.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
Here, we have used the load_breast_cancer function to import our dataset as two arrays (X and y), and therefore kept return_X_y as True. Further, we have split the dataset into two parts, train and test, with test_size=0.25, i.e., a 3:1 train-to-test ratio.
Now our dataset is ready.
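As a quick sanity check, we can print the shapes of the splits to confirm the 3:1 ratio (the exact counts below assume the standard breast cancer dataset with 569 rows and 30 features):
# Confirm the 3:1 train-to-test split
print(X_train.shape, X_test.shape)
# Expected: about 75% of the 569 rows in train and 25% in test,
# i.e. (426, 30) and (143, 30)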
model = DecisionTreeClassifier(criterion='entropy', max_features=2)
We have simply built a classification model with DecisionTreeClassifier, setting criterion to 'entropy' (splits are chosen to maximize information gain) and max_features to 2 (at most 2 features are considered when looking for each split).
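To make the 'entropy' criterion concrete, here is a minimal sketch of the impurity measure the tree reduces at each split (this helper uses numpy, which is an extra assumption on top of the recipe's imports):
import numpy as np

def entropy(labels):
    # Shannon entropy of a label array: -sum(p * log2(p)) over classes
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(np.array([0, 1, 0, 1])))  # 1.0 bit: a 50/50 node is maximally impure
print(entropy(np.array([1, 1, 1, 1])))  # -0.0: a pure node has zero entropy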
model.fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]
Here we have simply used the fit function to fit our model on X_train and y_train. Then, rather than hard class predictions, we take predict_proba and keep the probability of the positive class (column 1), since calibration_curve expects probabilities rather than 0/1 labels.
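If you also want hard class labels, for example for a quick accuracy check, you can still call predict; this aside is optional and not needed for the calibration curve itself:
from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)
print('Test accuracy:', accuracy_score(y_test, y_pred))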
prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=10)
Now we are comparing our predicted probabilities to the actual outcomes. n_bins refers to the number of bins used to discretize the [0, 1] interval. For each bin, calibration_curve returns the fraction of positives (prob_true) and the mean predicted probability (prob_pred). Since predict_proba already returns values in [0, 1], no extra normalization is needed.
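To see what calibration_curve computes under the hood, here is a minimal numpy sketch assuming uniform bins (calibration_curve's default strategy):
import numpy as np

edges = np.linspace(0.0, 1.0, 10 + 1)       # 10 uniform bins over [0, 1]
bin_ids = np.digitize(y_prob, edges[1:-1])  # assign each probability to a bin
for b in np.unique(bin_ids):
    mask = bin_ids == b
    # mean predicted probability vs. observed fraction of positives per bin
    print(y_prob[mask].mean(), y_test[mask].mean())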
plt.plot([0, 1], [0, 1], linestyle='--', label='Ideally Calibrated')
plt.plot(prob_pred, prob_true, marker='.', label='Decision Tree Classifier')
plt.xlabel('Average Predicted Probability in each bin')
plt.ylabel('Ratio of positives')
plt.legend()
plt.show()
Here, first we have plotted the ideally calibrated curve, which is a straight diagonal line from (0, 0) to (1, 1). Then we plot the calibration curve of this particular model. The x-axis represents the average predicted probability in each bin; the y-axis is the fraction of positives, i.e., the proportion of actual positives among the instances in that bin.
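If you would like a single number to accompany the plot, the Brier score (the mean squared error between predicted probabilities and actual outcomes) is a common companion metric; this is an optional extra on top of the recipe, using sklearn's brier_score_loss:
from sklearn.metrics import brier_score_loss

print('Brier score:', brier_score_loss(y_test, y_prob))  # lower is better; 0 is perfect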
Once we run the above code snippet, we will see:
Scroll down to the IPython notebook below to visualize the results.
The closer the model's curve lies to the diagonal, the better calibrated its probabilities are. An unpruned decision tree tends to push probabilities toward the extremes (its leaves are often pure), so some deviation from the diagonal is expected; the curve tells us how far we can trust the predicted probabilities, not just the predicted labels, on unseen data.