What does model calibration mean?

This recipe explains what model calibration means.

Recipe Objective

In most classification problems, we predict the class label that has the highest predicted probability. Sometimes, however, we want the predicted probabilities themselves, and we want them to be trustworthy: among instances predicted with a probability of 0.8, roughly 80% should actually belong to the positive class. A calibration curve lets us check how well a model's predicted probabilities match the observed outcomes. scikit-learn's calibration_curve supports binary targets only (labels 0 and 1).

So this recipe is a short example of what calibration means. Let's get started.
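To make the idea concrete, here is a toy illustration in plain Python. The data is hypothetical and not taken from the recipe's dataset: ten instances that all received a predicted probability of 0.8, of which eight are actually positive. It is only a sketch of what "predicted probabilities that match reality" looks like.

```python
# Hypothetical toy data: ten instances that each received a predicted
# probability of 0.8 for the positive class, with their true labels.
predicted = [0.8] * 10
actual = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

mean_predicted = sum(predicted) / len(predicted)
fraction_positive = sum(actual) / len(actual)

# For well-calibrated predictions these two numbers are close to each other.
gap = abs(mean_predicted - fraction_positive)
print(f"mean predicted: {mean_predicted:.2f}")
print(f"observed fraction of positives: {fraction_positive:.2f}")
print(f"gap: {gap:.2f}")
```

If only five of the ten instances were positive, the gap would be 0.3 and the predictions would be overconfident.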

Step 1 - Import the library

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

Let's pause and look at these imports. We have imported train_test_split, which randomly splits the dataset into two parts. load_breast_cancer from sklearn.datasets provides a binary classification dataset, and DecisionTreeClassifier is the model we will train. We have also imported calibration_curve, which computes the points of the calibration curve for our model, and matplotlib.pyplot to plot them.

Step 2 - Setup the Data

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

Here, we have used the load_breast_cancer function to load our dataset as two arrays (X and y), which is why return_X_y is set to True. This dataset has a binary target, as calibration_curve requires. We have then split the dataset into two parts, train and test, in a 3:1 ratio (75% train, 25% test).

Now our dataset is ready.

Step 3 - Building the model

model = DecisionTreeClassifier(criterion='entropy', max_features=2)

We have built a classification model using DecisionTreeClassifier, with criterion set to entropy and max_features set to 2.

Step 4 - Fit the model and predict for test set

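model.fit(X_train, y_train)
y_pred = model.predict_proba(X_test)[:, 1]

Here we have used the fit function to fit our model on X_train and y_train. We then use predict_proba to get, for each instance in X_test, the predicted probability of the positive class (column 1). A calibration curve needs these probabilities rather than the hard 0/1 labels returned by predict.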
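The difference between hard labels and predicted probabilities can be sketched as follows. This is a minimal example that assumes the breast cancer dataset and the decision tree settings used in this recipe; the random_state values and the names labels, probs and pos_probs are assumptions added here for reproducibility and illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# random_state is an assumption added so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(criterion="entropy", max_features=2, random_state=0)
model.fit(X_train, y_train)

labels = model.predict(X_test)        # hard class labels, 0 or 1
probs = model.predict_proba(X_test)   # one column per class, rows sum to 1
pos_probs = probs[:, 1]               # probability of the positive class (label 1)

print(labels[:5])
print(pos_probs[:5])
```

A fully grown decision tree tends to produce probabilities that are exactly 0 or 1, because its leaves are usually pure, so the two printed arrays may look very similar for this model.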

Step 5 - Calibrating our model

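x, y = calibration_curve(y_test, y_pred, n_bins=10)

Now we compare our predicted values with the actual values. n_bins is the number of bins used to discretize the [0, 1] interval. calibration_curve returns two arrays: x holds the fraction of positives in each bin and y holds the mean predicted value in each bin. The normalize argument seen in older code is omitted here, since it has been deprecated and removed in recent scikit-learn releases and the inputs already lie in [0, 1].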
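The binning behind n_bins can be sketched in plain Python. This is an illustration of the idea only, not scikit-learn's implementation: the function name reliability_bins and the toy data are hypothetical, and the handling of values that fall exactly on a bin edge may differ from calibration_curve.

```python
def reliability_bins(y_true, y_prob, n_bins=10):
    """Return (fraction_of_positives, mean_predicted) over non-empty bins."""
    sums_true = [0] * n_bins
    sums_prob = [0.0] * n_bins
    counts = [0] * n_bins
    for label, prob in zip(y_true, y_prob):
        # Equal-width bins over [0, 1]; a probability of 1.0 goes in the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        sums_true[idx] += label
        sums_prob[idx] += prob
        counts[idx] += 1
    # Empty bins are skipped, so the output can have fewer than n_bins points.
    frac_pos = [sums_true[i] / counts[i] for i in range(n_bins) if counts[i]]
    mean_pred = [sums_prob[i] / counts[i] for i in range(n_bins) if counts[i]]
    return frac_pos, mean_pred


# Hypothetical toy data: three low-probability and four high-probability cases.
toy_true = [0, 0, 1, 1, 1, 0, 1]
toy_prob = [0.12, 0.18, 0.15, 0.82, 0.88, 0.85, 0.81]

frac_pos, mean_pred = reliability_bins(toy_true, toy_prob, n_bins=10)
for f, m in zip(frac_pos, mean_pred):
    print(f"mean predicted {m:.2f} -> fraction of positives {f:.2f}")
```

Only two of the ten bins contain data here, so the curve has two points. A larger n_bins gives a finer curve but fewer samples per bin, which makes each point noisier.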

Step 6 - Plotting results

plt.plot([0, 1], [0, 1], linestyle='--', label='Ideally Calibrated')
plt.plot(y, x, marker='.', label='Decision Tree Classifier')
plt.xlabel('Average Predicted Probability in each bin')
plt.ylabel('Ratio of positives')
plt.legend()
plt.show()

Here, we have first plotted the ideally calibrated curve, which is a straight line from (0, 0) to (1, 1). We then plot the calibration curve of this particular model. The x-axis represents the average predicted probability in each bin. The y-axis is the ratio of positives (the proportion of instances in each bin whose true label is positive).

Step 7 - Let's look at our results now

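Once we run the above code snippet, we will see the model's calibration curve plotted against the dashed diagonal of an ideally calibrated model.

The closer the model's curve lies to that diagonal, the more closely its predicted probabilities match the observed outcomes on the test set.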
