How to compare different classification models using log loss and how to pick the best one
Log loss is useful when we have to compare models. It judges a model in two ways: by the class it predicts and by the probability it assigns to that prediction.
* To calculate log loss, the classifier must assign a probability to each class.
* Log loss measures the uncertainty of the model's prediction for every sample, compares it with the true label, and penalises wrong classifications, most heavily when they are made with high confidence.
* Log loss is defined for two or more labels.
* A log loss nearer to 0 means the predicted probabilities are closer to the true labels; a larger value means they are further away. Log loss ranges from 0 to infinity.
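To make the penalty concrete, here is a small sketch of the loss contributed by a single sample. It assumes the natural logarithm, and the helper name `sample_log_loss` is made up for illustration; it is not part of any library.

```python
import math

def sample_log_loss(p_true):
    """Loss for one sample, given the probability the model assigned to its true class."""
    return -math.log(p_true)

confident_right = sample_log_loss(0.9)  # model was confident and correct
unsure = sample_log_loss(0.5)           # model was undecided
confident_wrong = sample_log_loss(0.1)  # model put most probability on another class

for label, value in [("p=0.9", confident_right), ("p=0.5", unsure), ("p=0.1", confident_wrong)]:
    print(f"{label}: {value:.3f}")
# p=0.9: 0.105
# p=0.5: 0.693
# p=0.1: 2.303
```

The loss grows slowly while the true class keeps a high probability and rises sharply as that probability approaches 0.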
If there are N samples belonging to M classes:
1.) yij indicates whether sample i belongs to class j (1 if it does, 0 otherwise)
2.) pij indicates the predicted probability of sample i belonging to class j
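With this notation, multiclass log loss is conventionally written as the average negative log-probability assigned to the true class. The expression below is the standard textbook form, written with the symbols defined above and assuming the natural logarithm:

```latex
\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \, \log(p_{ij})
```

Because yij is 1 only for the true class of sample i, each sample contributes just the log of the probability given to its true class.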
The negative sign makes the loss non-negative: pij is a probability between 0 and 1, and log(x) is negative for 0 < x < 1 (and 0 at x = 1), so log(pij) is never positive.
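The definition can be checked by hand on a tiny example. The sketch below is only an illustration: the function name `multiclass_log_loss`, the `eps` clipping value (used to avoid log(0)), and the toy arrays are all assumptions, not part of the original walkthrough.

```python
import numpy as np

def multiclass_log_loss(y_true, p_pred, eps=1e-15):
    """Average negative log-probability of the true class.

    y_true: one-hot array of shape (N, M)
    p_pred: predicted probabilities of shape (N, M)
    eps:    illustrative clipping bound so that log(0) never occurs
    """
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    # Inner sum over classes keeps only the true-class term; mean averages over samples.
    return -np.mean(np.sum(y_true * np.log(p_pred), axis=1))

# Three samples, three classes (toy values).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
p_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.3, 0.3, 0.4]])

loss = multiclass_log_loss(y_true, p_pred)
print(round(loss, 4))  # 0.5501
```

Here the loss is the mean of -log(0.8), -log(0.6) and -log(0.4), the probabilities given to the true classes.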
    from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict
    import pandas as pd
    import numpy as np
    import seaborn as sns
    from sklearn.linear_model import LogisticRegression
We will load the dataset directly from the seaborn library.
    iris = sns.load_dataset('iris')
    X = iris.drop(columns='species')
    y = iris['species']
    Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=20)
Next, we fit the machine learning model.
    # Logistic Regression
    clf_logreg = LogisticRegression()
    # fit model
    clf_logreg.fit(Xtrain, ytrain)
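A fitted classifier exposes the per-class probabilities that log loss is computed from. The sketch below shows one way to inspect them and to score the held-out split. It makes two assumptions that differ from the walkthrough: it loads iris through `sklearn.datasets.load_iris` so that it runs without downloading anything, and it sets `max_iter=1000` to give the solver room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled copy of iris stands in for the seaborn one.
X, y = load_iris(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=20)

# Assumption: a higher max_iter than the default, to avoid convergence warnings.
clf = LogisticRegression(max_iter=1000)
clf.fit(Xtrain, ytrain)

# One row per test sample, one column per class; each row sums to 1.
proba = clf.predict_proba(Xtest)
print(proba[:3].round(3))

# Log loss on data the model has not seen during fitting.
test_logloss = log_loss(ytest, proba, labels=clf.classes_)
print(test_logloss)
```

Scoring the test split this way gives a single held-out number, whereas cross-validation averages over several splits of the training data.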
We will calculate the log loss score with cross-validation on the training set.
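    # 'neg_log_loss' returns the negated log loss per fold (higher is better),
    # so flip the sign to report the log loss itself (lower is better)
    logloss_logreg = -cross_val_score(clf_logreg, Xtrain, ytrain, scoring='neg_log_loss').mean()
    print(logloss_logreg)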
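The same scoring call can be repeated for several classifiers to pick the one with the lowest log loss. The following is a rough sketch rather than a recommended setup: the choice of candidate models, their hyperparameters, `cv=5`, and the use of `sklearn.datasets.load_iris` in place of the seaborn loader are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled iris data, split as in the walkthrough.
X, y = load_iris(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=20)

# Illustrative candidates; any classifier with predict_proba would work.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=20),
}

scores = {}
for name, model in candidates.items():
    # The scorer returns negated log loss, so negate the mean to get log loss.
    neg = cross_val_score(model, Xtrain, ytrain, cv=5, scoring="neg_log_loss")
    scores[name] = -neg.mean()

# Lower log loss is better.
best_name = min(scores, key=scores.get)
for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.4f}")
print("best:", best_name)
```

Models that output probabilities of exactly 0 or 1, as nearest neighbours and tree ensembles sometimes do, can be penalised very heavily for a single confident mistake, so a model with good accuracy may still rank poorly on log loss.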