Explain accuracy, precision, recall and f-beta score

In this tutorial, we will learn about the performance metrics of a classification model: accuracy, precision, recall, and the f-beta score.

Explain Accuracy, Precision, Recall, and F-beta Score

A confusion matrix provides a wealth of information. It helps us understand how effectively a classification model is working through metrics calculated from it, such as accuracy, precision, recall, and the f-beta score. However, one of the most common questions among aspiring data scientists is when each of these measures should be used. This tutorial answers that question. Let's take a look at each of these metrics and see how they are used.


1) Accuracy

Formula: (TP + TN) / (TP + TN + FP + FN)

Accuracy is one of the most widely used performance metrics. It is the ratio of correctly predicted observations to the total number of observations. However, deeming a model the best based solely on accuracy is misleading. Accuracy is a relevant measure when the dataset is balanced and the number of false positives is roughly the same as the number of false negatives. In the case of asymmetric datasets, we need to resort to other performance metrics because we also care about how the misclassifications are split between positives and negatives. For example, in Covid-19 classification, what if we wrongly classify a person as negative, but the person goes on to fall ill and their condition becomes severe? They might even end up spreading the virus. This is precisely why we need to break the accuracy formula down further.
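As a quick sanity check of the formula, here is a minimal sketch (assuming scikit-learn is available) that derives accuracy from a confusion matrix; the labels are made-up placeholders rather than real data.

# Minimal sketch: accuracy from a confusion matrix with scikit-learn.
# The labels below are illustrative placeholders, not real Covid-19 data.
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes (1 = positive, 0 = negative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(accuracy)                        # manual computation: 0.75
print(accuracy_score(y_true, y_pred))  # same value via scikit-learn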

Let us go through Type I and Type II errors before understanding Precision, Recall, and F-beta score.

Type I error – False Positive, i.e. the case where we reject the null hypothesis even though it is actually true
Type II error – False Negative, i.e. the case where we fail to reject the null hypothesis even though it is actually false

With this in mind let us move on to Precision.

2) Precision

Formula: TP / (TP + FP), i.e. TP / total predicted positives

Precision is defined as the proportion of correctly identified positive cases among all predicted positive cases. It tells us how precise the model is when it predicts a positive. It focuses on the Type I error. We should use precision as the performance metric when false positives are the bigger concern. For example, consider email spam detection: if an email that is not spam is incorrectly classified as spam, the user might end up missing critical emails. In this case, it is more important for the model to be precise.
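The following is a minimal sketch of the precision formula, again with made-up placeholder labels (1 = spam, 0 = not spam), checked against scikit-learn's precision_score.

# Minimal sketch: precision = TP / (TP + FP), on illustrative spam labels.
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

print(tp / (tp + fp))                   # manual computation: 0.75
print(precision_score(y_true, y_pred))  # scikit-learn equivalent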

3) Recall

Formula: TP / (TP + FN), i.e. TP / total actual positives

Recall is the proportion of correctly identified positive cases among all actual positive instances. By the same reasoning, when a false negative carries a higher cost, recall is the performance metric we use to choose our best model. For example, consider fraud detection: a bank may face severe consequences if an actual positive (a fraudulent transaction) is predicted as negative (non-fraudulent). In the same way, predicting an actually positive (Covid-19) person as negative is very dangerous. In these cases, we must focus on achieving a higher recall.
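Similarly, here is a small sketch of the recall formula with the same placeholder labels, checked against scikit-learn's recall_score.

# Minimal sketch: recall = TP / (TP + FN), on the same illustrative labels.
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(tp / (tp + fn))                # manual computation: 0.75
print(recall_score(y_true, y_pred))  # scikit-learn equivalent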

Precision-Recall Trade-off

The values of both precision and recall lie between 0 and 1. Consider a scenario where we classify passengers as Covid positive or negative before boarding a flight, and we want to avoid overlooking true positive cases. It would be particularly problematic if a person is genuinely positive but our model fails to detect it, because there is a substantial risk of the virus spreading if such individuals are allowed to board. So even if there is only a small chance that a person has Covid, we cannot risk labeling them negative. As a result, we lower the decision threshold: if the output probability is larger than 0.25, we designate the person Covid positive. Recall therefore increases, but precision is reduced.

Let us now consider the opposite scenario, where we must label a person positive only when we are certain they are positive. We can achieve this by setting a higher probability threshold (e.g. 0.85): a person is labeled positive only when their predicted probability is greater than 0.85 and negative otherwise. For most classifiers, we notice a trade-off between recall and precision as we change this probability threshold. When comparing multiple models with different precision-recall values, it is often more convenient to combine precision and recall into a single statistic. To measure performance, we need a statistic that takes both recall and precision into account.
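The short sketch below illustrates this trade-off with made-up probabilities: the same model outputs are thresholded at 0.25 and at 0.85, and precision and recall are recomputed each time.

# Minimal sketch of the precision-recall trade-off: the same predicted
# probabilities thresholded at 0.25 versus 0.85 (placeholder values).
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.3, 0.4, 0.7, 0.2, 0.6, 0.8, 0.1]   # model output probabilities

for threshold in (0.25, 0.85):
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    print(threshold, precision_score(y_true, y_pred), recall_score(y_true, y_pred))
    # 0.25 -> precision 0.67, recall 1.00 (catch every positive, more false alarms)
    # 0.85 -> precision 1.00, recall 0.25 (only flag the most confident cases)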

4) F-beta Score

Formula: ((1 + beta^2) * Precision * Recall) / (beta^2 * Precision + Recall)

As previously stated, we need a statistic that considers both recall and precision, and the F-beta score fulfills this requirement. The F-beta score is the weighted harmonic mean of precision and recall. Its value lies between 0 and 1, where 1 is the best and 0 is the worst. The weight "beta" is chosen according to the scenario. If precision is more important, beta is set to less than one. When beta is greater than one, recall is prioritized. If beta is set to 1, we get the F1 score, which is the plain harmonic mean of precision and recall and gives equal weight to both.

Beta = 1 is the default value. The formula becomes –
F1 score = (2 * Precision * Recall) / (Precision + Recall)

To prioritize precision, you can set a smaller beta value such as 0.5. The formula becomes –
F0.5 score = (1.25 * Precision * Recall) / (0.25 * Precision + Recall)

To prioritize recall, you can set a larger beta value such as 2. The formula becomes –
F2 score = (5 * Precision * Recall) / (4 * Precision + Recall)
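To see how beta shifts the balance, the sketch below computes the F0.5, F1, and F2 scores with scikit-learn's fbeta_score on placeholder labels chosen so that precision and recall differ.

# Minimal sketch: fbeta_score with beta = 0.5, 1, and 2 on illustrative labels
# where precision (0.6) and recall (0.75) differ.
from sklearn.metrics import fbeta_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

print(fbeta_score(y_true, y_pred, beta=0.5))   # ~0.625, favors precision
print(f1_score(y_true, y_pred))                # ~0.667, beta = 1, balanced
print(fbeta_score(y_true, y_pred, beta=2))     # ~0.714, favors recall

With these labels recall is higher than precision, so the F2 score comes out highest and the F0.5 score lowest, which is exactly the weighting behaviour described above.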
