When working on a classification problem, we need various metrics like precision, recall, f1-score, and support to check how well our model is performing.
For this we compute these scores with a classification report and a confusion matrix. So in this recipe we will learn how to generate a classification report and a confusion matrix in Python.
This data science python source code does the following:
1. Imports necessary libraries and dataset from sklearn
2. Performs a train-test split on the dataset
3. Applies DecisionTreeClassifier model for prediction
4. Prepares classification report for the output
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
We have imported datasets to use the built-in datasets, along with DecisionTreeClassifier, train_test_split, classification_report, and confusion_matrix.
Here we use datasets to load the built-in wine dataset, and we create the objects X and y to store the data and the target values respectively.
wine = datasets.load_wine()
X = wine.data
y = wine.target
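If you are curious about what we have just loaded, a quick optional check of the shapes and class labels looks like this; the printed values assume the standard sklearn wine dataset:

print(X.shape)            # (178, 13): 178 samples with 13 features each
print(y.shape)            # (178,): one target label per sample
print(wine.target_names)  # ['class_0' 'class_1' 'class_2']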
We create a list of target names, and we use train_test_split to split the data into two parts: a train set, which is used to train the model, and a test set, which is used to check how the model performs on unseen data. Here we pass test_size=0.30 to train_test_split, which puts 30% of the data in the test part and the remaining 70% in the train part.
class_names = wine.target_names
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
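Note that the split is random, so the exact numbers you get later may differ from the output shown below. If you want a reproducible split, train_test_split also accepts a random_state seed (and a stratify argument to keep the class proportions the same in both parts); the seed value 42 here is just an arbitrary choice:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)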
Here we use DecisionTreeClassifier as the classification model, train it on the train data, and then predict the output for the test data.
classifier_tree = DecisionTreeClassifier()
y_predict = classifier_tree.fit(X_train, y_train).predict(X_test)
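Before looking at the full report, it can be handy to get a single overall accuracy number. This is an optional check using sklearn's accuracy_score, or the estimator's own score method, which computes the same thing for classifiers:

from sklearn.metrics import accuracy_score

print(accuracy_score(y_test, y_predict))      # fraction of test samples predicted correctly
print(classifier_tree.score(X_test, y_test))  # same value via the estimator API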
Finally, we print the classification report and the confusion matrix for the predictions:
print(classification_report(y_test, y_predict, target_names=class_names))
print(confusion_matrix(y_test, y_predict))
So the output comes as:

              precision    recall  f1-score   support

     class_0       0.95      0.95      0.95        19
     class_1       0.95      0.95      0.95        21
     class_2       0.95      0.95      0.95        19

   micro avg       0.95      0.95      0.95        59
   macro avg       0.95      0.95      0.95        59
weighted avg       0.95      0.95      0.95        59

[[18  1  0]
 [ 0 20  1]
 [ 1  0 18]]
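As a sanity check, the numbers in the classification report can be derived directly from the confusion matrix, where rows are the true classes and columns are the predicted classes. A minimal sketch using numpy and the matrix from the output above:

import numpy as np

# Confusion matrix from the output above (rows = true class, columns = predicted class)
cm = np.array([[18,  1,  0],
               [ 0, 20,  1],
               [ 1,  0, 18]])

tp = np.diag(cm)                 # correctly classified samples per class
precision = tp / cm.sum(axis=0)  # TP / (TP + FP): divide by column totals
recall = tp / cm.sum(axis=1)     # TP / (TP + FN): divide by row totals

print(precision)  # class_0: 18/19 ≈ 0.95, matching the report
print(recall)

Support is simply the number of true samples of each class, i.e. the row totals of the matrix.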