How to compare sklearn classification algorithms in Python?

This recipe helps you compare sklearn classification algorithms in Python
Last Updated: 20 Dec 2022

Get access to Data Science projects View all Data Science projects

MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET ALL TAGS

Recipe Objective

How you decide which machine learning model to use on a dataset. Randomly applying any model and testing can be a hectic process. So here we will try to apply many models at once and compare each model.

So this is the recipe on how we can compare sklearn classification algorithms in Python.

Get Closer To Your Dream of Becoming a Data Scientist with 70+ Solved End-to-End ML Projects

Recipe Objective

Step 1 - Import the library

import matplotlib.pyplot as plt from sklearn import model_selection from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from sklearn.neighbors import KNeighborsClassifier from sklearn.discriminant_analysis import LinearDiscriminantAnalysis from sklearn.naive_bayes import GaussianNB from sklearn.svm import SVC from sklearn.model_selection import train_test_split from sklearn import datasets import matplotlib.pyplot as plt plt.style.use('ggplot')

We have imported all the models on which we want to train the data. Other than that we have imported many other modules which will be required.

Step 2 - Loading the Dataset

We are using inbuilt wine dataset and stored data in X and target in Y. We are also using test_train_split to split the dataset. We have also created an object seed which we have passed in Kfold in the paremeter random_state. seed = 50 dataset = datasets.load_wine() X = dataset.data; y = dataset.target X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30) kfold = model_selection.KFold(n_splits=10, random_state=seed)

Learn How Different Classification Techniques in Machine Learning Fair Against Each Other

Step 3 - Loading all Models

Here we have created and empty array and then appended it with all the models like LogisticRegression, DecisionTreeClassifier, GaussianNB and many more. models = [] models.append(('LR', LogisticRegression())) models.append(('LDA', LinearDiscriminantAnalysis())) models.append(('KNN', KNeighborsClassifier())) models.append(('CART', DecisionTreeClassifier())) models.append(('NB', GaussianNB())) models.append(('SVM', SVC()))

Step 4 - Evaluating the models

Here we have created two empty array named results and names and an object scoring. Now we have made a for loop which will itterate over all the models, In the loop we have used the function Kfold and cross validation score with the desired parameters. Finally we have used a print statement to print the result for all the models. results = [] names = [] scoring = 'accuracy' for name, model in models: kfold = model_selection.KFold(n_splits=10, random_state=seed) cv_results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring) results.append(cv_results) names.append(name) msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std()) print(msg)

Explore More Data Science and Machine Learning Projects for Practice. Fast-Track Your Career Transition with ProjectPro

Step 5 - Ploting BoxPlot

We have also ploted Box Plot to clearly visualize the result. fig = plt.figure(figsize=(10,10)) fig.suptitle('How to compare sklearn classification algorithms') ax = fig.add_subplot(111) plt.boxplot(results) ax.set_xticklabels(names) plt.show() So the output comes as

LR: 0.960256 (0.039806)
LDA: 0.984615 (0.030769)
KNN: 0.711538 (0.123736)
CART: 0.889103 (0.086955)
NB: 0.951282 (0.064499)
SVM: 0.434615 (0.105752)

Download Materials

iPython Notebook

What Users are saying..

Savvy Sahai

Data Science Intern, Capgemini

As a student looking to break into the field of data engineering and data science, one can get really confused as to which path to take. Very few ways to do it are Google, YouTube, etc. I was one of... Read More