How to compare sklearn classification algorithms in Python?

This recipe helps you compare sklearn classification algorithms in Python


Recipe Objective

How do you decide which machine learning model to use on a dataset? Picking models at random and testing them one at a time is tedious. Instead, we will train several models in one pass and compare their cross-validated accuracy.

So this is the recipe on how we can compare sklearn classification algorithms in Python.

Step 1 - Import the library

    import matplotlib.pyplot as plt
    from sklearn import model_selection
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split
    from sklearn import datasets

    plt.style.use('ggplot')

We have imported the classifiers we want to compare. We have also imported model_selection and train_test_split for splitting and cross-validation, datasets for the sample data, and matplotlib for plotting.

Step 2 - Loading the Dataset

We are using the built-in wine dataset, storing the features in X and the target in y. We use train_test_split to hold out 30% of the data for testing. We have also created a variable seed, which we pass to KFold through the random_state parameter. Recent scikit-learn versions raise an error if random_state is set while shuffle is left at False, so shuffle=True is set as well.

    seed = 50
    dataset = datasets.load_wine()
    X = dataset.data
    y = dataset.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
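To make the effect of the 10-fold split concrete, here is a minimal sketch that counts how many training rows land in each validation fold. The random_state passed to train_test_split is an assumption added for reproducibility; the recipe itself leaves it unset.

```python
from sklearn import datasets
from sklearn.model_selection import KFold, train_test_split

seed = 50
dataset = datasets.load_wine()
X, y = dataset.data, dataset.target

# random_state here is an assumption for reproducibility, not part of the recipe
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=seed
)

# shuffle=True is required whenever random_state is set on KFold
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

fold_sizes = []
seen = set()
for train_idx, val_idx in kfold.split(X_train):
    # val_idx holds the positions of the rows held out in this fold
    fold_sizes.append(len(val_idx))
    seen.update(val_idx.tolist())

print("training rows:", len(X_train))
print("validation fold sizes:", fold_sizes)
print("each row validated exactly once:",
      len(seen) == len(X_train) and sum(fold_sizes) == len(X_train))
```

Each training row is used for validation in exactly one fold and for fitting in the other nine, which is why cross_val_score returns ten accuracy values per model.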

Step 3 - Loading all Models

Here we have created an empty list and appended a (name, estimator) tuple for each model: LogisticRegression, LinearDiscriminantAnalysis, KNeighborsClassifier, DecisionTreeClassifier, GaussianNB and SVC.

    models = []
    models.append(('LR', LogisticRegression()))
    models.append(('LDA', LinearDiscriminantAnalysis()))
    models.append(('KNN', KNeighborsClassifier()))
    models.append(('CART', DecisionTreeClassifier()))
    models.append(('NB', GaussianNB()))
    models.append(('SVM', SVC()))

Step 4 - Evaluating the models

Here we have created two empty lists named results and names, and a variable scoring set to 'accuracy'. A for loop iterates over all the models; on each pass we build a KFold splitter and call cross_val_score with it on the training data. Finally, we print the mean and standard deviation of the fold accuracies for each model.

    results = []
    names = []
    scoring = 'accuracy'
    for name, model in models:
        # shuffle=True is required when random_state is set
        kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
        cv_results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring)
        results.append(cv_results)
        names.append(name)
        msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
        print(msg)
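The split in Step 2 also produces X_test and y_test, which the cross-validation loop never touches. One possible use for them is to refit the preferred model on the whole training split and score it once on the held-out rows. The sketch below assumes LinearDiscriminantAnalysis is the model chosen, purely as an example, and fixes random_state in train_test_split for reproducibility; neither choice comes from the recipe.

```python
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

dataset = datasets.load_wine()
X, y = dataset.data, dataset.target

# random_state is an assumption so the split is repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=50
)

# Example choice only: substitute whichever model did best in cross-validation
model = LinearDiscriminantAnalysis()
model.fit(X_train, y_train)

# Score once on rows the model has never seen
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("held-out accuracy: %f" % test_accuracy)
```

Keeping the test set out of model selection means this final number is not biased by the comparison itself.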

Step 5 - Plotting a Box Plot

We have also drawn a box plot of the per-fold accuracies to visualize the results clearly.

    fig = plt.figure(figsize=(10,10))
    fig.suptitle('How to compare sklearn classification algorithms')
    ax = fig.add_subplot(111)
    plt.boxplot(results)
    ax.set_xticklabels(names)
    plt.show()

The output is:

LR: 0.960256 (0.039806)
LDA: 0.984615 (0.030769)
KNN: 0.711538 (0.123736)
CART: 0.889103 (0.086955)
NB: 0.951282 (0.064499)
SVM: 0.434615 (0.105752)
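
KNN and SVM score lowest in the output above. Both rely on distances between samples, and the wine features sit on very different scales, so unscaled inputs are a plausible (though unconfirmed) cause. One way to check, which is not part of the original recipe, is to wrap each of them in a pipeline with StandardScaler and rerun the same cross-validation. This is a sketch under that assumption, with random_state fixed in train_test_split for reproducibility.

```python
from sklearn import datasets
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

seed = 50
dataset = datasets.load_wine()
X, y = dataset.data, dataset.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=seed
)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

# Same two classifiers, with and without feature standardization
candidates = {
    'KNN': KNeighborsClassifier(),
    'KNN+scale': make_pipeline(StandardScaler(), KNeighborsClassifier()),
    'SVM': SVC(),
    'SVM+scale': make_pipeline(StandardScaler(), SVC()),
}

mean_scores = {}
for name, model in candidates.items():
    cv_results = cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy')
    mean_scores[name] = cv_results.mean()
    print("%s: %f (%f)" % (name, cv_results.mean(), cv_results.std()))
```

Putting the scaler inside the pipeline means it is refit on the training part of every fold, so no information from the validation rows leaks into the scaling.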
