How to compare sklearn classification algorithms in Python?

How to compare sklearn classification algorithms in Python?

How to compare sklearn classification algorithms in Python?

This recipe helps you compare sklearn classification algorithms in Python


Recipe Objective

How you decide which machine learning model to use on a dataset. Randomly applying any model and testing can be a hectic process. So here we will try to apply many models at once and compare each model.

So this is the recipe on how we can compare sklearn classification algorithms in Python.

Step 1 - Import the library

import matplotlib.pyplot as plt from sklearn import model_selection from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from sklearn.neighbors import KNeighborsClassifier from sklearn.discriminant_analysis import LinearDiscriminantAnalysis from sklearn.naive_bayes import GaussianNB from sklearn.svm import SVC from sklearn.model_selection import train_test_split from sklearn import datasets import matplotlib.pyplot as plt'ggplot')

We have imported all the models on which we want to train the data. Other than that we have imported many other modules which will be required.

Step 2 - Loading the Dataset

We are using inbuilt wine dataset and stored data in X and target in Y. We are also using test_train_split to split the dataset. We have also created an object seed which we have passed in Kfold in the paremeter random_state. seed = 50 dataset = datasets.load_wine() X =; y = X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30) kfold = model_selection.KFold(n_splits=10, random_state=seed)

Step 3 - Loading all Models

Here we have created and empty array and then appended it with all the models like LogisticRegression, DecisionTreeClassifier, GaussianNB and many more. models = [] models.append(('LR', LogisticRegression())) models.append(('LDA', LinearDiscriminantAnalysis())) models.append(('KNN', KNeighborsClassifier())) models.append(('CART', DecisionTreeClassifier())) models.append(('NB', GaussianNB())) models.append(('SVM', SVC()))

Step 4 - Evaluating the models

Here we have created two empty array named results and names and an object scoring. Now we have made a for loop which will itterate over all the models, In the loop we have used the function Kfold and cross validation score with the desired parameters. Finally we have used a print statement to print the result for all the models. results = [] names = [] scoring = 'accuracy' for name, model in models: kfold = model_selection.KFold(n_splits=10, random_state=seed) cv_results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring) results.append(cv_results) names.append(name) msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std()) print(msg)

Step 5 - Ploting BoxPlot

We have also ploted Box Plot to clearly visualize the result. fig = plt.figure(figsize=(10,10)) fig.suptitle('How to compare sklearn classification algorithms') ax = fig.add_subplot(111) plt.boxplot(results) ax.set_xticklabels(names) So the output comes as

LR: 0.960256 (0.039806)
LDA: 0.984615 (0.030769)
KNN: 0.711538 (0.123736)
CART: 0.889103 (0.086955)
NB: 0.951282 (0.064499)
SVM: 0.434615 (0.105752)

Relevant Projects

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Data Science Project - Instacart Market Basket Analysis
Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Data Science Project on Wine Quality Prediction in R
In this R data science project, we will explore wine dataset to assess red wine quality. The objective of this data science project is to explore which chemical properties will influence the quality of red wines.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.