How to compare sklearn classification algorithms in Python?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to compare sklearn classification algorithms in Python?

How to compare sklearn classification algorithms in Python?

This recipe helps you compare sklearn classification algorithms in Python

1

Recipe Objective

How you decide which machine learning model to use on a dataset. Randomly applying any model and testing can be a hectic process. So here we will try to apply many models at once and compare each model.

So this is the recipe on how we can compare sklearn classification algorithms in Python.

Step 1 - Import the library

import matplotlib.pyplot as plt from sklearn import model_selection from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from sklearn.neighbors import KNeighborsClassifier from sklearn.discriminant_analysis import LinearDiscriminantAnalysis from sklearn.naive_bayes import GaussianNB from sklearn.svm import SVC from sklearn.model_selection import train_test_split from sklearn import datasets import matplotlib.pyplot as plt plt.style.use('ggplot')

We have imported all the models on which we want to train the data. Other than that we have imported many other modules which will be required.

Step 2 - Loading the Dataset

We are using inbuilt wine dataset and stored data in X and target in Y. We are also using test_train_split to split the dataset. We have also created an object seed which we have passed in Kfold in the paremeter random_state. seed = 50 dataset = datasets.load_wine() X = dataset.data; y = dataset.target X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30) kfold = model_selection.KFold(n_splits=10, random_state=seed)

Step 3 - Loading all Models

Here we have created and empty array and then appended it with all the models like LogisticRegression, DecisionTreeClassifier, GaussianNB and many more. models = [] models.append(('LR', LogisticRegression())) models.append(('LDA', LinearDiscriminantAnalysis())) models.append(('KNN', KNeighborsClassifier())) models.append(('CART', DecisionTreeClassifier())) models.append(('NB', GaussianNB())) models.append(('SVM', SVC()))

Step 4 - Evaluating the models

Here we have created two empty array named results and names and an object scoring. Now we have made a for loop which will itterate over all the models, In the loop we have used the function Kfold and cross validation score with the desired parameters. Finally we have used a print statement to print the result for all the models. results = [] names = [] scoring = 'accuracy' for name, model in models: kfold = model_selection.KFold(n_splits=10, random_state=seed) cv_results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring) results.append(cv_results) names.append(name) msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std()) print(msg)

Step 5 - Ploting BoxPlot

We have also ploted Box Plot to clearly visualize the result. fig = plt.figure(figsize=(10,10)) fig.suptitle('How to compare sklearn classification algorithms') ax = fig.add_subplot(111) plt.boxplot(results) ax.set_xticklabels(names) plt.show() So the output comes as

LR: 0.960256 (0.039806)
LDA: 0.984615 (0.030769)
KNN: 0.711538 (0.123736)
CART: 0.889103 (0.086955)
NB: 0.951282 (0.064499)
SVM: 0.434615 (0.105752)

Relevant Projects

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Census Income Data Set Project - Predict Adult Census Income
Use the Adult Income dataset to predict whether income exceeds 50K yr based on census data.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Machine Learning Project to Forecast Rossmann Store Sales
In this machine learning project you will work on creating a robust prediction model of Rossmann's daily sales using store, promotion, and competitor data.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Expedia Hotel Recommendations Data Science Project
In this data science project, you will contextualize customer data and predict the likelihood a customer will stay at 100 different hotel groups.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.