How to classify wine using sklearn ensemble Bagging model in ML in python

This recipe helps you classify wine using sklearn ensemble Bagging model in ML in python

Recipe Objective

Have you ever tried to use Ensemble models like Bagging Classifier, Extra Tree Classifier and Random Forest Classifier for Analysis. In this we will using both for different dataset.

So this recipe is a short example of how we can classify "wine" using sklearn ensemble (Bagging) model - Multiclass Classification.

A Gentle Introduction to Ensemble Learning in Machine Learning

Step 1 - Import the library

from sklearn import datasets from sklearn import metrics from sklearn.model_selection import train_test_split import matplotlib.pyplot as plt plt.style.use("ggplot") from sklearn import ensemble

Here we have imported various modules like datasets, mertics, ensemble and test_train_split from differnt libraries. We will understand the use of these later while using it in the in the code snipet.
For now just have a look on these imports.

Step 2 - Setup the Data

Here we have used datasets to load the inbuilt wine dataset and we have created objects X and y to store the data and the target value respectively. dataset = datasets.load_wine() X = dataset.data; y = dataset.target X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

Step 3 - Model and its Score

Here, we are using Bagging Classifier as a Machine Learning model to fit the data. model = ensemble.BaggingClassifier() model.fit(X_train, y_train) print(model) Now we have predicted the output by passing X_test and also stored real target in expected_y. expected_y = y_test predicted_y = model.predict(X_test) Here we have printed classification report and confusion matrix for the classifier. print(metrics.classification_report(expected_y, predicted_y, target_names=dataset.target_names)) print(metrics.confusion_matrix(expected_y, predicted_y))

Step 4 - Model and its Score

Here, we are using Extra Tree Classifier as a Machine Learning model to fit the data. model = ensemble.ExtraTreesClassifier() model.fit(X_train, y_train) print(model) Now we have predicted the output by passing X_test and also stored real target in expected_y. expected_y = y_test predicted_y = model.predict(X_test) Here we have printed classification report and confusion matrix for the Regressor. print(metrics.classification_report(expected_y, predicted_y, target_names=dataset.target_names)) print(metrics.confusion_matrix(expected_y, predicted_y))

Step 5 - Model and its Score

Here, we are using Random Forest Classifier as a Machine Learning model to fit the data. model = ensemble.RandomForestClassifier() model.fit(X_train, y_train) print(model) Now we have predicted the output by passing X_test and also stored real target in expected_y. expected_y = y_test predicted_y = model.predict(X_test) Here we have printed classification report and confusion matrix for the Regressor. print(metrics.classification_report(expected_y, predicted_y, target_names=dataset.target_names)) print(metrics.confusion_matrix(expected_y, predicted_y)) As an output we get:

BaggingClassifier(base_estimator=None, bootstrap=True,
         bootstrap_features=False, max_features=1.0, max_samples=1.0,
         n_estimators=10, n_jobs=None, oob_score=False, random_state=None,
         verbose=0, warm_start=False)

ensemble.BaggingClassifier(): 

              precision    recall  f1-score   support

     class_0       1.00      0.93      0.96        14
     class_1       0.95      0.95      0.95        21
     class_2       0.91      1.00      0.95        10

   micro avg       0.96      0.96      0.96        45
   macro avg       0.95      0.96      0.96        45
weighted avg       0.96      0.96      0.96        45


[[13  1  0]
 [ 0 20  1]
 [ 0  0 10]]

ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion="gini",
           max_depth=None, max_features="auto", max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

ensemble.ExtraTreesClassifier(): 

              precision    recall  f1-score   support

     class_0       0.93      1.00      0.97        14
     class_1       1.00      0.90      0.95        21
     class_2       0.91      1.00      0.95        10

   micro avg       0.96      0.96      0.96        45
   macro avg       0.95      0.97      0.96        45
weighted avg       0.96      0.96      0.96        45


[[14  0  0]
 [ 1 19  1]
 [ 0  0 10]]

RandomForestClassifier(bootstrap=True, class_weight=None, criterion="gini",
            max_depth=None, max_features="auto", max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

ensemble.RandomForestClassifier(): 

              precision    recall  f1-score   support

     class_0       1.00      0.93      0.96        14
     class_1       0.95      1.00      0.98        21
     class_2       1.00      1.00      1.00        10

   micro avg       0.98      0.98      0.98        45
   macro avg       0.98      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45


[[13  1  0]
 [ 0 21  0]
 [ 0  0 10]]

Download Materials

What Users are saying..

profile image

Savvy Sahai

Data Science Intern, Capgemini
linkedin profile url

As a student looking to break into the field of data engineering and data science, one can get really confused as to which path to take. Very few ways to do it are Google, YouTube, etc. I was one of... Read More

Relevant Projects

Azure Text Analytics for Medical Search Engine Deployment
Microsoft Azure Project - Use Azure text analytics cognitive service to deploy a machine learning model into Azure Databricks

Learn to Build an End-to-End Machine Learning Pipeline - Part 1
In this Machine Learning Project, you will learn how to build an end-to-end machine learning pipeline for predicting truck delays, addressing a major challenge in the logistics industry.

Machine Learning Project to Forecast Rossmann Store Sales
In this machine learning project you will work on creating a robust prediction model of Rossmann's daily sales using store, promotion, and competitor data.

Build a Customer Churn Prediction Model using Decision Trees
Develop a customer churn prediction model using decision tree machine learning algorithms and data science on streaming service data.

A/B Testing Approach for Comparing Performance of ML Models
The objective of this project is to compare the performance of BERT and DistilBERT models for building an efficient Question and Answering system. Using A/B testing approach, we explore the effectiveness and efficiency of both models and determine which one is better suited for Q&A tasks.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Build CI/CD Pipeline for Machine Learning Projects using Jenkins
In this project, you will learn how to create a CI/CD pipeline for a search engine application using Jenkins.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Skip Gram Model Python Implementation for Word Embeddings
Skip-Gram Model word2vec Example -Learn how to implement the skip gram algorithm in NLP for word embeddings on a set of documents.