How to compare ExtraTrees classifier and decision tree in ML in Python

This recipe helps you compare an ExtraTrees classifier and a decision tree in ML in Python.
Last Updated: 23 Jun 2022

Recipe Objective

A decision tree learns a single tree, while an extra-trees classifier is an ensemble that learns from multiple randomized trees. The other major difference lies in how splits are chosen: a decision tree computes the locally optimal feature/split combination, whereas in the extra-trees classifier a random split value is drawn for each feature under consideration, and the best of these random candidates is used.
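The difference in split selection can be sketched on a single feature. This is only a minimal illustration with NumPy, not the scikit-learn implementation: the helper names (gini, split_impurity, best_threshold, random_threshold) and the toy data are hypothetical, and Gini impurity is assumed as the split quality measure.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def split_impurity(x, y, threshold):
    """Weighted Gini impurity after splitting on x <= threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    return (left.size * gini(left) + right.size * gini(right)) / y.size

def best_threshold(x, y):
    """Decision-tree style: evaluate every midpoint between sorted unique values."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2.0
    scores = [split_impurity(x, y, t) for t in candidates]
    return candidates[int(np.argmin(scores))]

def random_threshold(x, rng):
    """Extra-trees style: draw one threshold uniformly between min and max."""
    return rng.uniform(x.min(), x.max())

# Toy one-feature dataset (hypothetical values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])

rng = np.random.default_rng(0)
t_best = best_threshold(x, y)
t_rand = random_threshold(x, rng)

print("searched threshold:", t_best, "impurity:", split_impurity(x, y, t_best))
print("random threshold:  ", t_rand, "impurity:", split_impurity(x, y, t_rand))
```

The searched threshold is never worse than the random one on the training data, but it costs a scan over all candidates. The random draw is cheaper and adds the extra randomness that the ensemble later averages out.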

This recipe is a short example of how to compare a decision tree and an extra-trees classifier. Let's get started.

Step 1 - Import the library

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import load_iris

Let's pause and look at these imports. NumPy and pandas are the usual ones. sklearn.ensemble contains the ExtraTreesClassifier model, and sklearn.tree contains the DecisionTreeClassifier model. sklearn.datasets is used here to load a built-in classification dataset.

Step 2 - Setup the Data

X, y = load_iris(return_X_y=True)
print(X)
print(y)

Here, we have used the load_iris function to load the dataset as two NumPy arrays (X for the features and y for the target), which is why return_X_y is set to True.

Now our dataset is ready.
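Before building the models, it can help to check what was loaded. The sketch below is one possible way to inspect the arrays. The second call to load_iris without return_X_y is an addition that is not part of the recipe; it is used only to read the feature and class names.

```python
from sklearn.datasets import load_iris

# Same call as in the recipe: features and target as two NumPy arrays
X, y = load_iris(return_X_y=True)
print(X.shape)  # (150, 4): 150 samples, 4 features
print(y.shape)  # (150,): one class label per sample

# Loading the full Bunch object also exposes the names (not used in the recipe)
iris = load_iris()
print(iris.feature_names)
print(iris.target_names)
```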

Step 3 - Building the model

Before we do that, let's look at the important parameters that we need to pass.

1) n_estimators
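It sets the number of trees in the forest. It applies only to ExtraTreesClassifier; DecisionTreeClassifier builds a single tree and does not take this parameter.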

2) criterion
The function to measure the quality of a split. Supported criteria include “gini” for the Gini impurity and “entropy” for the information gain.

3) max_features
It decides the number of features to consider when looking for the best split.
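The parameters above can be checked on fitted models. The following is a small sketch, assuming the same iris data as the recipe. The variable names forest and tree and the random_state=0 argument are choices made for this illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# random_state is fixed here only so the sketch is repeatable
forest = ExtraTreesClassifier(n_estimators=5, criterion="entropy",
                              max_features=2, random_state=0)
forest.fit(X, y)

# n_estimators controls how many individual trees are fitted
print(len(forest.estimators_))

tree = DecisionTreeClassifier(criterion="entropy", max_features=2,
                              random_state=0)

# A single decision tree has no n_estimators parameter
print("n_estimators" in tree.get_params())

# criterion and max_features are shared by both models
print(tree.get_params()["criterion"], forest.get_params()["max_features"])
```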

Now that we understand the parameters, let's create the objects.

decision_tree_forest = DecisionTreeClassifier(criterion='entropy', max_features=2)
extra_tree_forest = ExtraTreesClassifier(n_estimators=5, criterion='entropy', max_features=2)

Here, we have built two models, one for the decision tree and the other for the extra-trees classifier.
As you can see, we have set n_estimators to 5 in ExtraTreesClassifier.
criterion is set to entropy for both.
max_features is set to 2 for both.

Step 4 - Fit the model and find results

decision_tree_forest.fit(X, y)
extra_tree_forest.fit(X, y)
decision_feature_importance = decision_tree_forest.feature_importances_
extra_feature_importance = extra_tree_forest.feature_importances_
print(decision_feature_importance)
print(extra_feature_importance)

Here, we have used the fit function to fit both models on X and y. After that, we read the feature_importances_ attribute of each fitted model to understand how important each feature is according to the two models.
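The raw importance arrays are easier to read next to the feature names. A minimal sketch, assuming the same model settings as the recipe plus a fixed random_state (an addition for repeatability); the names tree and forest are shortened for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

tree = DecisionTreeClassifier(criterion="entropy", max_features=2,
                              random_state=0).fit(X, y)
forest = ExtraTreesClassifier(n_estimators=5, criterion="entropy",
                              max_features=2, random_state=0).fit(X, y)

# Print one row per feature with the importance from each model
for name, tree_imp, forest_imp in zip(iris.feature_names,
                                      tree.feature_importances_,
                                      forest.feature_importances_):
    print(f"{name:20s} tree={tree_imp:.3f}  extra_trees={forest_imp:.3f}")

# Importances are normalized, so each array sums to 1
print(np.isclose(tree.feature_importances_.sum(), 1.0))
print(np.isclose(forest.feature_importances_.sum(), 1.0))
```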

Step 5 - Let's look at our results now

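Once we run the above code snippet, two arrays are printed, one per model. Each array holds four importance values, one for each iris feature, and the values in each array sum to 1. Because no random_state is set, the exact numbers can vary from run to run.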

The two models assign different importances to the same features. Because ExtraTreesClassifier averages several randomized trees, it generally handles noise better than a single decision tree.
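Feature importances alone do not show which model predicts better. One possible way to compare them is cross-validated accuracy, sketched below. The use of cross_val_score, cv=5 and random_state=0 are assumptions added for this illustration and are not part of the recipe; the scores depend on the data and the seed.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(criterion="entropy", max_features=2,
                              random_state=0)
forest = ExtraTreesClassifier(n_estimators=5, criterion="entropy",
                              max_features=2, random_state=0)

# 5-fold cross-validation; the default score for classifiers is accuracy
tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

print("decision tree mean accuracy:", tree_scores.mean())
print("extra trees mean accuracy:  ", forest_scores.mean())
```

On a small, clean dataset such as iris the two scores may be close, so a noisier dataset would be a better test of the noise-handling claim.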
