How to compare extratrees classifier and decision tree?

This recipe helps you compare an extra-trees classifier with a decision tree.

Recipe Objective

A decision tree is a single tree, while an extra-trees (extremely randomized trees) classifier is an ensemble of many trees whose predictions are combined. The other major difference lies in how splits are chosen: a decision tree searches for the locally optimal feature/threshold combination at each node, while in an extra-trees classifier a random threshold is drawn for each feature under consideration, and the best of these random splits is used.
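To make the split difference concrete, here is a minimal sketch on a single iris feature. It is only an illustration, not how scikit-learn implements either model: the helper names gini and split_impurity, the choice of petal length, and the seed 0 are all assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris


def gini(labels):
    """Gini impurity of a 1-D array of class labels (hypothetical helper)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def split_impurity(feature, labels, threshold):
    """Weighted Gini impurity after splitting at feature <= threshold."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)


X, y = load_iris(return_X_y=True)
feature = X[:, 2]  # petal length, picked arbitrarily for illustration

# Decision-tree style: try every midpoint between sorted unique values, keep the best.
values = np.unique(feature)
candidates = (values[:-1] + values[1:]) / 2
best_threshold = min(candidates, key=lambda t: split_impurity(feature, y, t))

# Extra-trees style: draw one threshold uniformly between the feature's min and max.
rng = np.random.default_rng(0)  # seed is an assumption, only for repeatability
random_threshold = rng.uniform(feature.min(), feature.max())

print("exhaustive search:", best_threshold, split_impurity(feature, y, best_threshold))
print("random draw:      ", random_threshold, split_impurity(feature, y, random_threshold))
```

The random draw can never beat the exhaustive search on impurity, but it is much cheaper to compute, and averaging many such randomized trees is what gives the ensemble its stability.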

So this recipe is a short example of how to compare a decision tree and an extra-trees classifier. Let's get started.

Step 1 - Import the library

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import load_iris

Let's pause and look at these imports. NumPy and pandas are the usual ones. sklearn.ensemble contains the ExtraTreesClassifier model, and sklearn.tree contains the DecisionTreeClassifier model. sklearn.datasets is used here to load a ready-made classification dataset.

Step 2 - Setup the Data

X, y = load_iris(return_X_y=True)
print(X)
print(y)

Here, we have used the load_iris function to load our dataset. Setting return_X_y to True makes it return two NumPy arrays: X, the feature matrix, and y, the class labels.

Now our dataset is ready.
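Printing the full arrays is noisy, so a quick look at their shapes can be more useful. The following is a small sketch of such a check; it is not part of the original recipe.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

print(type(X), X.shape)  # feature matrix: one row per flower, one column per measurement
print(type(y), y.shape)  # target vector: one class label per flower
print(np.unique(y))      # the distinct class labels
```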

Step 3 - Building the model

Before we do that, let's look at the important parameters that we need to pass.

1) n_estimators
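It sets the number of trees in the forest. This applies only to ExtraTreesClassifier; DecisionTreeClassifier builds a single tree and has no such parameter.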

2) criterion
The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.

3) max_features
It decides the number of features to consider when looking for the best split.

Now that we understand the parameters, let's create the objects.

decision_tree_forest = DecisionTreeClassifier(criterion='entropy', max_features=2)
extra_tree_forest = ExtraTreesClassifier(n_estimators=5, criterion='entropy', max_features=2)

  • Here, we have built two models, one for the decision tree and the other for extra trees
  • As you can see, we have set n_estimators to 5 in ExtraTreesClassifier
  • criterion is set to entropy for both
  • max_features is set to 2 for both
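To see what these parameters do in practice, the sketch below fits both models and inspects them. The variable names single_tree and forest and the random_state=0 argument are assumptions added here for repeatability; the recipe itself does not fix a seed.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# random_state=0 is an assumption for repeatable output, not part of the recipe.
single_tree = DecisionTreeClassifier(criterion="entropy", max_features=2, random_state=0)
forest = ExtraTreesClassifier(n_estimators=5, criterion="entropy", max_features=2, random_state=0)

single_tree.fit(X, y)
forest.fit(X, y)

# The ensemble keeps one fitted tree per estimator; the decision tree is just itself.
print("trees in the ensemble:", len(forest.estimators_))
print("type of each member:  ", type(forest.estimators_[0]).__name__)
print("features tried per split:", single_tree.max_features_)
print("depths:", single_tree.get_depth(), [t.get_depth() for t in forest.estimators_])
```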

Step 4 - Fit the model and find results

decision_tree_forest.fit(X, y)
extra_tree_forest.fit(X, y)
decision_feature_importance = decision_tree_forest.feature_importances_
extra_feature_importance = extra_tree_forest.feature_importances_
print(decision_feature_importance)
print(extra_feature_importance)

Here, we have used the fit function to train both models on X and y. After that, we read the feature_importances_ attribute of each fitted model to see how much each feature contributes according to that model.
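The raw importance arrays are easier to read next to the feature names. Here is one possible way to line them up, as a sketch: it loads iris as a Bunch object to get feature_names, and the names tree and extra as well as random_state=0 are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# random_state=0 is an assumption for repeatable output, not part of the recipe.
tree = DecisionTreeClassifier(criterion="entropy", max_features=2, random_state=0).fit(X, y)
extra = ExtraTreesClassifier(n_estimators=5, criterion="entropy", max_features=2, random_state=0).fit(X, y)

# One row per feature, one column per model.
print(f"{'feature':<20}{'decision tree':>15}{'extra trees':>15}")
for name, dt_imp, et_imp in zip(iris.feature_names, tree.feature_importances_, extra.feature_importances_):
    print(f"{name:<20}{dt_imp:>15.3f}{et_imp:>15.3f}")
```

Each column sums to 1, so the values can be read as each feature's share of the total importance within that model.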

Step 5 - Let's look at our results now

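Once we run the above code snippet, two arrays are printed, each holding one importance value per feature (four values for iris): first for the decision tree, then for the extra-trees model. The exact numbers vary between runs because neither model sets random_state.

We can clearly see the difference that arises from the two models we are using. Because ExtraTreesClassifier averages several randomized trees, it tends to have lower variance and is often more robust to noisy data than a single decision tree.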
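Feature importances alone do not say which model predicts better. One way to check, which the recipe itself does not do, is to compare cross-validated accuracy. The sketch below assumes 5-fold cross-validation and random_state=0; both are choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same settings as the recipe, plus random_state=0 (an assumption) for repeatability.
models = {
    "decision tree": DecisionTreeClassifier(criterion="entropy", max_features=2, random_state=0),
    "extra trees": ExtraTreesClassifier(n_estimators=5, criterion="entropy", max_features=2, random_state=0),
}

scores = {}
for name, model in models.items():
    # cv=5 gives five accuracy values, one per held-out fold.
    scores[name] = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores[name].mean():.3f} (std {scores[name].std():.3f})")
```

On a small, clean dataset like iris the two models may score very similarly; the benefit of the ensemble is more likely to show on larger or noisier data.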
