
# How to compare ExtraTrees classifier and decision tree?

This recipe helps you compare an ExtraTrees classifier and a decision tree.

## Recipe Objective

A decision tree is a single tree fitted to the data, while an extra-trees (extremely randomized trees) classifier is an ensemble that combines the predictions of many trees. The other major difference lies in how splits are chosen: a decision tree computes the locally optimal feature/split combination, while in an extra-trees classifier a random split value is drawn for each feature under consideration, and the best of these random splits is used.
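The difference in split selection can be sketched on a single feature. This is only an illustration, not scikit-learn's internal implementation: the helper names `entropy` and `information_gain` are hypothetical, and the choice of petal length (column 2 of the iris data) is an assumption made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris


def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def information_gain(feature, labels, threshold):
    """Entropy reduction obtained by splitting on feature <= threshold."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that leaves one side empty is useless
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


X, y = load_iris(return_X_y=True)
petal_length = X[:, 2]  # one feature, chosen only for illustration

# Decision-tree style: scan every candidate threshold and keep the best one.
values = np.unique(petal_length)
candidates = (values[:-1] + values[1:]) / 2  # midpoints between sorted values
gains = [information_gain(petal_length, y, t) for t in candidates]
best_threshold = candidates[int(np.argmax(gains))]
best_gain = max(gains)

# Extra-trees style: draw one threshold uniformly between the feature's min and max.
rng = np.random.default_rng(0)
random_threshold = rng.uniform(petal_length.min(), petal_length.max())
random_gain = information_gain(petal_length, y, random_threshold)

print("exhaustive search:", best_threshold, best_gain)
print("random threshold: ", random_threshold, random_gain)
```

The random threshold can never beat the exhaustive search on the same feature, but it is much cheaper to compute, and the extra randomness decorrelates the trees in the ensemble.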

This recipe is a short example of how to compare a decision tree and an extra-trees classifier. Let's get started.

## Step 1 - Import the library

```
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import load_iris
```

Let's pause and look at these imports. NumPy and pandas are the usual ones. sklearn.ensemble contains the ExtraTreesClassifier model, and sklearn.tree contains the DecisionTreeClassifier model. sklearn.datasets is used here to load a ready-made classification dataset.

## Step 2 - Setup the Data

```
X, y = load_iris(return_X_y=True)
print(X)
print(y)
```

Here, we have used the load_iris function to load our dataset. Setting return_X_y to True returns it as two NumPy arrays: the feature matrix X and the target vector y.
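Printing the full arrays is rarely informative, so a quick look at their shapes and labels can help. The following is a small sketch; the second call to `load_iris` without `return_X_y` is an addition made here only to read the feature and class names.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# X holds one row per flower and one column per measurement.
print(type(X).__name__, X.shape)
# y holds one integer class label per flower.
print(type(y).__name__, y.shape)
print("class labels:", np.unique(y))

# Without return_X_y, load_iris returns a Bunch that also carries the names.
iris = load_iris()
print("features:", iris.feature_names)
print("classes: ", list(iris.target_names))
```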

## Step 3 - Building the model

Before we do that, let's look at the important parameters that we need to pass.

1) n_estimators
It decides the number of trees in the forest. It applies only to ExtraTreesClassifier, since a decision tree is a single tree.

2) criterion
The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.

3) max_features
It decides the number of features to consider when looking for the best split.

Now that we understand these parameters, let's create the model objects.

```
decision_tree_forest = DecisionTreeClassifier(criterion='entropy', max_features=2)
extra_tree_forest = ExtraTreesClassifier(n_estimators=5, criterion='entropy', max_features=2)
```
• Here, we have built two models, one for the decision tree and the other for extra trees
• As you can see, we have set n_estimators to 5 in ExtraTreesClassifier
• criterion is set to entropy for both
• max_features is set to 2 for both
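Because both models pick features at random when max_features is smaller than the number of features, results can change from run to run. The recipe does not fix a seed; as an optional addition, passing `random_state` makes a run repeatable. The helper `fit_importances` below is a hypothetical name used only for this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

X, y = load_iris(return_X_y=True)


def fit_importances(seed):
    """Fit an extra-trees model with a fixed seed and return its feature importances."""
    model = ExtraTreesClassifier(
        n_estimators=5, criterion='entropy', max_features=2, random_state=seed
    )
    return model.fit(X, y).feature_importances_


run_a = fit_importances(0)
run_b = fit_importances(0)

# Two fits with the same seed should give identical importances.
print(np.allclose(run_a, run_b))
```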

## Step 4 - Fit the model and find results

```
decision_tree_forest.fit(X, y)
extra_tree_forest.fit(X, y)
decision_feature_importance = decision_tree_forest.feature_importances_
extra_feature_importance = extra_tree_forest.feature_importances_
print(decision_feature_importance)
print(extra_feature_importance)
```

Here, we have simply used the fit function to fit both models on X and y. Thereafter, we read the feature_importances_ attribute of each fitted model to understand how important each feature is according to the two models we built.
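The raw arrays are easier to read when each value is labelled with its feature name. A minimal sketch, assuming the same model settings as above plus a fixed `random_state` (an addition for repeatability); the variable names are chosen for this example only.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier

iris = load_iris()
X, y = iris.data, iris.target

decision_tree = DecisionTreeClassifier(
    criterion='entropy', max_features=2, random_state=0
).fit(X, y)
extra_trees = ExtraTreesClassifier(
    n_estimators=5, criterion='entropy', max_features=2, random_state=0
).fit(X, y)

# One row per feature, one column per model.
importances = pd.DataFrame(
    {
        "decision_tree": decision_tree.feature_importances_,
        "extra_trees": extra_trees.feature_importances_,
    },
    index=iris.feature_names,
)
print(importances.round(3))
```

Each column sums to 1, so the values can be read as the share of the total impurity reduction attributed to each feature.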

## Step 5 - Let's look at our results now

Once we run the above code snippet, we will see:

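Scroll down the IPython file to have a look at the results.

The two models assign different importances to the features, which reflects how differently they choose their splits. Because ExtraTreesClassifier averages several randomized trees, it tends to have lower variance and is often less sensitive to noise than a single decision tree.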
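Feature importances show how the models differ, but not which one predicts better. One common way to compare them is cross-validated accuracy. The sketch below is an addition to the recipe: the use of `cross_val_score`, 5 folds, and a fixed `random_state` are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier

X, y = load_iris(return_X_y=True)

decision_tree = DecisionTreeClassifier(
    criterion='entropy', max_features=2, random_state=0
)
extra_trees = ExtraTreesClassifier(
    n_estimators=5, criterion='entropy', max_features=2, random_state=0
)

# Accuracy on 5 held-out folds for each model.
tree_scores = cross_val_score(decision_tree, X, y, cv=5)
extra_scores = cross_val_score(extra_trees, X, y, cv=5)

print("decision tree: mean=%.3f std=%.3f" % (tree_scores.mean(), tree_scores.std()))
print("extra trees:   mean=%.3f std=%.3f" % (extra_scores.mean(), extra_scores.std()))
```

On a dataset as small and clean as iris the two scores may be close, so the gap between them should be read alongside the standard deviation across folds.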
