How does extratrees classifier work?

How does extratrees classifier work?

How does extratrees classifier work?

How does extratrees classifier work


Recipe Objective

Extratree classsifer used multiple tree for building classification model. For feature selection, one of the most powerful tool is extra tree classifier. It works best in presence of noisy features.

So this recipe is a short example on how does extratree classifer works. Let's get started.

Step 1 - Import the library

import pandas as pd import numpy as np from sklearn.ensemble import ExtraTreesClassifier from sklearn.datasets import load_iris

Let's pause and look at these imports. Numpy and Pandas are the usual ones. sklearn.ensemble contains Extra Tree Classifer classification model. Here sklearn.dataset is used to import one classification based model dataset.

Step 2 - Setup the Data

X,y=load_iris(return_X_y=True) print(X) print(y)

Here, we have used load_iris function to import our dataset in two list form (X and y) and therefore kept return_X_y to be True.

Now our dataset is ready

Step 3 - Building the model

Before we do that, let's look at the important parameters that we need to pass.

1) n_estimators
It decides the number of trees in the forest.

2) criterion
The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.

3) max_features
It decides the number of features to consider when looking for the best split.

Now that we understand, let's create the object

extra_tree_forest = ExtraTreesClassifier(n_estimators = 5,criterion ='entropy', max_features = 2)
  • As you can see, we have set n_estimator to be 5
  • Criterion is set to be entropy
  • Max features is set here to be 2

Step 4 - Fit the model and find results, y) feature_importance = extra_tree_forest.feature_importances_ print(feature_importance)

Here we have simply fit used fit function to fit our model on X and y. There after, we are trying to understand the importance of each feature based on model we built.

Step 5 - Lets look at our dataset now

Once we run the above code snippet, we will see:

Scroll down the ipython file to have a look at the results.

Relevant Projects

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Human Activity Recognition Using Smartphones Data Set
In this deep learning project, you will build a classification system where to precisely identify human fitness activities.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Data Science Project on Wine Quality Prediction in R
In this R data science project, we will explore wine dataset to assess red wine quality. The objective of this data science project is to explore which chemical properties will influence the quality of red wines.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.