How to visualise XGBoost feature importance in Python?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to visualise XGBoost feature importance in Python?

How to visualise XGBoost feature importance in Python?

This recipe helps you visualise XGBoost feature importance in Python

Recipe Objective

So many a times it happens that we need to find the important features for training the data. We also need to choose this when there are large number of features and it takes much computational cost to train the data. We can get the important features by XGBoost.

So this is the recipe on How we can visualise XGBoost feature importance in Python.

Step 1 - Import the library

from sklearn import datasets from sklearn import metrics from sklearn.model_selection import train_test_split from xgboost import XGBClassifier, plot_importance import matplotlib.pyplot as plt

We have imported various modules from differnt libraries such as datasets, metrics,test_train_split, XGBClassifier, plot_importance and plt.

Step 2 - Setting up the Data

We are using the inbuilt breast cancer dataset to train the model and we used train_test_split to split the data into two parts train and test. dataset = datasets.load_breast_cancer() X = dataset.data; y = dataset.target X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

Step 3 - Training the Model

So we have called XGBClassifier and fitted out test data in it and after that we have made two objects one for the original value of y_test and another for predicted values by model. model = XGBClassifier() model.fit(X_train, y_train) print(model) expected_y = y_test predicted_y = model.predict(X_test)

Step 4 - Printing the results and ploting the graph

So finally we are printing the results such as confusion_matrix and classification_report. We are also using bar graph to visualize the importance of the features. print(); print('XGBClassifier: ') print(); print(metrics.classification_report(expected_y, predicted_y, target_names=dataset.target_names)) print(); print(metrics.confusion_matrix(expected_y, predicted_y)) plt.bar(range(len(model.feature_importances_)), model.feature_importances_) plt.show() plt.barh(range(len(model.feature_importances_)), model.feature_importances_) plt.show() plot_importance(model) plt.show() Output of this snippet is given below:

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bynode=1, colsample_bytree=1, gamma=0, learning_rate=0.1,
       max_delta_step=0, max_depth=3, min_child_weight=1, missing=None,
       n_estimators=100, n_jobs=1, nthread=None,
       objective='binary:logistic', random_state=0, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, seed=None, silent=None,
       subsample=1, verbosity=1)

XGBClassifier: 

              precision    recall  f1-score   support

   malignant       0.98      0.96      0.97        53
      benign       0.98      0.99      0.98        90

   micro avg       0.98      0.98      0.98       143
   macro avg       0.98      0.98      0.98       143
weighted avg       0.98      0.98      0.98       143


[[51  2]
 [ 1 89]]

Download Materials

Relevant Projects

Time Series LSTM forecasting
In this project, we will use time-series forecasting to predict the values of a sensor using multiple dependent variables. A variety of machine learning models are applied in this task of time series forecasting. We will see a comparison between the LSTM, ARIMA and Regression models. Classical forecasting methods like ARIMA are still popular and powerful but they lack the overall generalizability that memory-based models like LSTM offer. Every model has its own advantages and disadvantages and that will be discussed. The main objective of this article is to lead you through building a working LSTM model and it's different variants such as Vanilla, Stacked, Bidirectional, etc. There will be special focus on customized data preparation for LSTM.

Data Science Project on Wine Quality Prediction in R
In this R data science project, we will explore wine dataset to assess red wine quality. The objective of this data science project is to explore which chemical properties will influence the quality of red wines.

Convolutional RCCn's for extracting the text out of images
CRNNs combine both convolutional and recurrent architectures and is widely used in text detection and optical character recognition (OCR). In this project, we are going to use a CRNN architecture to detect text in sample images. The data we are going to use is TRSynth100k from Kaggle. Given an image containing some text, the goal here is to correctly identify the text using the CRNN architecture. We are going to train the model end-to-end from scratch.

Machine learning for Retail Price Recommendation with Python
Use the Mercari Dataset with dynamic pricing to build a price recommendation algorithm using machine learning in Python to automatically suggest the right product prices.

Human Activity Recognition Using Multiclass Classification in Python
In this human activity recognition project, we use multiclass classification machine learning techniques to analyse fitness dataset from a smartphone tracker.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Classification of T shirt images to see if they have text on them
Want to search images of clothes which have text on them? Then this project talks through how we can classify an image whether it has text on it or not. For this we use state of the model called as inception and try and deepdive into how it works on our dataset

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.