How to visualise XGBoost feature importance in Python?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to visualise XGBoost feature importance in Python?

How to visualise XGBoost feature importance in Python?

This recipe helps you visualise XGBoost feature importance in Python

0

Recipe Objective

So many a times it happens that we need to find the important features for training the data. We also need to choose this when there are large number of features and it takes much computational cost to train the data. We can get the important features by XGBoost.

So this is the recipe on How we can visualise XGBoost feature importance in Python.

Step 1 - Import the library

from sklearn import datasets from sklearn import metrics from sklearn.model_selection import train_test_split from xgboost import XGBClassifier, plot_importance import matplotlib.pyplot as plt

We have imported various modules from differnt libraries such as datasets, metrics,test_train_split, XGBClassifier, plot_importance and plt.

Step 2 - Setting up the Data

We are using the inbuilt breast cancer dataset to train the model and we used train_test_split to split the data into two parts train and test. dataset = datasets.load_breast_cancer() X = dataset.data; y = dataset.target X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

Step 3 - Training the Model

So we have called XGBClassifier and fitted out test data in it and after that we have made two objects one for the original value of y_test and another for predicted values by model. model = XGBClassifier() model.fit(X_train, y_train) print(model) expected_y = y_test predicted_y = model.predict(X_test)

Step 4 - Printing the results and ploting the graph

So finally we are printing the results such as confusion_matrix and classification_report. We are also using bar graph to visualize the importance of the features. print(); print('XGBClassifier: ') print(); print(metrics.classification_report(expected_y, predicted_y, target_names=dataset.target_names)) print(); print(metrics.confusion_matrix(expected_y, predicted_y)) plt.bar(range(len(model.feature_importances_)), model.feature_importances_) plt.show() plt.barh(range(len(model.feature_importances_)), model.feature_importances_) plt.show() plot_importance(model) plt.show() Output of this snippet is given below:

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bynode=1, colsample_bytree=1, gamma=0, learning_rate=0.1,
       max_delta_step=0, max_depth=3, min_child_weight=1, missing=None,
       n_estimators=100, n_jobs=1, nthread=None,
       objective='binary:logistic', random_state=0, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, seed=None, silent=None,
       subsample=1, verbosity=1)

XGBClassifier: 

              precision    recall  f1-score   support

   malignant       0.98      0.96      0.97        53
      benign       0.98      0.99      0.98        90

   micro avg       0.98      0.98      0.98       143
   macro avg       0.98      0.98      0.98       143
weighted avg       0.98      0.98      0.98       143


[[51  2]
 [ 1 89]]

Relevant Projects

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Music Recommendation System Project using Python and R
Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.

Demand prediction of driver availability using multistep time series analysis
In this supervised learning machine learning project, you will predict the availability of a driver in a specific area by using multi step time series analysis.

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.