It often happens that we need to identify the most important features before training a model. Feature selection also matters when there are a large number of features, since training on all of them is computationally expensive. XGBoost can report feature importances directly.
So this is the recipe on how we can visualise XGBoost feature importance in Python.
from sklearn import datasets
from sklearn import metrics
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier, plot_importance
import matplotlib.pyplot as plt
We have imported the required modules from different libraries: datasets, metrics, train_test_split, XGBClassifier, plot_importance and pyplot.
We are using the built-in breast cancer dataset to train the model, and we use train_test_split to split the data into two parts: train and test.
dataset = datasets.load_breast_cancer()
X = dataset.data; y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
So we have created an XGBClassifier, fitted it on the training data, and after that we have made two objects, one for the original values of y_test and another for the values predicted by the model.
model = XGBClassifier()
model.fit(X_train, y_train)
expected_y = y_test
predicted_y = model.predict(X_test)
Finally we print the results, the confusion_matrix and the classification_report, and use a bar graph to visualize the importance of the features.
print(); print('XGBClassifier: ')
print(); print(metrics.classification_report(expected_y, predicted_y, target_names=dataset.target_names))
print(); print(metrics.confusion_matrix(expected_y, predicted_y))
Output of this snippet is given below:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

XGBClassifier:

              precision    recall  f1-score   support

   malignant       0.98      0.96      0.97        53
      benign       0.98      0.99      0.98        90

   micro avg       0.98      0.98      0.98       143
   macro avg       0.98      0.98      0.98       143
weighted avg       0.98      0.98      0.98       143

[[51  2]
 [ 1 89]]