How to do cost complexity pruning in decision tree classifier in ML

This recipe helps you do cost complexity pruning in decision tree classifier in ML
Last Updated: 14 Jun 2022

Get access to Data Science projects View all Data Science projects

MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET ALL TAGS

Recipe Objective

How to do cost complexity pruning in decision tree regressor

Pruning is the technique used to reduce the problem of overfitting. In pruning, we cut down the selected parts of the tree such as branches, buds, roots to improve the tree structure and promote healthy growth.

Recipe Objective

Step 1- Importing Libraries.

import pandas as pd import numpy as np from sklearn.tree import DecisionTreeClassifier from sklearn.model_selection import train_test_split import matplotlib.pyplot as plt import seaborn as sns from sklearn.metrics import accuracy_score

Step 2- Importing dataset

iris = sns.load_dataset('iris') iris.head()

Step 3- Preparing the dataset.

X=iris.drop(columns='species') y=iris['species'] Xtrain, Xtest, ytrain, ytest= train_test_split(X,y, test_size=0.3, random_state=20)

Step 4- Fitting model to Decision Tree Classifier.

Running the model through Decision Tree Classifier and checking accuracy.

tree= DecisionTreeClassifier() tree.fit(Xtrain, ytrain) ytrain_pred=tree.predict(Xtrain) ytest_pred=tree.predict(Xtest) print(accuracy_score(ytrain,ytrain_pred),accuracy_score(ytest,ytest_pred))

Step 5- Applying Pruning.

prun = tree.cost_complexity_pruning_path(Xtrain,ytrain) alphas=prun['ccp_alphas'] alphas

Step 6-Pruning the complete dataset.

Applying pruning to the complete dataset and visualizing the whole process.

train_accuracy, test_accuracy=[],[] for j in alphas: tree= DecisionTreeClassifier(ccp_alpha=j) tree.fit(Xtrain,ytrain) ytrain_pred=tree.predict(Xtrain) ytest_pred=tree.predict(Xtest) train_accuracy.append(accuracy_score(ytrain, ytrain_pred)) test_accuracy.append(accuracy_score(ytest, ytest_pred)) sns.set() plt.figure(figsize=(10,6)) sns.lineplot(y=train_accuracy, x=alphas, label='Training Accuracy') sns.lineplot(y=test_accuracy, x=alphas, label='Testing Accuracy') plt.show()

What Users are saying..

Ray han

Tech Leader | Stanford / Yale University

I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell,Oracle, and Arthur Andersen(Accenture) in the US. I have taken Big Data and Hadoop,NoSQL, Spark, Hadoop... Read More