How to do cost complexity pruning in decision tree classifier in ML

This recipe shows how to apply cost complexity pruning to a decision tree classifier with scikit-learn.

Recipe Objective

How to do cost complexity pruning in a decision tree classifier

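Pruning is a technique for reducing overfitting in decision trees. The name comes from gardening, where branches are cut back to improve a plant's structure. In a decision tree, pruning removes branches (subtrees) that add complexity without improving predictions enough to justify it. Cost complexity pruning controls this with a single parameter, ccp_alpha: the larger its value, the more of the tree is pruned away.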
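To make the trade-off concrete, here is a minimal sketch of the cost complexity measure, R_alpha(T) = R(T) + alpha * (number of leaves in T), which cost complexity pruning tries to keep small. The candidate subtrees and their error values are made-up numbers for illustration, and `cost_complexity` and `best_subtree` are hypothetical helper names, not part of scikit-learn.

```python
# Cost complexity measure: R_alpha(T) = R(T) + alpha * (number of leaves in T).
# The subtrees below are hypothetical: (name, training error R(T), leaf count).
candidates = [
    ("full tree", 0.00, 9),
    ("pruned once", 0.02, 5),
    ("pruned twice", 0.05, 3),
    ("root only", 0.40, 1),
]


def cost_complexity(error, n_leaves, alpha):
    """Penalized cost of a subtree for a given alpha."""
    return error + alpha * n_leaves


def best_subtree(alpha):
    """Name of the candidate with the lowest penalized cost."""
    return min(candidates, key=lambda c: cost_complexity(c[1], c[2], alpha))[0]


# A larger alpha makes each extra leaf more expensive, so smaller trees win
for alpha in (0.0, 0.01, 0.02, 0.2):
    print(alpha, best_subtree(alpha))
```

With alpha at 0 the largest tree is preferred because leaves cost nothing; as alpha grows, the penalty on leaf count pushes the choice toward smaller subtrees.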

Step 1- Importing Libraries.

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score

Step 2- Importing dataset

iris = sns.load_dataset('iris')
iris.head()

Step 3- Preparing the dataset.

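# Features are every column except the target, 'species'
X = iris.drop(columns='species')
y = iris['species']
# Hold out 30% of the rows for testing
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=20)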

Step 4- Fitting a Decision Tree Classifier.

Fit an unpruned decision tree classifier and check its accuracy on the training and test sets.

# Fit the unpruned tree, then compare training and test accuracy
tree = DecisionTreeClassifier().fit(Xtrain, ytrain)
ytrain_pred = tree.predict(Xtrain)
ytest_pred = tree.predict(Xtest)
print(accuracy_score(ytrain, ytrain_pred), accuracy_score(ytest, ytest_pred))
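An unpruned tree typically scores close to perfectly on the training data, and its size gives a sense of how much there is to prune. The sketch below assumes scikit-learn's bundled copy of iris (`load_iris`) in place of the seaborn download, and a fixed `random_state=0`, so that it runs on its own; neither choice comes from the recipe.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: sklearn's bundled iris data stands in for sns.load_dataset('iris')
X, y = load_iris(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=20)

full_tree = DecisionTreeClassifier(random_state=0).fit(Xtrain, ytrain)

# Depth and leaf count describe how complex the unpruned tree is
print("depth:", full_tree.get_depth())
print("leaves:", full_tree.get_n_leaves())
print("train accuracy:", full_tree.score(Xtrain, ytrain))
print("test accuracy:", full_tree.score(Xtest, ytest))
```

A gap between training and test accuracy, together with a large leaf count, is the usual sign that pruning may help.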

Step 5- Applying Pruning.

# Compute the candidate ccp_alpha values along the pruning path
prun = tree.cost_complexity_pruning_path(Xtrain, ytrain)
alphas = prun['ccp_alphas']
alphas
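The object returned by `cost_complexity_pruning_path` also carries `impurities`, the total leaf impurity of the pruned tree at each alpha. A minimal sketch, again assuming scikit-learn's bundled iris data and a fixed `random_state=0` (both chosen for this example only), prints the two side by side:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: sklearn's bundled iris data stands in for sns.load_dataset('iris')
X, y = load_iris(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=20)

# The method fits a full tree internally and returns the pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(Xtrain, ytrain)

# Each alpha is the threshold at which one more subtree gets pruned away
for alpha, impurity in zip(path.ccp_alphas, path.impurities):
    print(f"alpha={alpha:.4f}  total leaf impurity={impurity:.4f}")
```

As alpha increases, more of the tree is pruned, so the total impurity of the remaining leaves goes up.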

Step 6- Pruning the tree for each alpha.

Fit one tree per candidate alpha and plot training and testing accuracy against alpha.

train_accuracy, test_accuracy = [], []
for j in alphas:
    # Refit the tree with this pruning strength
    tree = DecisionTreeClassifier(ccp_alpha=j).fit(Xtrain, ytrain)
    ytrain_pred = tree.predict(Xtrain)
    ytest_pred = tree.predict(Xtest)
    train_accuracy.append(accuracy_score(ytrain, ytrain_pred))
    test_accuracy.append(accuracy_score(ytest, ytest_pred))

# Plot both accuracy curves against alpha
sns.set()
plt.figure(figsize=(10, 6))
sns.lineplot(y=train_accuracy, x=alphas, label='Training Accuracy')
sns.lineplot(y=test_accuracy, x=alphas, label='Testing Accuracy')
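Reading the best alpha off the testing curve uses the test set for model selection. One common alternative, sketched here as an assumption rather than as part of the recipe, is to score each alpha by cross-validation on the training set and refit once with the winner. The example uses scikit-learn's bundled iris data, 5 folds, and `random_state=0`, all chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: sklearn's bundled iris data stands in for sns.load_dataset('iris')
X, y = load_iris(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=20)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(Xtrain, ytrain)
# Guard against tiny negative values from floating-point error; ccp_alpha must be >= 0
alphas = np.clip(path.ccp_alphas, 0.0, None)

# Mean 5-fold cross-validated accuracy on the training set for each alpha
cv_scores = [
    cross_val_score(
        DecisionTreeClassifier(ccp_alpha=a, random_state=0), Xtrain, ytrain, cv=5
    ).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(cv_scores))]

# Refit on the full training set with the chosen alpha, then score once on the test set
final_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(Xtrain, ytrain)
print("best alpha:", best_alpha)
print("leaves:", final_tree.get_n_leaves())
print("test accuracy:", final_tree.score(Xtest, ytest))
```

Keeping the test set out of the selection loop means the final test accuracy is a fairer estimate of how the pruned tree performs on unseen data.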
