How to create and optimize a baseline Decision Tree model for Binary Classification?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to create and optimize a baseline Decision Tree model for Binary Classification?

How to create and optimize a baseline Decision Tree model for Binary Classification?

This recipe helps you create and optimize a baseline Decision Tree model for Binary Classification

0
This data science python source code does the following: 1. Imports all the necessary library 2. Creates your own dataset 3. Creates pipeline for the workflow 4. Applies "Standard Scaler" and "PCA" decomposition 5. Applies decision tree classifier model and optimizes it using GridSearchCV
In [2]:
## How to create and optimize a baseline Decision Tree model for Binary Classification
def Snippet_152():
    print()
    print(format('## How to create and optimize a baseline Decision Tree model for Binary Classification','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    from sklearn import decomposition, datasets
    from sklearn import tree
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.preprocessing import StandardScaler

    # Load the iris flower data
    dataset = datasets.make_classification(n_samples=100, n_features=20, n_informative=5,
                n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2,
                weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0,
                scale=1.0, shuffle=True, random_state=None)
    X = dataset[0]
    y = dataset[1]
    print(y)

    # Create an scaler object
    sc = StandardScaler()

    # Create a pca object
    pca = decomposition.PCA()

    # Create a logistic regression object with an L2 penalty
    dtreeClf = tree.DecisionTreeClassifier()

    # Create a pipeline of three steps. First, standardize the data.
    # Second, tranform the data with PCA.
    # Third, train a Decision Tree Classifier on the data.
    pipe = Pipeline(steps=[('sc', sc),
                           ('pca', pca),
                           ('dtreeClf', dtreeClf)])

    # Create Parameter Space
    # Create a list of a sequence of integers from 1 to 30 (the number of features in X + 1)
    n_components = list(range(1,X.shape[1]+1,1))

    # Create lists of parameter for DecisionTreeRegressor
    criterion = ['gini', 'entropy']
    max_depth = [4,6,8,10]

    # Create a dictionary of all the parameter options 
    # Note has you can access the parameters of steps of a pipeline by using '__’
    parameters = dict(pca__n_components=n_components,
                      dtreeClf__criterion=criterion,
                      dtreeClf__max_depth=max_depth)

    # Conduct Parameter Optmization With Pipeline
    # Create a grid search object
    clf = GridSearchCV(pipe, parameters)

    # Fit the grid search
    clf.fit(X, y)

    # View The Best Parameters
    print('Best Number Of Components:', clf.best_estimator_.get_params()['pca__n_components'])
    print(); print(clf.best_estimator_.get_params()['dtreeClf'])

    # Use Cross Validation To Evaluate Model
    CV_Result = cross_val_score(clf, X, y, cv=3, n_jobs=-1, scoring='accuracy')
    print(); print(CV_Result)
    print(); print(CV_Result.mean())
    print(); print(CV_Result.std())

Snippet_152()
## How to create and optimize a baseline Decision Tree model for Binary Classification
[0 1 1 1 0 1 0 1 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 0
 1 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 1 0 1 0 1 1 0 1 1
 0 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 1 1 0 1 0 0 0 1 1 0]
Best Number Of Components: 5

DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=8,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')

[0.61764706 0.64705882 0.65625   ]

0.6403186274509803

0.016464496130569113

Relevant Projects

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Sequence Classification with LSTM RNN in Python with Keras
In this project, we are going to work on Sequence to Sequence Prediction using IMDB Movie Review Dataset​ using Keras in Python.

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Human Activity Recognition Using Smartphones Data Set
In this deep learning project, you will build a classification system where to precisely identify human fitness activities.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.