How to create and optimize a baseline Decision Tree model for MultiClass Classification?
MACHINE LEARNING RECIPES

How to create and optimize a baseline Decision Tree model for MultiClass Classification?

How to create and optimize a baseline Decision Tree model for MultiClass Classification?

This recipe helps you create and optimize a baseline Decision Tree model for MultiClass Classification

0
In [2]:
## How to create and optimize a baseline Decision Tree model for MultiClass Classification
def Snippet_153():
    print()
    print(format('## How to create and optimize a baseline Decision Tree model for MultiClass Classification','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    from sklearn import decomposition, datasets
    from sklearn import tree
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.preprocessing import StandardScaler

    # Load the iris flower data
    dataset = datasets.make_classification(n_samples=1000, n_features=20, n_informative=5,
                n_redundant=2, n_repeated=0, n_classes=10, n_clusters_per_class=2,
                weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0,
                scale=1.0, shuffle=True, random_state=None)
    X = dataset[0]
    y = dataset[1]

    # Create an scaler object
    sc = StandardScaler()

    # Create a pca object
    pca = decomposition.PCA()

    # Create a logistic regression object with an L2 penalty
    dtreeClf = tree.DecisionTreeClassifier()

    # Create a pipeline of three steps. First, standardize the data.
    # Second, tranform the data with PCA.
    # Third, train a Decision Tree Classifier on the data.
    pipe = Pipeline(steps=[('sc', sc),
                           ('pca', pca),
                           ('dtreeClf', dtreeClf)])

    # Create Parameter Space
    # Create a list of a sequence of integers from 1 to 30 (the number of features in X + 1)
    n_components = list(range(1,X.shape[1]+1,1))

    # Create lists of parameter for DecisionTreeRegressor
    criterion = ['gini', 'entropy']
    max_depth = [4,6,8,10]

    # Create a dictionary of all the parameter options 
    # Note has you can access the parameters of steps of a pipeline by using '__’
    parameters = dict(pca__n_components=n_components,
                      dtreeClf__criterion=criterion,
                      dtreeClf__max_depth=max_depth)

    # Conduct Parameter Optmization With Pipeline
    # Create a grid search object
    clf = GridSearchCV(pipe, parameters)

    # Fit the grid search
    clf.fit(X, y)

    # View The Best Parameters
    print('Best Number Of Components:', clf.best_estimator_.get_params()['pca__n_components'])
    print(); print(clf.best_estimator_.get_params()['dtreeClf'])

    # Use Cross Validation To Evaluate Model
    CV_Result = cross_val_score(clf, X, y, cv=3, n_jobs=-1, scoring='accuracy')
    print(); print(CV_Result)
    print(); print(CV_Result.mean())
    print(); print(CV_Result.std())

Snippet_153()
## How to create and optimize a baseline Decision Tree model for MultiClass Classification
Best Number Of Components: 5

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=6,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')

[0.19230769 0.18918919 0.18541033]

0.18896907194779536

0.002820133024186291

Relevant Projects

Data Science Project - Instacart Market Basket Analysis
Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Data Science Project-All State Insurance Claims Severity Prediction
Data science project in R to develop automated methods for predicting the cost and severity of insurance claims.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Human Activity Recognition Using Smartphones Data Set
In this deep learning project, you will build a classification system where to precisely identify human fitness activities.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.