Recipe: How to create and optimize a baseline Decision Tree model for MultiClass Classification?


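This recipe helps you create and optimize a baseline Decision Tree model for MultiClass Classification. It chains standardization, PCA, and a decision tree in a scikit-learn Pipeline, tunes the number of PCA components and the tree's criterion and max_depth with GridSearchCV, and evaluates the tuned model with cross-validation on a synthetic 10-class dataset.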
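A useful first check for any "baseline" model is how it compares with chance. The following is a minimal sketch, not part of the original recipe: it scores an untuned DecisionTreeClassifier and scikit-learn's DummyClassifier (strategy "most_frequent", which always predicts the majority class of the training fold) with 3-fold cross-validation on data generated with the same make_classification settings the recipe uses. The fixed random_state=0 is an assumption added for reproducibility. Because make_classification with weights=None produces roughly equal class sizes, the dummy's accuracy on 10 classes should land near 0.10.

```python
# Minimal sketch: compare an untuned decision tree with a chance-level baseline
# on the same kind of synthetic 10-class data used in the recipe.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Same generator settings as the recipe; random_state=0 is an assumption
# so that the scores are reproducible from run to run.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_redundant=2, n_repeated=0, n_classes=10,
                           n_clusters_per_class=2, flip_y=0.01, random_state=0)

# Always predicts the most frequent training-fold class: roughly 0.10 accuracy here.
dummy_scores = cross_val_score(DummyClassifier(strategy="most_frequent"),
                               X, y, cv=3, scoring="accuracy")

# A default (untuned) tree, without scaling or PCA.
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0),
                              X, y, cv=3, scoring="accuracy")

print("Chance-level accuracy:", dummy_scores.mean())
print("Untuned tree accuracy:", tree_scores.mean())
```

Any tuned model, including the pipeline in this recipe, is only worth keeping if its cross-validated accuracy clearly beats the dummy score.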
## How to create and optimize a baseline Decision Tree model for MultiClass Classification
def Snippet_153():
    print()
    print(format('## How to create and optimize a baseline Decision Tree model for MultiClass Classification','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    from sklearn import decomposition, datasets
    from sklearn import tree
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.preprocessing import StandardScaler

    # Generate a synthetic 10-class dataset (1,000 samples, 20 features)
    dataset = datasets.make_classification(n_samples=1000, n_features=20, n_informative=5,
                n_redundant=2, n_repeated=0, n_classes=10, n_clusters_per_class=2,
                weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0,
                scale=1.0, shuffle=True, random_state=None)
    X = dataset[0]
    y = dataset[1]

    # Create a scaler object
    sc = StandardScaler()

    # Create a pca object
    pca = decomposition.PCA()

    # Create a decision tree classifier object
    dtreeClf = tree.DecisionTreeClassifier()

    # Create a pipeline of three steps. First, standardize the data.
    # Second, transform the data with PCA.
    # Third, train a Decision Tree Classifier on the data.
    pipe = Pipeline(steps=[('sc', sc),
                           ('pca', pca),
                           ('dtreeClf', dtreeClf)])

    # Create Parameter Space
    # Candidate numbers of PCA components: the integers from 1 up to
    # the number of features in X (20 here)
    n_components = list(range(1, X.shape[1] + 1))

    # Create lists of candidate parameters for the DecisionTreeClassifier
    criterion = ['gini', 'entropy']
    max_depth = [4,6,8,10]

    # Create a dictionary of all the parameter options.
    # Note that a pipeline step's parameters are addressed as '<step name>__<parameter>'
    parameters = dict(pca__n_components=n_components,
                      dtreeClf__criterion=criterion,
                      dtreeClf__max_depth=max_depth)

    # Conduct Parameter Optimization With Pipeline
    # Create a grid search object
    clf = GridSearchCV(pipe, parameters)

    # Fit the grid search
    clf.fit(X, y)

    # View The Best Parameters
    print('Best Number Of Components:', clf.best_estimator_.get_params()['pca__n_components'])
    print(); print(clf.best_estimator_.get_params()['dtreeClf'])

    # Use Cross Validation To Evaluate Model
    # (nested CV: each of the 3 outer folds reruns the full grid search on its training split)
    CV_Result = cross_val_score(clf, X, y, cv=3, n_jobs=-1, scoring='accuracy')
    print(); print(CV_Result)
    print(); print(CV_Result.mean())
    print(); print(CV_Result.std())

Snippet_153()
## How to create and optimize a baseline Decision Tree model for MultiClass Classification
Best Number Of Components: 5

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=6,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')

[0.19230769 0.18918919 0.18541033]

0.18896907194779536

0.002820133024186291
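Passing the GridSearchCV object to cross_val_score, as the recipe does, is nested cross-validation: each of the 3 outer folds reruns the entire grid search on its training part and scores the refit best pipeline on its held-out part. The reported mean (about 0.189 above) therefore estimates the performance of the whole tuning procedure rather than of one fixed parameter set. The sketch below writes that outer loop out by hand so the per-fold choices are visible. It assumes a fixed random_state=0 and a reduced parameter grid (both assumptions, chosen to keep it reproducible and fast); it is an illustration of the idea, not the recipe's exact computation.

```python
# Minimal sketch of nested cross-validation written out explicitly.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Same generator settings as the recipe; random_state=0 is an assumption.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_redundant=2, n_repeated=0, n_classes=10,
                           n_clusters_per_class=2, flip_y=0.01, random_state=0)

pipe = Pipeline(steps=[("sc", StandardScaler()),
                       ("pca", PCA()),
                       ("dtreeClf", DecisionTreeClassifier(random_state=0))])

# Reduced grid (an assumption) so the sketch runs quickly.
param_grid = {"pca__n_components": [5, 10, 20],
              "dtreeClf__criterion": ["gini", "entropy"],
              "dtreeClf__max_depth": [4, 6, 8]}

# For a classifier, cross_val_score(..., cv=3) splits with a 3-fold StratifiedKFold.
outer_cv = StratifiedKFold(n_splits=3)
outer_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    # Inner CV: GridSearchCV uses 5 folds by default in recent scikit-learn.
    search = GridSearchCV(pipe, param_grid)
    search.fit(X[train_idx], y[train_idx])
    # With refit=True (the default) the best pipeline is refit on the whole
    # outer training split, so score() measures it on data it never saw.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))
    print("Best parameters for this fold:", search.best_params_)

outer_scores = np.array(outer_scores)
print(outer_scores, outer_scores.mean(), outer_scores.std())
```

Different outer folds may select different parameters; that variation is part of what the nested estimate accounts for.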

