How to apply gradient boosting for classification in R

This recipe helps you apply gradient boosting for classification in R
Last Updated: 23 Dec 2022


Recipe Objective

Classification and regression are supervised learning problems that can be solved using algorithms like linear regression, logistic regression, decision trees, etc. However, a single model of this kind often does not give competitive prediction accuracy. Ensemble techniques, on the other hand, create multiple models and combine them into one to produce better results. Bagging, boosting, and random forests are different types of ensemble techniques.

Boosting is a sequential ensemble technique in which the model is improved using the information from previously grown weaker models. This process is continued for multiple iterations until a final model is built which predicts a more accurate outcome. There are 3 types of boosting techniques: 1. AdaBoost 2. Gradient Boosting 3. XGBoost

Gradient boosting is a sequential technique where each new model is built by learning from the errors of the previous model, i.e. each predictor is trained using the residual errors of its predecessor as labels. The following recipe explains how to apply gradient boosting for classification in R.
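To make the "train each new model on the residuals of the previous one" idea concrete, here is a minimal sketch of gradient boosting for a regression problem with squared-error loss. It is only an illustration under stated assumptions, not what the gbm package does internally: the toy data, the names `shrinkage_demo`, `n_trees_demo`, `pred` and `mse_history`, and the use of small `rpart` trees as weak learners are all choices made for this example.

```r
library(rpart)  # recommended package that ships with standard R installations

set.seed(42)
# Toy regression data (assumption: not part of the recipe's iris example)
toy <- data.frame(x = seq(0, 10, length.out = 200))
toy$y <- sin(toy$x) + rnorm(200, sd = 0.2)

shrinkage_demo <- 0.1   # learning rate applied to every new tree
n_trees_demo   <- 100   # number of boosting iterations

pred <- rep(mean(toy$y), nrow(toy))        # start from a constant model
initial_mse <- mean((toy$y - pred)^2)
mse_history <- numeric(n_trees_demo)

for (m in seq_len(n_trees_demo)) {
  toy$resid <- toy$y - pred                # residual errors of the current ensemble
  weak <- rpart(resid ~ x, data = toy,
                control = rpart.control(maxdepth = 2))  # small tree fitted to the residuals
  pred <- pred + shrinkage_demo * predict(weak, toy)    # add a damped correction
  mse_history[m] <- mean((toy$y - pred)^2)
}

final_mse <- mse_history[n_trees_demo]
cat("MSE of constant model:", initial_mse, "\n")
cat("MSE after boosting:   ", final_mse, "\n")
```

The training error falls as trees are added because every tree corrects part of what the ensemble still gets wrong. For classification, the same loop is applied to the gradient of a classification loss rather than to raw residuals.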


Step 1 - Install the necessary libraries

install.packages('gbm')   # for fitting the gradient boosting model
install.packages('caret') # for general data preparation and model fitting
library(gbm)
library(caret)
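Calling install.packages() on every run re-downloads the packages. As an optional sketch, the snippet below only reports which of the two packages are missing, so that installation can be done once. The variable names `needed` and `missing_pkgs` are made up for this example.

```r
# Packages used by this recipe
needed <- c("gbm", "caret")

# requireNamespace() returns TRUE when a package is installed and can be loaded
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install these packages first: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```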

Step 2 - Load the dataset and explore the data

data <- iris   # loads the built-in iris dataset
head(data)     # head() returns the top 6 rows of the dataframe
summary(data)  # returns the statistical summary of the data columns
dim(data)      # returns the number of rows and columns
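Before fitting a classifier it can also help to check the class balance and look for missing values. The following is a small sketch using only base R; the names `class_counts` and `na_counts` are illustrative.

```r
data <- iris

str(data)                                # column types: 4 numeric features, 1 factor target

class_counts <- table(data$Species)      # number of observations per class
print(class_counts)

na_counts <- colSums(is.na(data))        # number of missing values per column
print(na_counts)
```

For iris the three classes are equally represented and no values are missing, so no resampling or imputation is needed here.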

Step 3 - Train and Test data

# Use createDataPartition() from the caret package to split the original dataset
# into a training set (70%) and a testing set (30%)
parts = createDataPartition(data$Species, p = 0.7, list = FALSE)
train = data[parts, ]
test = data[-parts, ]
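createDataPartition() samples within each class so that the class proportions are preserved in both sets. The sketch below approximates that idea in base R to show what a stratified 70/30 split looks like. It is not caret's exact algorithm, and the names `train_idx`, `train_demo` and `test_demo` are hypothetical.

```r
set.seed(123)   # fixed seed so the split is reproducible
data <- iris

# Group the row indices by class, then sample 70% of the rows from each class
idx_by_class <- split(seq_len(nrow(data)), data$Species)
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  sample(idx, size = round(0.7 * length(idx)))
}), use.names = FALSE)

train_demo <- data[train_idx, ]
test_demo  <- data[-train_idx, ]

print(table(train_demo$Species))   # class counts in the training set
print(table(test_demo$Species))    # class counts in the testing set
```

With 50 observations per class, this gives 35 training and 15 testing rows for every species, which matches the 45 test observations reported at the end of the recipe.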

Step 4 - Create a gradient boosting model

# train a model using our training data
model_gbm = gbm(Species ~ .,
                data = train,
                distribution = "multinomial",
                cv.folds = 10,
                shrinkage = .01,
                n.minobsinnode = 10,
                n.trees = 500)   # 500 trees to be built
summary(model_gbm)

Step 5 - Make predictions on the test dataset

# use model to make predictions on test data
pred_test = predict.gbm(object = model_gbm,
                        newdata = test,
                        n.trees = 500,   # use all 500 trees for prediction
                        type = "response")
pred_test

Step 6 - Give class names

# Give class names to the highest prediction value.
class_names = colnames(pred_test)[apply(pred_test, 1, which.max)]
result = data.frame(test$Species, class_names)
print(result)
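The which.max step is easier to follow on a tiny example. The sketch below uses a hand-made probability matrix (the values are invented for illustration and are not model output) and picks, for each row, the column name with the highest probability.

```r
# Hypothetical class probabilities for three observations (each row sums to 1)
probs <- matrix(c(0.90, 0.07, 0.03,
                  0.10, 0.30, 0.60,
                  0.20, 0.70, 0.10),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("setosa", "versicolor", "virginica")))

best_col <- apply(probs, 1, which.max)       # column index of the largest value in each row
class_names_demo <- colnames(probs)[best_col]
print(class_names_demo)
# [1] "setosa"     "virginica"  "versicolor"
```

Base R's max.col(probs, ties.method = "first") returns the same column indices in a single call.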

Step 7 - Create a confusion matrix

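# confusionMatrix() takes the predictions first and the true labels second
conf_mat = confusionMatrix(factor(class_names, levels = levels(test$Species)), test$Species)
print(conf_mat)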

The prediction:
setosa: predicted all 15 correctly
versicolor: predicted 13 correctly, falsely predicted 2 as virginica
virginica: predicted 14 correctly, falsely predicted 1 as versicolor
The model gives a good accuracy of 93.3% (42 of 45 test observations).
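The reported accuracy can be checked by hand from the counts above. The sketch below rebuilds the confusion matrix from those counts in base R, with rows as actual classes and columns as predicted classes, and computes the overall accuracy and the share of each class that was predicted correctly. The names `cm`, `accuracy` and `per_class` are illustrative.

```r
species <- c("setosa", "versicolor", "virginica")

# Counts taken from the results stated above (rows = actual, columns = predicted)
cm <- matrix(c(15,  0,  0,
                0, 13,  2,
                0,  1, 14),
             nrow = 3, byrow = TRUE,
             dimnames = list(actual = species, predicted = species))

accuracy  <- sum(diag(cm)) / sum(cm)   # correct predictions / all predictions
per_class <- diag(cm) / rowSums(cm)    # share of each actual class predicted correctly

print(cm)
print(round(accuracy * 100, 1))
# [1] 93.3
print(round(per_class, 3))
```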

