How to apply gradient boosting for classification in R

This recipe helps you apply gradient boosting for classification in R

Recipe Objective

How to apply gradient boosting for classification in R

Classification and regression are supervised learning problems that can be solved with algorithms such as linear regression, logistic regression, and decision trees. A single such model, however, often falls short of the prediction accuracy that ensembles achieve. Ensemble techniques build multiple models and combine them into one to produce better results. Bagging, boosting, and random forests are different types of ensemble techniques.

Boosting is a sequential ensemble technique in which the model is improved using information from previously grown weaker models. This process is repeated for multiple iterations until a final model is built that predicts the outcome more accurately. Three common boosting techniques are: 1. AdaBoost 2. Gradient Boosting 3. XGBoost

Gradient boosting is a sequential technique in which each new model learns from the errors of the previous one, i.e. each predictor is trained using the residual errors of its predecessor as labels. The following recipe explains how to apply gradient boosting for classification in R.
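The idea of training each predictor on the residual errors of its predecessor can be sketched in base R. This is a minimal illustration under stated assumptions, not the algorithm as implemented in the gbm package: it uses a squared-error regression target, one-split "stumps" as weak learners, and a made-up toy dataset. The names fit_stump, predict_stump, shrinkage and n_trees are hypothetical helpers chosen for this sketch.

```r
# Toy data (hypothetical, for illustration only)
x <- seq(0, 10, length.out = 50)
y <- sin(x)

# Weak learner: a one-split stump that predicts the mean of the
# residuals on each side of the best threshold (least squares)
fit_stump <- function(x, r) {
  best <- list(sse = Inf)
  for (s in sort(unique(x))[-1]) {            # candidate thresholds
    left  <- mean(r[x <  s])
    right <- mean(r[x >= s])
    sse   <- sum((r - ifelse(x < s, left, right))^2)
    if (sse < best$sse) {
      best <- list(sse = sse, split = s, left = left, right = right)
    }
  }
  best
}

predict_stump <- function(stump, x) {
  ifelse(x < stump$split, stump$left, stump$right)
}

shrinkage <- 0.1                       # learning rate (assumed value)
n_trees   <- 100                       # number of boosting rounds (assumed value)
pred      <- rep(mean(y), length(y))   # start from a constant model
mse       <- numeric(n_trees)

for (m in seq_len(n_trees)) {
  residual <- y - pred                          # errors of the current ensemble
  stump    <- fit_stump(x, residual)            # next learner is fitted to those errors
  pred     <- pred + shrinkage * predict_stump(stump, x)
  mse[m]   <- mean((y - pred)^2)
}

cat("MSE after first round:", mse[1], "\n")
cat("MSE after last round: ", mse[n_trees], "\n")
```

Each round shrinks the training error a little, which is why a small shrinkage value is usually paired with a larger number of trees.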


Step 1 - Install the necessary libraries

install.packages('gbm')   # for fitting the gradient boosting model
install.packages('caret') # for general data preparation and model fitting
library(gbm)
library(caret)

Step 2 - Load the dataset and explore the data

data <- iris   # loads the built-in iris dataset
head(data)     # head() returns the top 6 rows of the dataframe
summary(data)  # returns the statistical summary of the data columns
dim(data)      # returns the number of rows and columns

Step 3 - Train and Test data

# Use createDataPartition() from the caret package to split the original dataset
# into a training set (70%) and a testing set (30%)
parts = createDataPartition(data$Species, p = 0.7, list = FALSE)
train = data[parts, ]
test = data[-parts, ]
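For a factor outcome such as Species, createDataPartition() samples within each class so that the class proportions are preserved in both sets. The base R sketch below mirrors that idea without using caret; it is not caret's own implementation, and the seed value and the name train_idx are assumptions made for this example.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

data <- iris

# Sample 70% of the row indices separately within each species
train_idx <- unlist(
  lapply(split(seq_len(nrow(data)), data$Species),
         function(idx) sample(idx, size = round(0.7 * length(idx)))),
  use.names = FALSE
)

train <- data[train_idx, ]
test  <- data[-train_idx, ]

# Each species contributes 35 rows to train and 15 rows to test
print(table(train$Species))
print(table(test$Species))
```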

Step 4 - Create a gradient boosting model

# train a model using our training data
model_gbm = gbm(Species ~ .,
                data = train,
                distribution = "multinomial",
                cv.folds = 10,
                shrinkage = .01,
                n.minobsinnode = 10,
                n.trees = 500) # 500 trees to be built
summary(model_gbm)

Step 5 - Make predictions on the test dataset

# use the model to make predictions on the test data
pred_test = predict.gbm(object = model_gbm,
                        newdata = test,
                        n.trees = 500, # use all 500 trees
                        type = "response")
pred_test

Step 6 - Give class names

# Give class names to the highest prediction value.
class_names = colnames(pred_test)[apply(pred_test, 1, which.max)]
result = data.frame(test$Species, class_names)
print(result)
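The conversion from class probabilities to class names can be tried on a small hand-made example. The sketch below assumes the multinomial predictions arrive as a 3-D array of shape observations x classes x 1, with the class names stored as column names; the array probs and its values are made up for illustration. It also shows base R's max.col() as an alternative to apply() with which.max.

```r
classes <- c("setosa", "versicolor", "virginica")

# Hypothetical predicted probabilities for 4 observations (array() fills column by column)
probs <- array(
  c(0.90, 0.10, 0.05, 0.02,   # P(setosa)
    0.07, 0.70, 0.35, 0.55,   # P(versicolor)
    0.03, 0.20, 0.60, 0.43),  # P(virginica)
  dim = c(4, 3, 1),
  dimnames = list(NULL, classes, "500")
)

# For each observation, pick the column with the highest probability
class_names <- colnames(probs)[apply(probs, 1, which.max)]
print(class_names)

# Alternative: drop the third dimension and use max.col()
probs_mat   <- probs[, , 1]
class_names2 <- colnames(probs_mat)[max.col(probs_mat, ties.method = "first")]
print(identical(class_names, class_names2))
```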

Step 7 - Create a confusion matrix

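# confusionMatrix() expects the predictions first and the reference (true) labels second
conf_mat = confusionMatrix(factor(class_names, levels = levels(test$Species)), test$Species)
print(conf_mat)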

The predictions:
setosa: all 15 predicted correctly
versicolor: 13 predicted correctly, 2 falsely predicted as virginica
virginica: 14 predicted correctly, 1 falsely predicted as versicolor

The model gives a good accuracy of 93.3% (42 of 45 test observations). Because the train/test split is random, the exact counts can vary from run to run.
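The reported accuracy can be checked by hand from the counts above. The sketch below rebuilds the confusion matrix from those counts in base R, assuming rows are the actual species and columns the predicted species (the orientation is an assumption); the name conf is chosen for this example.

```r
classes <- c("setosa", "versicolor", "virginica")

# Counts taken from the results described above
conf <- matrix(c(15,  0,  0,
                  0, 13,  2,
                  0,  1, 14),
               nrow = 3, byrow = TRUE,
               dimnames = list(actual = classes, predicted = classes))

# Accuracy: correctly classified observations (the diagonal) over all observations
accuracy <- sum(diag(conf)) / sum(conf)
print(round(accuracy, 3))   # 42 / 45, about 0.933

# Share of each actual class that was predicted correctly
per_class <- diag(conf) / rowSums(conf)
print(round(per_class, 3))
```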
