How to apply gradient boosting in R for regression


Recipe Objective

How to apply gradient boosting in R for regression?

Classification and regression are supervised learning tasks that can be solved with algorithms such as linear regression, logistic regression, and decision trees. On their own, however, these models are often not competitive in prediction accuracy. Ensemble techniques, on the other hand, create multiple models and combine them into one to produce more effective results. Bagging, boosting, and random forests are different types of ensemble techniques. Boosting is a sequential ensemble technique in which the model is improved using information from previously grown weaker models. This process continues for multiple iterations until a final model is built that predicts a more accurate outcome. There are three popular boosting techniques:

1. AdaBoost
2. Gradient Boosting
3. XGBoost

Gradient Boosting is a sequential technique in which each new model is built by learning from the errors of the previous model, i.e. each predictor is trained using the residual errors of its predecessor as labels. In this recipe, a dataset is used where the relationship between the cost of bags and their width is to be determined using boosting, via the gbm technique.
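To make the residual-fitting idea concrete, here is a minimal hand-rolled sketch in base R with made-up toy numbers. It is only an illustration of the principle, not how gbm is implemented: gbm fits small regression trees to the residuals, whereas this sketch uses the simplest possible "weak learner" (the mean).

# Toy illustration of boosting by residual fitting (not gbm itself).
y <- c(3, 5, 7, 9)

# Round 1: the first weak model predicts the overall mean
f1 <- mean(y)          # 6
res1 <- y - f1         # residuals of round 1

# Round 2: the next model is trained on those residuals; with a
# learning rate (shrinkage) of 0.5 we add only half of its correction
lr <- 0.5
f2 <- f1 + lr * res1   # updated predictions move toward y
res2 <- y - f2         # residuals shrink after each round

cat("round 1 residual SSE:", sum(res1^2), "\n")
cat("round 2 residual SSE:", sum(res2^2), "\n")

Each round fits the mistakes left over from the previous rounds, so the residual sum of squares decreases; the shrinkage parameter plays the same role as the shrinkage argument passed to gbm() later in this recipe.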


Step 1 - Install the necessary libraries

install.packages('gbm')    # for fitting the gradient boosting model
install.packages('caret')  # for general data preparation and model fitting
library(gbm)
library(caret)

Step 2 - Read a csv file and explore the data

The dataset attached contains the data of 160 different bags associated with ABC industries. The bags have certain attributes, which are described below:

1. Height – the height of the bag
2. Width – the width of the bag
3. Length – the length of the bag
4. Weight – the weight the bag can carry
5. Weight1 – the weight the bag can carry after expansion

The company now wants to predict the cost it should set for a new variant of these kinds of bags.

data <- read.csv("/content/Data_1.csv")
head(data)     # head() returns the top 6 rows of the dataframe
summary(data)  # returns the statistical summary of the data columns
dim(data)      # returns the dimensions (rows, columns) of the dataframe

Step 3 - Train and Test data

set.seed(0)  # set seed for reproducible random sampling

# createDataPartition() from the caret package splits the original dataset
# into a training set (80%) and a testing set (20%)
parts = createDataPartition(data$Cost, p = .8, list = F)
train = data[parts, ]
test = data[-parts, ]

# feature and target arrays (Cost is the first column)
test_x = test[, -1]
test_y = test[, 1]

Step 4 - Create a gbm model

Now, we will fit and train our model using the gbm() function with a Gaussian distribution, i.e. squared-error loss for regression.

model_gbm = gbm(Cost ~ .,
                data = train,
                distribution = "gaussian",
                cv.folds = 10,
                shrinkage = .01,
                n.minobsinnode = 10,
                n.trees = 500)
print(model_gbm)
summary(model_gbm)

Step 5 - Make predictions on the test dataset

We use our trained model_gbm to predict the 'Cost' values for the testing data (unseen data) and then generate performance measures.

# passing n.trees explicitly (all 500 fitted trees) avoids gbm's default-trees warning
pred_y = predict.gbm(model_gbm, test_x, n.trees = 500)
pred_y

Step 6 - Check the accuracy of our model

residuals = test_y - pred_y
RMSE = sqrt(mean(residuals^2))
cat('The root mean square error of the test data is ', round(RMSE,3), '\n')

y_test_mean = mean(test_y)

# Calculate total sum of squares
tss = sum((test_y - y_test_mean)^2)

# Calculate residual sum of squares
rss = sum(residuals^2)

# Calculate R-squared
rsq = 1 - (rss/tss)
cat('The R-square of the test data is ', round(rsq,3), '\n')

# visualize the actual and predicted data
x_ax = 1:length(pred_y)
plot(x_ax, test_y, col = "blue", pch = 20, cex = .9)
lines(x_ax, pred_y, col = "red")

