How to apply gradient boosting in R for regression


Recipe Objective

How to apply gradient boosting in R for regression?

Classification and regression are supervised learning tasks that can be solved with algorithms such as linear regression, logistic regression, and decision trees. On their own, however, these models are often not competitive in prediction accuracy. Ensemble techniques, on the other hand, create multiple models and combine them into one to produce more effective results. Bagging, boosting, and random forests are different types of ensemble techniques. Boosting is a sequential ensemble technique in which the model is improved using information from previously grown weaker models. This process continues for multiple iterations until a final model is built that predicts a more accurate outcome. There are three popular boosting techniques:

1. AdaBoost
2. Gradient Boosting
3. XGBoost

Gradient Boosting is a sequential technique in which each new model is built by learning from the errors of the previous model, i.e. each predictor is trained using the residual errors of its predecessor as labels. In this recipe, a dataset is used where the relationship between the cost of bags and their width is to be determined using boosting, via the gbm technique.
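To make the residual-fitting idea concrete, here is a minimal hand-rolled sketch in base R with made-up toy numbers. It is only an illustration of the principle, not how gbm is implemented: gbm fits small regression trees to the residuals, whereas this sketch uses the simplest possible "weak learner" (the mean).

# Toy illustration of boosting by residual fitting (not gbm itself).
y <- c(3, 5, 7, 9)

# Round 1: the first weak model predicts the overall mean
f1 <- mean(y)          # 6
res1 <- y - f1         # residuals of round 1

# Round 2: the next model is trained on those residuals; with a
# learning rate (shrinkage) of 0.5 we add only half of its correction
lr <- 0.5
f2 <- f1 + lr * res1   # updated predictions move toward y
res2 <- y - f2         # residuals shrink after each round

cat("round 1 residual SSE:", sum(res1^2), "\n")
cat("round 2 residual SSE:", sum(res2^2), "\n")

Each round fits the mistakes left over from the previous rounds, so the residual sum of squares decreases; the shrinkage parameter plays the same role as the shrinkage argument passed to gbm() later in this recipe.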


Step 1 - Install the necessary libraries

install.packages('gbm')    # for fitting the gradient boosting model
install.packages('caret')  # for general data preparation and model fitting
library(gbm)
library(caret)

Step 2 - Read a csv file and explore the data

The dataset attached contains the data of 160 different bags associated with ABC industries. The bags have certain attributes, which are described below:

1. Height – the height of the bag
2. Width – the width of the bag
3. Length – the length of the bag
4. Weight – the weight the bag can carry
5. Weight1 – the weight the bag can carry after expansion

The company now wants to predict the cost it should set for a new variant of these kinds of bags.

data <- read.csv("/content/Data_1.csv")
head(data)     # head() returns the top 6 rows of the dataframe
summary(data)  # returns the statistical summary of the data columns
dim(data)      # returns the dimensions (rows, columns) of the dataframe

Step 3 - Train and Test data

set.seed(0)  # set seed for reproducible random sampling

# createDataPartition() from the caret package splits the original dataset
# into a training set (80%) and a testing set (20%)
parts = createDataPartition(data$Cost, p = .8, list = F)
train = data[parts, ]
test = data[-parts, ]

# feature and target arrays (Cost is the first column)
test_x = test[, -1]
test_y = test[, 1]

Step 4 - Create a gbm model

Now, we will fit and train our model using the gbm() function with a Gaussian distribution, i.e. squared-error loss for regression.

model_gbm = gbm(Cost ~ .,
                data = train,
                distribution = "gaussian",
                cv.folds = 10,
                shrinkage = .01,
                n.minobsinnode = 10,
                n.trees = 500)
print(model_gbm)
summary(model_gbm)

Step 5 - Make predictions on the test dataset

We use our trained model_gbm to predict the 'Cost' values for the testing data (unseen data) and then generate performance measures.

# passing n.trees explicitly (all 500 fitted trees) avoids gbm's default-trees warning
pred_y = predict.gbm(model_gbm, test_x, n.trees = 500)
pred_y

Step 6 - Check the accuracy of our model

residuals = test_y - pred_y
RMSE = sqrt(mean(residuals^2))
cat('The root mean square error of the test data is ', round(RMSE,3), '\n')

y_test_mean = mean(test_y)

# Calculate total sum of squares
tss = sum((test_y - y_test_mean)^2)

# Calculate residual sum of squares
rss = sum(residuals^2)

# Calculate R-squared
rsq = 1 - (rss/tss)
cat('The R-square of the test data is ', round(rsq,3), '\n')

# visualize the actual and predicted data
x_ax = 1:length(pred_y)
plot(x_ax, test_y, col = "blue", pch = 20, cex = .9)
lines(x_ax, pred_y, col = "red")

