How to apply xgboost for classification in R

This recipe helps you apply xgboost for classification in R

Recipe Objective


Classification and regression are supervised learning tasks that can be solved with algorithms such as linear regression, logistic regression, and decision trees. On their own, however, these models are often not competitive in prediction accuracy. Ensemble techniques instead build multiple models and combine them into one to produce better results. Bagging, boosting, and random forests are different types of ensemble techniques. Boosting is a sequential ensemble technique in which each new model is improved using information from the previously grown, weaker models. The process continues for multiple iterations until a final model is built that predicts a more accurate outcome. Three common boosting techniques are:

1. AdaBoost
2. Gradient boosting
3. XGBoost

XGBoost (extreme gradient boosting) is an advanced implementation of the gradient boosting technique, designed to increase the speed and computational efficiency of the algorithm. The following recipe explains xgboost for classification in R using the iris dataset.
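As a quick orientation before the step-by-step walkthrough: xgboost also supports classification directly through the multi:softmax objective, which returns class indices instead of continuous scores. A minimal sketch on iris (assuming the xgboost package is installed):

```r
library(xgboost)

# Prepare iris: xgboost needs a numeric feature matrix and, for
# multi:softmax, 0-based integer class labels
X <- data.matrix(iris[, -5])
y <- as.integer(iris$Species) - 1

dtrain <- xgb.DMatrix(data = X, label = y)

# Train a multiclass boosted-tree model
model <- xgboost(data = dtrain,
                 max.depth = 3,
                 nrounds = 50,
                 objective = "multi:softmax", # direct multiclass objective
                 num_class = 3,
                 verbose = 0)

pred <- predict(model, dtrain) # predicted class indices 0, 1, 2
```

The steps below instead train with xgboost's default regression objective and round the continuous predictions back to class codes, which works for iris because the three species map to the integer labels 1 to 3.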


Step 1 - Install the necessary libraries

install.packages('xgboost') # for fitting the xgboost model
install.packages('caret')   # for general data preparation and model fitting
install.packages('e1071')   # required by caret for confusionMatrix()
library(xgboost)
library(caret)
library(e1071)

Step 2 - Read a dataset and explore the data

data <- iris    # load the dataset
head(data)      # head() returns the top 6 rows of the dataframe
summary(data)   # statistical summary of the data columns
dim(data)       # dimensions of the dataframe: 150 rows, 5 columns

Step 3 - Train and Test data

# createDataPartition() from the caret package splits the original dataset
# into a training set (70%) and a testing set (30%)
parts = createDataPartition(data$Species, p = 0.7, list = F)
train = data[parts, ]
test = data[-parts, ]

X_train = data.matrix(train[, -5])  # independent variables for train
y_train = train[, 5]                # dependent variable for train (factor)
X_test = data.matrix(test[, -5])    # independent variables for test
y_test = test[, 5]                  # dependent variable for test (factor)

# convert the train and test data into the xgboost matrix type;
# xgb.DMatrix() requires a numeric label, so the species factor is
# converted to its integer codes (1 = setosa, 2 = versicolor, 3 = virginica)
xgboost_train = xgb.DMatrix(data = X_train, label = as.integer(y_train))
xgboost_test = xgb.DMatrix(data = X_test, label = as.integer(y_test))

Step 4 - Create a xgboost model

# train a model using our training data
model <- xgboost(data = xgboost_train, # the training data
                 max.depth = 3,        # maximum depth of each tree
                 nrounds = 50)         # max number of boosting iterations
summary(model)

Step 5 - Make predictions on the test dataset

# use the model to make predictions on the test data
pred_test = predict(model, xgboost_test)
pred_test

Step 6 - Convert prediction to factor type

# the default objective is regression, so the predictions are continuous;
# round them to the nearest class code and clamp them to the valid range 1..3
pred_test[pred_test > 3] = 3
pred_test[pred_test < 1] = 1
pred_y = as.factor(levels(y_test)[round(pred_test)])
print(pred_y)

Step 7 - Create a confusion matrix

conf_mat = confusionMatrix(pred_y, y_test) # predictions first, reference second
print(conf_mat)

The prediction results: all 15 setosa were predicted correctly; versicolor was predicted correctly apart from 2 samples falsely predicted as virginica; all 13 virginica were predicted correctly. The model gives a good accuracy of about 95.5%.
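If you need the accuracy as a number rather than reading it off the printed summary, the caret confusionMatrix object exposes it in its overall statistics. A small self-contained sketch (using hypothetical toy labels, not the iris results above):

```r
library(caret)

# Toy ground-truth and predicted labels (hypothetical example data)
y_true <- factor(c("a", "a", "b", "b"))
y_pred <- factor(c("a", "b", "b", "b"))

cm <- confusionMatrix(y_pred, y_true)     # predictions first, reference second
acc <- as.numeric(cm$overall["Accuracy"]) # 3 of 4 correct -> 0.75
```

The same pattern applied to the recipe's conf_mat object would return the 95.5% figure programmatically.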


