How to apply xgboost for classification in R

This recipe helps you apply xgboost for classification in R

Recipe Objective


Classification and regression are supervised learning tasks that can be solved with algorithms such as linear regression, logistic regression, and decision trees. On their own, however, these models are often not competitive in prediction accuracy. Ensemble techniques instead build multiple models and combine them into one to produce better results. Bagging, boosting, and random forests are different types of ensemble techniques. Boosting is a sequential ensemble technique in which each new model is improved using information from the previously grown, weaker models. The process continues for multiple iterations until a final model is built that predicts a more accurate outcome. Three common boosting techniques are:

1. AdaBoost
2. Gradient boosting
3. XGBoost

XGBoost (extreme gradient boosting) is an advanced implementation of the gradient boosting technique, designed to increase the speed and computational efficiency of the algorithm. The following recipe explains xgboost for classification in R using the iris dataset.
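As a quick orientation before the step-by-step walkthrough: xgboost also supports classification directly through the multi:softmax objective, which returns class indices instead of continuous scores. A minimal sketch on iris (assuming the xgboost package is installed):

```r
library(xgboost)

# Prepare iris: xgboost needs a numeric feature matrix and, for
# multi:softmax, 0-based integer class labels
X <- data.matrix(iris[, -5])
y <- as.integer(iris$Species) - 1

dtrain <- xgb.DMatrix(data = X, label = y)

# Train a multiclass boosted-tree model
model <- xgboost(data = dtrain,
                 max.depth = 3,
                 nrounds = 50,
                 objective = "multi:softmax", # direct multiclass objective
                 num_class = 3,
                 verbose = 0)

pred <- predict(model, dtrain) # predicted class indices 0, 1, 2
```

The steps below instead train with xgboost's default regression objective and round the continuous predictions back to class codes, which works for iris because the three species map to the integer labels 1 to 3.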


Step 1 - Install the necessary libraries

install.packages('xgboost') # for fitting the xgboost model
install.packages('caret')   # for general data preparation and model fitting
install.packages('e1071')   # required by caret for confusionMatrix()
library(xgboost)
library(caret)
library(e1071)

Step 2 - Read a dataset and explore the data

data <- iris    # load the dataset
head(data)      # head() returns the top 6 rows of the dataframe
summary(data)   # statistical summary of the data columns
dim(data)       # dimensions of the dataframe: 150 rows, 5 columns

Step 3 - Train and Test data

# createDataPartition() from the caret package splits the original dataset
# into a training set (70%) and a testing set (30%)
parts = createDataPartition(data$Species, p = 0.7, list = F)
train = data[parts, ]
test = data[-parts, ]

X_train = data.matrix(train[, -5])  # independent variables for train
y_train = train[, 5]                # dependent variable for train (factor)
X_test = data.matrix(test[, -5])    # independent variables for test
y_test = test[, 5]                  # dependent variable for test (factor)

# convert the train and test data into the xgboost matrix type;
# xgb.DMatrix() requires a numeric label, so the species factor is
# converted to its integer codes (1 = setosa, 2 = versicolor, 3 = virginica)
xgboost_train = xgb.DMatrix(data = X_train, label = as.integer(y_train))
xgboost_test = xgb.DMatrix(data = X_test, label = as.integer(y_test))

Step 4 - Create a xgboost model

# train a model using our training data
model <- xgboost(data = xgboost_train, # the training data
                 max.depth = 3,        # maximum depth of each tree
                 nrounds = 50)         # max number of boosting iterations
summary(model)

Step 5 - Make predictions on the test dataset

# use the model to make predictions on the test data
pred_test = predict(model, xgboost_test)
pred_test

Step 6 - Convert prediction to factor type

# the default objective is regression, so the predictions are continuous;
# round them to the nearest class code and clamp them to the valid range 1..3
pred_test[pred_test > 3] = 3
pred_test[pred_test < 1] = 1
pred_y = as.factor(levels(y_test)[round(pred_test)])
print(pred_y)

Step 7 - Create a confusion matrix

conf_mat = confusionMatrix(pred_y, y_test) # predictions first, reference second
print(conf_mat)

The prediction results: all 15 setosa were predicted correctly; versicolor was predicted correctly apart from 2 samples falsely predicted as virginica; all 13 virginica were predicted correctly. The model gives a good accuracy of about 95.5%.
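If you need the accuracy as a number rather than reading it off the printed summary, the caret confusionMatrix object exposes it in its overall statistics. A small self-contained sketch (using hypothetical toy labels, not the iris results above):

```r
library(caret)

# Toy ground-truth and predicted labels (hypothetical example data)
y_true <- factor(c("a", "a", "b", "b"))
y_pred <- factor(c("a", "b", "b", "b"))

cm <- confusionMatrix(y_pred, y_true)     # predictions first, reference second
acc <- as.numeric(cm$overall["Accuracy"]) # 3 of 4 correct -> 0.75
```

The same pattern applied to the recipe's conf_mat object would return the 95.5% figure programmatically.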


