How to implement DBSCAN clustering in R

In this recipe, we shall learn how to implement an unsupervised learning algorithm, DBSCAN clustering, with the help of an example in R.

Recipe Objective: How to implement DBSCAN clustering in R?

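DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is an unsupervised, density-based clustering algorithm. Density-based clustering refers to unsupervised learning algorithms that identify distinct clusters in data based on the premise that a cluster is a region of high point density, separated from other clusters by regions of low point density. DBSCAN takes two parameters: eps, the radius of the neighborhood around each point, and MinPts, the minimum number of points that must lie within that radius for a point to count as a core (seed) point. Points that are within eps of a core point but are not core points themselves are border points, and points that are neither are labeled noise. As a result, DBSCAN does not need the number of clusters in advance and can find clusters of arbitrary shape.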
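To make the procedure concrete, here is a minimal base-R sketch of DBSCAN. It is illustrative only: the recipe itself uses fpc::dbscan, and the function name dbscan_sketch is hypothetical. The sketch counts a point as a core point when at least MinPts points (itself included) lie within distance eps, grows each cluster outward from core points, and leaves every point not reachable from a core point labeled 0 (noise).

```r
# Minimal DBSCAN sketch in base R (hypothetical helper, not the fpc implementation).
# Returns an integer vector of cluster ids; 0 marks noise points.
dbscan_sketch <- function(x, eps, MinPts) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- as.matrix(dist(x))                      # pairwise Euclidean distances, O(n^2) memory
  # eps-neighborhood of every point; it includes the point itself (distance 0)
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= MinPts      # core (seed) points
  cluster <- integer(n)                        # 0 = not yet assigned / noise
  current <- 0L
  for (i in seq_len(n)) {
    # only an unassigned core point can start a new cluster
    if (cluster[i] != 0L || !is_core[i]) next
    current <- current + 1L
    cluster[i] <- current
    queue <- neighbors[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] == 0L) {
        cluster[j] <- current                  # core or border point joins the cluster
        # only core points expand the cluster further
        if (is_core[j]) queue <- c(queue, neighbors[[j]])
      }
    }
  }
  cluster
}
```

For example, table(dbscan_sketch(iris[, -5], eps = 0.45, MinPts = 5)) tallies cluster sizes on the data used below. A border point within eps of two clusters is assigned to whichever cluster reaches it first, so labels may not match fpc exactly in such cases.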

Step 1: Load the required packages

#loading required packages (install once with install.packages("fpc") if needed)
library(fpc)

Step 2: Load the dataset

We will make use of the iris data frame. iris is a built-in data frame that gives the measurements in centimeters of sepal length and width and petal length and width for 50 flowers from each of 3 species of iris: Iris setosa, versicolor, and virginica.

#loading the dataset
data(iris)

Step 3: Check the structure of the dataset

#checking the structure of the dataset
str(iris)

'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

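All four measurement variables are numeric, and Species is a factor with three levels (the 3 species). Because clustering is unsupervised, Species is not used to fit the model; it serves only as a reference for checking the clusters afterwards.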

Step 4: Remove the species label

#removing the species label from the dataset
df = iris[,-5]
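The recipe does not explain how eps = 0.45 was chosen. A common rule of thumb (not part of the original recipe) is the k-distance plot: for every point, compute the distance to its (MinPts - 1)-th nearest neighbor, sort these distances, and look for the "knee" where they start rising sharply. The sketch below assumes the convention that a point counts itself toward MinPts; the variable names are illustrative.

```r
# k-distance heuristic for choosing eps (rule of thumb, assumed convention: MinPts includes the point itself)
df <- iris[, -5]
MinPts <- 5
d <- as.matrix(dist(df))                 # pairwise Euclidean distances (150 x 150)
# Each row includes the 0 distance from the point to itself, so element MinPts of the
# sorted row is the smallest eps at which that point's eps-neighborhood holds MinPts points.
kdist <- apply(d, 1, function(row) sort(row)[MinPts])
plot(sort(kdist), type = "l",
     xlab = "Observations sorted by k-distance",
     ylab = paste0("Distance to ", MinPts - 1, "-th nearest neighbor"),
     main = "k-distance plot")
abline(h = 0.45, lty = 2)                # the eps used in Step 5, for comparison
```

Under this convention, the observations whose k-distance is at most eps are exactly the ones that qualify as core points, so the dashed line shows roughly how many points would be seeds at eps = 0.45.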

Step 5: Model fitting

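#fitting the DBSCAN clustering model
#eps = neighborhood radius, MinPts = minimum points within eps for a core (seed) point
#fpc:: avoids a clash with dbscan() from the dbscan package if both are loaded
model <- fpc::dbscan(df, eps = 0.45, MinPts = 5, seeds = TRUE)
model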

	dbscan Pts=150 MinPts=5 eps=0.45
        0  1  2
border 24  4 13
seed    0 44 65
total  24 48 78
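The printed summary splits each cluster into seed (core) and border points, with column 0 holding the noise points. The same breakdown can be recovered from the fitted object's cluster and isseed components, which is handy when the counts are needed in code. A minimal sketch, refitting the model so the block runs on its own:

```r
library(fpc)
df <- iris[, -5]
model <- fpc::dbscan(df, eps = 0.45, MinPts = 5, seeds = TRUE)
# isseed is TRUE for core (seed) points; cluster 0 holds the noise points
point_type <- ifelse(model$isseed, "seed", "border")
table(point_type, cluster = model$cluster)
```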

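Step 6: Check the cluster assigned to each observation

A label of 0 denotes noise: an observation that is not within eps of any core point.

#cluster label of each observation (0 = noise)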
model$cluster

  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1
 [36] 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 0 2 2 0 2 0 2 2 2 2 2 0 2
 [71] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 0 2 2 2 2 0 2 2 2 2 2 2
[106] 0 0 0 0 0 2 2 2 2 0 2 2 0 0 2 2 2 0 2 2 0 2 2 2 0 0 0 2 2 0 0 2 2 2 2
[141] 2 2 2 2 2 2 2 2 2 2
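Scanning the vector by eye is tedious; the noise observations can be pulled out directly. A small sketch, refitting the model so it runs standalone (noise_idx is an illustrative name):

```r
library(fpc)
df <- iris[, -5]
model <- fpc::dbscan(df, eps = 0.45, MinPts = 5)
# row numbers of the observations labeled as noise (cluster 0)
noise_idx <- which(model$cluster == 0)
noise_idx
# the noise observations with their measurements and original species
iris[noise_idx, ]
```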

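Step 7: Compare the clusters with the species labels

Because cluster numbers are arbitrary and 0 means noise, this is a contingency table rather than a true confusion matrix.

#cross-tabulating cluster labels against the true species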
table(model$cluster, iris$Species)

    setosa versicolor virginica
  0      2          7        15
  1     48          0         0
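  2      0         43        35

Cluster 1 contains 48 of the 50 setosa flowers and nothing else, while cluster 2 merges versicolor and virginica (43 and 35 flowers). The remaining 24 observations, most of them virginica, are labeled noise. With eps = 0.45 and MinPts = 5, DBSCAN therefore isolates setosa but does not separate the two overlapping species.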
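The table can also be summarized in a single number. One standard choice (not used in the original recipe) is the adjusted Rand index, which is 1 for identical partitions up to relabeling and close to 0 for agreement expected by chance. A base-R sketch; the function name adjusted_rand is hypothetical:

```r
# Adjusted Rand index between two labelings of the same observations (hypothetical helper).
adjusted_rand <- function(labels_a, labels_b) {
  stopifnot(length(labels_a) == length(labels_b))
  tab <- table(labels_a, labels_b)          # contingency table of the two labelings
  n <- sum(tab)
  sum_ij <- sum(choose(tab, 2))             # pairs grouped together by both labelings
  sum_a <- sum(choose(rowSums(tab), 2))     # pairs grouped together by labeling a
  sum_b <- sum(choose(colSums(tab), 2))     # pairs grouped together by labeling b
  expected <- sum_a * sum_b / choose(n, 2)  # expected sum_ij under random labeling
  max_index <- (sum_a + sum_b) / 2
  # can be 0/0 (NaN) in degenerate cases, e.g. when both labelings put every point in one group
  (sum_ij - expected) / (max_index - expected)
}
```

Applied here, adjusted_rand(model$cluster, iris$Species) scores the DBSCAN labels against the species with the noise points treated as one extra group. That treatment is a choice made for this sketch; dropping the noise points before scoring is an equally reasonable alternative.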

Step 8: Plot the clusters

#plotting the clusters (fpc draws seed and border points with different symbols)
plot(model, df, main = "DBSCAN clusters")
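With four variables the plot above is a grid of pairwise scatterplots. A single base-R scatterplot of the two petal measurements, with noise, seed, and border points marked explicitly, can be easier to read. Keep in mind it is a 2-D view of a clustering done in 4-D, so some points may look misplaced. A sketch, refitting the model so it runs standalone; the color and symbol choices are arbitrary:

```r
library(fpc)
df <- iris[, -5]
model <- fpc::dbscan(df, eps = 0.45, MinPts = 5, seeds = TRUE)

n_clusters <- max(model$cluster)
pal <- rainbow(n_clusters)                     # one color per cluster 1..n_clusters
# noise (cluster 0) is grey; every other point takes its cluster's color
cols <- ifelse(model$cluster == 0, "grey60", pal[pmax(model$cluster, 1)])
# symbols: x = noise, filled circle = seed (core) point, open circle = border point
pch <- ifelse(model$cluster == 0, 4, ifelse(model$isseed, 19, 1))

plot(df$Petal.Length, df$Petal.Width, col = cols, pch = pch,
     xlab = "Petal.Length (cm)", ylab = "Petal.Width (cm)",
     main = "DBSCAN clusters on the petal measurements")
legend("topleft", legend = c("noise", paste("cluster", seq_len(n_clusters))),
       col = c("grey60", pal), pch = c(4, rep(19, n_clusters)))
```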
