How to implement Hierarchical clustering in R

In this recipe, we will learn how to implement hierarchical clustering in R with the help of an example.

Recipe Objective: How to implement Hierarchical clustering in R?

Hierarchical clustering is an unsupervised clustering method that builds a hierarchy of clusters. Unlike k-means, it does not require us to specify the number of clusters in advance. In addition, hierarchical clustering produces a dendrogram, a tree-based visual representation of how the observations are grouped at each level. There are two types of hierarchical clustering: divisive (top-down) and agglomerative (bottom-up). Agglomerative methods can be further classified by how the distance between two clusters (the linkage) is measured. Common choices are single linkage, which uses the distance between the closest pair of points in the two clusters; complete linkage, which uses the farthest pair; and average linkage, which uses the mean distance over all pairs. The steps to implement hierarchical clustering in R are as follows:

Step 1: Load the required packages

#loading required packages
#dist() and hclust() come with base R (stats); sparcl provides ColorDendrogram()
library(dplyr)
library(cluster)
library(sparcl)

Step 2: Load the dataset

We will use the iris data frame. iris is a built-in data frame that gives the measurements, in centimeters, of sepal length, sepal width, petal length, and petal width for 50 flowers from each of 3 species of iris: Iris setosa, versicolor, and virginica.

#loading the dataset
a=iris
#dropping the 5th column (Species) so that only the four numeric measurements are clustered
a=a[,-5]

Step 3: Calculate the distance matrix

#calculating the Euclidean distance matrix (stored as 'dis' so it does not mask the dist() function)
dis <- dist(a, method = 'euclidean')
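dist() returns a compact object that stores only the lower triangle of the 150-by-150 distance matrix. The sketch below repeats the data preparation so it runs on its own, then checks one entry by hand: the Euclidean distance between the first two flowers is the square root of the summed squared differences across the four measurements.

```r
# Keep only the four numeric measurement columns (drop Species)
a <- iris[, -5]
dis <- dist(a, method = "euclidean")

# 150 observations give 150 * 149 / 2 = 11175 stored pairwise distances
n_pairs <- length(dis)

# Euclidean distance between flowers 1 and 2, computed by hand
x1 <- unlist(a[1, ])
x2 <- unlist(a[2, ])
manual <- sqrt(sum((x1 - x2)^2))

# The same entry, looked up after expanding the dist object to a full matrix
from_dist <- unname(as.matrix(dis)[1, 2])

print(c(manual = manual, from_dist = from_dist))  # both about 0.5385
```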

Step 4: Apply clustering algorithms
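All three linkage rules applied below start from the same pairwise distances and differ only in how they turn them into a single distance between two clusters. A minimal sketch with two small, made-up one-dimensional clusters (the values in `A` and `B` are illustrative assumptions, not iris data) shows the difference:

```r
# Two hypothetical 1-D clusters (illustrative values only)
A <- c(1, 2)
B <- c(5, 9)

# Absolute distances between every point in A and every point in B
pairwise <- abs(outer(A, B, "-"))

single_link   <- min(pairwise)   # distance between the closest pair
complete_link <- max(pairwise)   # distance between the farthest pair
average_link  <- mean(pairwise)  # mean distance over all pairs

# single = 3, complete = 8, average = 5.5
print(c(single = single_link, complete = complete_link, average = average_link))
```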

1) Single Link

#applying the single link clustering algorithm to the distance matrix
h1=hclust(dis,method='single')
h1

Output:
Call:
hclust(d = dis, method = "single")

Cluster method   : single 
Distance         : euclidean 
Number of objects: 150 

#plotting the dendrogram
plot(h1, hang=-1, main='Single Link')
#cutting the tree into k = 3 clusters
c=cutree(h1, k=3)
#coloring the dendrogram by cluster membership
ColorDendrogram(h1, y=c, main='Single Link')
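`cutree()` can cut a tree either into a fixed number of groups (`k`) or at a given height (`h`). Cutting at any height strictly between two successive merge heights gives the same partition as the corresponding `k`. A minimal sketch on four made-up points (an illustrative assumption, chosen so the single-linkage merge heights are easy to check by hand):

```r
# Four hypothetical 1-D observations (illustrative values only)
x <- c(1, 2, 5, 9)
hc <- hclust(dist(x), method = "single")

# Single-linkage merges: {1,2} at 1, then 5 joins at 3, then 9 joins at 4
print(hc$height)  # 1 3 4

by_k <- cutree(hc, k = 3)  # ask for three clusters
by_h <- cutree(hc, h = 2)  # cut between the merges at heights 1 and 3

print(by_k)                 # 1 1 2 3
print(identical(by_k, by_h))  # TRUE
```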

2) Average Link

#applying the average link clustering algorithm to the distance matrix
h2=hclust(dis,method='average')
h2

Output:
Call:
hclust(d = dis, method = "average")

Cluster method   : average 
Distance         : euclidean 
Number of objects: 150 

#plotting the dendrogram
plot(h2, hang=-1, main='Average Link')
#cutting the tree into k = 3 clusters
c=cutree(h2, k=3)
#coloring the dendrogram by cluster membership
ColorDendrogram(h2, y=c, main='Average Link')
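With `method = 'average'`, `hclust()` uses UPGMA: the distance between two clusters is the mean of all pairwise distances between their members. Reusing four made-up points (an illustrative assumption, not iris data), the merge heights can be reproduced by hand:

```r
# Four hypothetical 1-D observations (illustrative values only)
x <- c(1, 2, 5, 9)
hc_avg <- hclust(dist(x), method = "average")

# Hand calculation of the average-linkage merge heights:
#  step 1: 1 and 2 are closest, merged at |1 - 2| = 1
#  step 2: {1,2} to 5 = mean(4, 3) = 3.5, smaller than 5 to 9 = 4
#  step 3: {1,2,5} to 9 = mean(8, 7, 4) = 19/3
expected <- c(1, 3.5, 19 / 3)

print(hc_avg$height)
print(all.equal(hc_avg$height, expected))  # TRUE
```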

3) Complete Link

#applying the complete link clustering algorithm to the distance matrix
h3=hclust(dis,method='complete')
h3

Output:
Call:
hclust(d = dis, method = "complete")

Cluster method   : complete 
Distance         : euclidean 
Number of objects: 150 

#plotting the dendrogram
plot(h3, hang=-1, main='Complete Link')
#cutting the tree into k = 3 clusters
c=cutree(h3, k=3)
#coloring the dendrogram by cluster membership
ColorDendrogram(h3, y=c, main='Complete Link')
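Because the Species column was left out of the clustering, it can serve as an informal check on the three linkages. The sketch below, a self-contained example using only base R, cross-tabulates each 3-cluster solution against `Species` and computes the cophenetic correlation, i.e. the correlation between the original distances and the heights at which each pair of flowers is first joined in the dendrogram. No particular results are assumed here; run it to compare the linkages yourself.

```r
# Rebuild the distance matrix from the four numeric iris columns
a <- iris[, -5]
dis <- dist(a, method = "euclidean")

methods <- c("single", "average", "complete")
trees <- lapply(methods, function(m) hclust(dis, method = m))
names(trees) <- methods

for (m in methods) {
  groups <- cutree(trees[[m]], k = 3)
  cat("\n==", m, "linkage ==\n")
  # Rows: cluster labels from cutree; columns: true species
  print(table(cluster = groups, species = iris$Species))
  # Cophenetic correlation: how well the tree preserves the original distances
  coph <- cor(dis, cophenetic(trees[[m]]))
  cat("cophenetic correlation:", round(coph, 3), "\n")
}
```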
