How to implement Hierarchical clustering in R

In this recipe, we shall learn how to implement hierarchical clustering in R with the help of an example.
Last Updated: 05 Sep 2022


Recipe Objective: How to implement Hierarchical clustering in R?

Hierarchical clustering is an unsupervised method that builds a hierarchy of clusters. Unlike k-means, it does not require the number of clusters to be fixed in advance, and it produces a dendrogram, a tree-based representation of the observations that is easy to inspect visually. There are two types: divisive (top-down, repeatedly splitting one large cluster) and agglomerative (bottom-up, repeatedly merging the closest clusters). Agglomerative clustering is further classified by how the distance between two clusters is measured: single linkage, average linkage, or complete linkage. The steps to implement hierarchical clustering in R are as follows:


Step 1: Load the required packages

#loading required packages
library(dplyr)
library(cluster)
library(sparcl)   # provides ColorDendrogram()

Step 2: Load the dataset

We will make use of the iris dataframe. Iris is an inbuilt data frame that gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.

#loading the dataset
a <- iris
a <- a[, -5]   # drop column 5 (Species) so only the four numeric measurements remain

Step 3: Calculate the distance matrix

#calculating the distance matrix
#(stored as `dis`, the name used by hclust() in the next step)
dis <- dist(a, method = 'euclidean')
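As a quick sanity check on this step, the sketch below recomputes one entry of the distance matrix by hand. The names `x`, `d`, and `manual` are illustrative and are not part of the recipe.

```r
# Illustrative check: recompute one Euclidean distance manually
x <- as.matrix(iris[, -5])            # numeric measurements only
d <- dist(x, method = "euclidean")    # "dist" object with all pairwise distances

# Distance between observations 1 and 2, computed from the definition
manual <- sqrt(sum((x[1, ] - x[2, ])^2))

# as.matrix() expands the compact "dist" object into a full 150 x 150 matrix
full <- as.matrix(d)
full[1:3, 1:3]                        # top-left corner for inspection

# The manual value should agree with the stored entry
isTRUE(all.equal(manual, full[1, 2]))

# A "dist" object holds n * (n - 1) / 2 pairwise distances
length(d) == 150 * 149 / 2
```

The `dist` object stores only the lower triangle, which is why its length is n(n-1)/2 rather than n squared.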

Step 4: Apply clustering algorithms
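The three linkage rules used in this step differ only in how they summarise the pairwise distances between two clusters. The following is a minimal sketch with two small made-up clusters (the points in `A` and `B` are hypothetical values chosen for easy arithmetic, not taken from iris):

```r
# Two hypothetical clusters of 2-D points (illustrative values only)
A <- matrix(c(0, 0,
              1, 0), ncol = 2, byrow = TRUE)
B <- matrix(c(4, 0,
              6, 0), ncol = 2, byrow = TRUE)

# All pairwise Euclidean distances, then keep only the A-to-B block
all_d <- as.matrix(dist(rbind(A, B)))
cross <- all_d[1:2, 3:4]

single_link   <- min(cross)    # distance between the closest pair
complete_link <- max(cross)    # distance between the farthest pair
average_link  <- mean(cross)   # mean over all cross-cluster pairs

c(single = single_link, average = average_link, complete = complete_link)
# single = 3, average = 4.5, complete = 6
```

At each merge, agglomerative clustering joins the two clusters for which the chosen summary is smallest.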

1) Single Link

#applying single link clustering algorithm to the data
h1 <- hclust(dis, method = 'single')
h1

Output:
Call:
hclust(d = dis, method = "single")

Cluster method   : single 
Distance         : euclidean 
Number of objects: 150

#plotting the dendrogram
plot(h1, hang = -1, main = 'Single Link')
#cutting the tree into 3 clusters
c1 <- cutree(h1, k = 3)
#colouring the dendrogram leaves by cluster membership
ColorDendrogram(h1, y = c1, main = 'Single Link')
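Note that cutree() can cut a tree either into a fixed number of clusters (argument k) or at a given height (argument h). A minimal sketch of the difference, assuming single linkage on the iris measurements; the height value 1 is arbitrary and chosen only for illustration:

```r
# Illustrative only: cutting the same tree by cluster count vs. by height
hc <- hclust(dist(iris[, -5], method = "euclidean"), method = "single")

by_k <- cutree(hc, k = 3)   # ask for exactly 3 clusters
by_h <- cutree(hc, h = 1)   # cut at height 1 (arbitrary illustrative value)

table(by_k)                 # cluster sizes when k = 3
length(unique(by_h))        # number of clusters implied by the height cut
```

Cutting by k fixes the number of groups, while cutting by h lets the number of groups follow from the merge heights in the dendrogram.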

2) Average Link

#applying average link clustering algorithm to the data
h2 <- hclust(dis, method = 'average')
h2

Output:
Call:
hclust(d = dis, method = "average")

Cluster method   : average 
Distance         : euclidean 
Number of objects: 150

#plotting the dendrogram
plot(h2, hang = -1, main = 'Average Link')
#cutting the tree into 3 clusters
c2 <- cutree(h2, k = 3)
#colouring the dendrogram leaves by cluster membership
ColorDendrogram(h2, y = c2, main = 'Average Link')

3) Complete Link

#applying complete link clustering algorithm to the data
h3 <- hclust(dis, method = 'complete')
h3

Output:
Call:
hclust(d = dis, method = "complete")

Cluster method   : complete 
Distance         : euclidean 
Number of objects: 150

#plotting the dendrogram
plot(h3, hang = -1, main = 'Complete Link')
#cutting the tree into 3 clusters
c3 <- cutree(h3, k = 3)
#colouring the dendrogram leaves by cluster membership
ColorDendrogram(h3, y = c3, main = 'Complete Link')
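Since iris ships with the true species labels, one way to compare the three linkage methods is to cross-tabulate each 3-cluster solution against Species. The following is a sketch rather than part of the original recipe; the names `methods` and `tabs` are illustrative.

```r
# Illustrative comparison of the three linkages against the known species
d <- dist(iris[, -5], method = "euclidean")
methods <- c("single", "average", "complete")

# One contingency table (cluster x species) per linkage method
tabs <- lapply(methods, function(m) {
  cl <- cutree(hclust(d, method = m), k = 3)
  table(cluster = cl, species = iris$Species)
})
names(tabs) <- methods

tabs   # inspect how each method's clusters line up with the species
```

A table whose counts concentrate in one cell per row indicates clusters that match the species well; counts spread across a row indicate mixing.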
