How to do Agglomerative Clustering in Python?

This recipe helps you do Agglomerative Clustering in Python

Recipe Objective

Have you ever tried to do Agglomerative Clustering in python? Clustering can give us an idea that how the data set is in groups.

So this is the recipe on how we can do Agglomerative Clustering in Python.

Hands-On Guide to the Art of Tuning Locality Sensitive Hashing in Python

Step 1 - Import the library

from sklearn import datasets from sklearn.preprocessing import StandardScaler from sklearn.cluster import AgglomerativeClustering import pandas as pd import seaborn as sns import matplotlib.pyplot as plt

We have imported datasets, StandardScaler, AgglomerativeClustering, pandas, and seaborn which will be needed for the dataset.

Step 2 - Setting up the Data

We have imported inbuilt iris dataset and stored data in x. We have plotted a heatmap for corelation of features. iris = datasets.load_iris() X = iris.data; data = pd.DataFrame(X) cor = data.corr() sns.heatmap(cor, square = True); plt.show()

Step 3 - Training model and Predicting Clusters

Here we we are first standarizing the data by standardscaler. scaler = StandardScaler() X_std = scaler.fit_transform(X) Now we are using AffinityPropagation for clustering with features:

  • linkage: It determines which distance to use between sets of observation. The algorithm will merge the pairs of cluster that minimize this criterion.
  • n_clusters: It is the number of clusters we want to have
  • affinity: In this we have to choose between euclidean, l1, l2 etc.

clt = AgglomerativeClustering(linkage="complete", affinity="euclidean", n_clusters=5) We are training the data by using clt.fit and printing the number of clusters. model = clt.fit(X_std) Finally we are predicting the clusters. clusters = pd.DataFrame(model.fit_predict(X_std)) data["Cluster"] = clusters

Step 4 - Visualizing the output

fig = plt.figure(); ax = fig.add_subplot(111) scatter = ax.scatter(data[0],data[1], c=data["Cluster"],s=50) ax.set_title("Agglomerative Clustering") ax.set_xlabel("X0"); ax.set_ylabel("X1") plt.colorbar(scatter); plt.show()

We have plot a sactter plot which will show the clusters of data in different colour.

 

Download Materials

What Users are saying..

profile image

Jingwei Li

Graduate Research assistance at Stony Brook University
linkedin profile url

ProjectPro is an awesome platform that helps me learn much hands-on industrial experience with a step-by-step walkthrough of projects. There are two primary paths to learn: Data Science and Big Data.... Read More

Relevant Projects

Build Customer Propensity to Purchase Model in Python
In this machine learning project, you will learn to build a machine learning model to estimate customer propensity to purchase.

Image Classification Model using Transfer Learning in PyTorch
In this PyTorch Project, you will build an image classification model in PyTorch using the ResNet pre-trained model.

Build a Collaborative Filtering Recommender System in Python
Use the Amazon Reviews/Ratings dataset of 2 Million records to build a recommender system using memory-based collaborative filtering in Python.

Build a Music Recommendation Algorithm using KKBox's Dataset
Music Recommendation Project using Machine Learning - Use the KKBox dataset to predict the chances of a user listening to a song again after their very first noticeable listening event.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

End-to-End Snowflake Healthcare Analytics Project on AWS-2
In this AWS Snowflake project, you will build an end to end retraining pipeline by checking Data and Model Drift and learn how to redeploy the model if needed

Deep Learning Project for Beginners with Source Code Part 1
Learn to implement deep neural networks in Python .

End-to-End Speech Emotion Recognition Project using ANN
Speech Emotion Recognition using RAVDESS Audio Dataset - Build an Artificial Neural Network Model to Classify Audio Data into various Emotions like Sad, Happy, Angry, and Neutral

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Learn Object Tracking (SOT, MOT) using OpenCV and Python
Get Started with Object Tracking using OpenCV and Python - Learn to implement Multiple Instance Learning Tracker (MIL) algorithm, Generic Object Tracking Using Regression Networks Tracker (GOTURN) algorithm, Kernelized Correlation Filters Tracker (KCF) algorithm, Tracking, Learning, Detection Tracker (TLD) algorithm for single and multiple object tracking from various video clips.