How to do Agglomerative Clustering in Python?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to do Agglomerative Clustering in Python?

How to do Agglomerative Clustering in Python?

This recipe helps you do Agglomerative Clustering in Python

0

Recipe Objective

Have you ever tried to do Agglomerative Clustering in python? Clustering can give us an idea that how the data set is in groups.

So this is the recipe on how we can do Agglomerative Clustering in Python.

Step 1 - Import the library

from sklearn import datasets from sklearn.preprocessing import StandardScaler from sklearn.cluster import AgglomerativeClustering import pandas as pd import seaborn as sns import matplotlib.pyplot as plt

We have imported datasets, StandardScaler, AgglomerativeClustering, pandas, and seaborn which will be needed for the dataset.

Step 2 - Setting up the Data

We have imported inbuilt iris dataset and stored data in x. We have plotted a heatmap for corelation of features. iris = datasets.load_iris() X = iris.data; data = pd.DataFrame(X) cor = data.corr() sns.heatmap(cor, square = True); plt.show()

Step 3 - Training model and Predicting Clusters

Here we we are first standarizing the data by standardscaler. scaler = StandardScaler() X_std = scaler.fit_transform(X) Now we are using AffinityPropagation for clustering with features:

  • linkage: It determines which distance to use between sets of observation. The algorithm will merge the pairs of cluster that minimize this criterion.
  • n_clusters: It is the number of clusters we want to have
  • affinity: In this we have to choose between euclidean, l1, l2 etc.
clt = AgglomerativeClustering(linkage="complete", affinity="euclidean", n_clusters=5) We are training the data by using clt.fit and printing the number of clusters. model = clt.fit(X_std) Finally we are predicting the clusters. clusters = pd.DataFrame(model.fit_predict(X_std)) data["Cluster"] = clusters

Step 4 - Visualizing the output

fig = plt.figure(); ax = fig.add_subplot(111) scatter = ax.scatter(data[0],data[1], c=data["Cluster"],s=50) ax.set_title("Agglomerative Clustering") ax.set_xlabel("X0"); ax.set_ylabel("X1") plt.colorbar(scatter); plt.show()

We have plot a sactter plot which will show the clusters of data in different colour.


Relevant Projects

Census Income Data Set Project - Predict Adult Census Income
Use the Adult Income dataset to predict whether income exceeds 50K yr based on census data.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

NLP and Deep Learning For Fake News Classification in Python
In this project you will use Python to implement various machine learning methods( RNN, LSTM, GRU) for fake news classification.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.