How to do Agglomerative Clustering in Python?


This recipe helps you do Agglomerative Clustering in Python

Recipe Objective

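Have you ever tried to do Agglomerative Clustering in Python? Clustering can give us an idea of how the observations in a data set fall into groups. Agglomerative clustering is a bottom-up hierarchical method: each observation starts as its own cluster, and the closest pairs of clusters are merged step by step.

So this is the recipe on how we can do Agglomerative Clustering in Python.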

Step 1 - Import the library

    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import AgglomerativeClustering
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

We have imported datasets, StandardScaler, AgglomerativeClustering, pandas, seaborn, and matplotlib, which we need to load the data, scale it, cluster it, and plot the results.

Step 2 - Setting up the Data

We have loaded the built-in iris dataset and stored its features in X. We have then plotted a heatmap of the correlation between the features.

    iris = datasets.load_iris()
    X = iris.data
    data = pd.DataFrame(X)
    cor = data.corr()
    sns.heatmap(cor, square=True)
    plt.show()
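Because the DataFrame is built without column names, the heatmap labels the features 0 to 3. A minimal sketch of one way to get readable labels, assuming the feature_names attribute returned by load_iris is used as column names; data_named and cor_named are illustrative names that are not part of the recipe:

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Illustrative variant: name the columns after the iris features
data_named = pd.DataFrame(iris.data, columns=iris.feature_names)

# Pairwise Pearson correlation between the four features
cor_named = data_named.corr()
print(cor_named.round(2))
```

Passing cor_named to sns.heatmap instead of cor should then show the feature names on both axes.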

Step 3 - Training model and Predicting Clusters

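Here we first standardize the data with StandardScaler, so that every feature has zero mean and unit variance and no single feature dominates the distance calculation.

    scaler = StandardScaler()
    X_std = scaler.fit_transform(X)

Now we use AgglomerativeClustering for clustering with the following parameters:

  • linkage: It determines which distance to use between sets of observations. The algorithm merges the pair of clusters that minimizes this criterion. With "complete", it is the maximum distance between any two observations of the two sets.
  • n_clusters: It is the number of clusters we want to have.
  • metric: It is the distance used between observations; we can choose between euclidean, l1, l2, etc. This parameter was called affinity in scikit-learn versions before 1.2, and the old name was removed in 1.4.

    clt = AgglomerativeClustering(linkage="complete", metric="euclidean", n_clusters=5)

We train the model on the standardized data by using clt.fit.

    model = clt.fit(X_std)

Finally we assign each observation to a cluster and store the labels in a new column. Note that fit_predict fits the model again before returning the labels; after fit, the same labels are also available as model.labels_.

    clusters = pd.DataFrame(model.fit_predict(X_std))
    data["Cluster"] = clusters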
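To see how much the linkage choice matters, the sketch below runs the same clustering with each of the four linkage criteria that scikit-learn supports and prints the resulting cluster sizes. It is an illustration rather than part of the recipe: it relies on the default Euclidean distance so that it does not depend on the name of the distance parameter, and cluster_sizes is a hypothetical helper variable.

```python
from collections import Counter

from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Standardize the iris features, as in the recipe
X_std = StandardScaler().fit_transform(datasets.load_iris().data)

cluster_sizes = {}
for linkage in ("ward", "complete", "average", "single"):
    # The distance is left at its default (Euclidean), which "ward" requires
    model = AgglomerativeClustering(n_clusters=5, linkage=linkage)
    labels = model.fit_predict(X_std)
    # Number of observations per cluster, largest first
    cluster_sizes[linkage] = sorted(Counter(labels.tolist()).values(), reverse=True)
    print(linkage, cluster_sizes[linkage])
```

Comparing the printed sizes is a quick way to spot a criterion that puts most observations into one large cluster and leaves the others nearly empty.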

Step 4 - Visualizing the output

    fig = plt.figure()
    ax = fig.add_subplot(111)
    scatter = ax.scatter(data[0], data[1], c=data["Cluster"], s=50)
    ax.set_title("Agglomerative Clustering")
    ax.set_xlabel("X0")
    ax.set_ylabel("X1")
    plt.colorbar(scatter)
    plt.show()

We have plotted a scatter plot of the first two features, which shows the clusters of data in different colours.
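The scatter plot only shows two of the four features, so a table can complement it. The following is a minimal sketch, not part of the original recipe, that cross-tabulates the cluster labels against the known iris species; it assumes the same settings as above (standardized data, complete linkage, 5 clusters, default Euclidean distance), and species and table are illustrative names.

```python
import pandas as pd
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data)

# Cluster with the recipe's settings (Euclidean distance is the default)
labels = AgglomerativeClustering(n_clusters=5, linkage="complete").fit_predict(X_std)

# Map the numeric targets to species names for readable columns
species = pd.Series(iris.target_names[iris.target], name="species")

# Rows are clusters, columns are species, cells are observation counts
table = pd.crosstab(pd.Series(labels, name="cluster"), species)
print(table)
```

Each row shows which species a cluster is mostly made of. The species labels are used here only to inspect the result; the clustering itself never sees them.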