How to do spectral clustering using Dask?

This recipe helps you do spectral clustering using Dask
Last Updated: 05 May 2021


Recipe Objective

How to do spectral clustering using Dask.

Exact spectral clustering does not scale well with the number of samples, because it builds an n_samples by n_samples affinity matrix. The Dask-ML version uses an approximation to the affinity matrix instead, which avoids that expensive computation.
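To make the idea of approximating the affinity matrix concrete, here is a minimal NumPy sketch of a Nyström-style approximation, the technique the dask-ml documentation names for this estimator. It illustrates the general idea only and is not dask-ml's actual code; the helper names rbf_kernel and nystrom_affinity, the gamma value and the random demo data are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Pairwise RBF affinities exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        (a ** 2).sum(axis=1)[:, None]
        + (b ** 2).sum(axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding
    return np.exp(-gamma * np.clip(sq_dists, 0.0, None))


def nystrom_affinity(x, n_components, gamma=1.0, seed=0):
    """Approximate the full affinity matrix from a random subset of rows."""
    rng = np.random.default_rng(seed)
    landmarks = rng.choice(len(x), size=n_components, replace=False)
    c = rbf_kernel(x, x[landmarks], gamma)  # shape (n_samples, n_components)
    w = c[landmarks]                        # shape (n_components, n_components)
    # The full matrix is rebuilt here only so it can be compared with the
    # exact one; a scalable implementation would never materialise it.
    return c @ np.linalg.pinv(w, rcond=1e-10) @ c.T


rng = np.random.default_rng(0)
x_demo = rng.uniform(-1.0, 1.0, size=(500, 2))  # hypothetical demo data

exact = rbf_kernel(x_demo, x_demo)
approx = nystrom_affinity(x_demo, n_components=100)
rel_error = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative Frobenius error: {rel_error:.2e}")
```

Only the affinities between every sample and the 100 sampled rows are computed directly, which is where the saving comes from.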


#!pip install dask_ml --upgrade
#!pip install dask distributed --upgrade

Step 1- Importing Libraries

We will import the make_circles dataset generator from scikit-learn and the cluster module from dask_ml, along with pandas and a timer for recording the fitting times.


from sklearn.datasets import make_circles
from sklearn.utils import shuffle
import pandas as pd
from timeit import default_timer as tic
import dask_ml.cluster

Step 2- Splitting dataset.

make_circles returns the sample coordinates and the circle each sample belongs to. We store these as x and y; only x is fed into the clustering algorithm.

x, y = make_circles(n_samples=10_000, noise=0.05, random_state=0, factor=0.2)
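As a quick sanity check on what make_circles returns, the short sketch below repeats the same call and inspects the result. The radius computation is an addition for illustration and is not part of the original recipe.

```python
import numpy as np
from sklearn.datasets import make_circles

x, y = make_circles(n_samples=10_000, noise=0.05, random_state=0, factor=0.2)

print(x.shape)         # (10000, 2): two coordinates per sample
print(np.bincount(y))  # [5000 5000]: label 0 is the outer circle, 1 the inner

# Distance of each point from the origin, grouped by label
radius = np.linalg.norm(x, axis=1)
outer_mean = radius[y == 0].mean()
inner_mean = radius[y == 1].mean()
print(outer_mean, inner_mean)
```

The inner circle sits at roughly factor times the radius of the outer one, so the two groups are well separated but not linearly separable, which is why this dataset is often used to demonstrate spectral clustering.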

Step 3- Creating Clusters.

We will run spectral clustering with n_clusters=2 and n_components=100, fitting the model on progressively larger subsets of x and recording how long each fit takes.


Ns = [500, 1000, 2500, 5000]
timings = []
for n in Ns:
    # Fit on the first n samples so that each timing matches its sample count
    t1 = tic()
    dask_ml.cluster.SpectralClustering(n_clusters=2, n_components=100).fit(x[:n])
    timings.append(('dask-ml (approximate)', n, tic() - t1))


df = pd.DataFrame(timings, columns=['method', 'Samples', 'Fitting Time'])

df
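To get a rough feel for why the approximation matters at these sample counts, the arithmetic below compares the number of entries in a full n_samples by n_samples affinity matrix with an n_samples by n_components block. Treating that block as the dominant cost of the approximate method is a simplifying assumption made for this sketch, not a statement about dask-ml's internals.

```python
Ns = [500, 1000, 2500, 5000]
n_components = 100

rows = []
for n in Ns:
    exact_entries = n * n              # full affinity matrix
    approx_entries = n * n_components  # samples vs. sampled rows only
    rows.append((n, exact_entries, approx_entries, exact_entries // approx_entries))

for row in rows:
    print(row)
```

Under this assumption the saving grows linearly with the sample count: a factor of 5 at 500 samples and a factor of 50 at 5000 samples.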
