Clustering is one of the most important techniques in machine learning. It takes a set of data points and assigns each one to a group based on its features.
This recipe is a short example of how to do DBSCAN-based clustering in Python.
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Here we have imported various modules such as DBSCAN, datasets, and StandardScaler from different libraries. We will see what each one does as it appears in the code snippets below.
For now, just have a look at these imports.
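Here we use datasets to load the built-in iris dataset, and we create the objects X and y to store the feature data and the target values respectively. DBSCAN is unsupervised, so y is not used for clustering. We also wrap X in a dataframe so that the cluster labels can be attached to it later.
iris = datasets.load_iris()
X = iris.data
y = iris.target
data = pd.DataFrame(X)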
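Because pd.DataFrame(X) is built without column names, its columns are labelled with the integers 0 to 3. As an optional variation, the sketch below builds the same dataframe with readable column names taken from the dataset's feature_names attribute. The names iris_demo and named are illustrative only and are not used elsewhere in this recipe.

```python
import pandas as pd
from sklearn import datasets

# Load iris again under a separate, illustrative name
iris_demo = datasets.load_iris()

# Use the dataset's own feature names as column labels
named = pd.DataFrame(iris_demo.data, columns=iris_demo.feature_names)

# Show the first few rows with the readable column names
print(named.head())
```

With named columns, a feature can be selected by its name instead of by an integer position.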
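StandardScaler standardizes each feature by subtracting its mean and dividing by its standard deviation, so that every feature has mean 0 and standard deviation 1. It does not remove outliers; it only rescales the data. This matters for DBSCAN because the algorithm is distance-based, so features on larger scales would otherwise dominate. We create an object std_slc to use StandardScaler.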
std_slc = StandardScaler()
X_std = std_slc.fit_transform(X)
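To see what the scaling step does, the following minimal sketch checks the per-feature mean and standard deviation after scaling. It assumes the same iris data as above; the names X_check and X_check_std are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load the same iris features used in this recipe
X_check = datasets.load_iris().data

# Fit the scaler and transform the data in one step
X_check_std = StandardScaler().fit_transform(X_check)

# After scaling, every column should have mean close to 0 and std close to 1
print("means:", np.round(X_check_std.mean(axis=0), 6))
print("stds: ", np.round(X_check_std.std(axis=0), 6))

# The scaler only rescales values; the number of rows stays the same
print("shape before:", X_check.shape, "shape after:", X_check_std.shape)
```

The unchanged shape shows that no data points are dropped by the scaler.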
We use DBSCAN as the model and fit it on the data we get after standard scaling. We then store the cluster label of each point in the dataframe. DBSCAN gives the label -1 to points it treats as noise.
clt = DBSCAN()
# fit_predict fits the model and returns one cluster label per row in a single call
clusters = clt.fit_predict(X_std)
data["Cluster"] = clusters
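To inspect the result, one option is to count the clusters and noise points directly from the labels. The sketch below is a hedged example: it writes out eps=0.5 and min_samples=5, which are scikit-learn's default values for DBSCAN, and the names X_demo, db, labels, n_clusters, and n_noise are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# Standardized iris features, as in the steps above
X_demo = StandardScaler().fit_transform(datasets.load_iris().data)

# eps: neighborhood radius; min_samples: points needed to form a dense region
# (0.5 and 5 are the scikit-learn defaults, written out here for clarity)
db = DBSCAN(eps=0.5, min_samples=5)
labels = db.fit_predict(X_demo)

# Noise points get the label -1, so leave that label out when counting clusters
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print("clusters found:", n_clusters)
print("noise points:", n_noise)
```

Unlike k-means, DBSCAN does not take the number of clusters as an input. Changing eps or min_samples changes how many clusters and noise points are found.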
Here we draw a scatter plot of the dataset, giving points in the same cluster the same color.
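fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111)
# Plot the first two feature columns (0 and 1) against each other, colored by cluster label
scatter = ax.scatter(data[0], data[1], c=data["Cluster"], s=50)
plt.show()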
As output, we get a scatter plot with each point colored by its cluster label.