Have you ever tried clustering data with K-means?
This recipe is a short example of how we can do KMeans clustering in Python.
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Here we have imported various modules like datasets, StandardScaler and KMeans from different libraries. We will see how each one is used later in the code snippets.
For now, just have a look at these imports.
Here we have used datasets to load the built-in iris dataset, stored its features in X and made a dataframe from them. We have then plotted a heat map of the correlation between the features.
iris = datasets.load_iris()
X = iris.data
data = pd.DataFrame(X)
cor = data.corr()
fig = plt.figure(figsize=(12,10))
sns.heatmap(cor, square=True)
plt.show()
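Since the dataframe above has no column names, the heat map axes show only 0 to 3. As a small sketch (not part of the original recipe), the columns can be named with iris.feature_names, which the loaded dataset already provides, so that the correlation matrix is labelled:

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Name the columns so the correlation matrix (and any heat map drawn from it) is labelled.
data = pd.DataFrame(iris.data, columns=iris.feature_names)

# DataFrame.corr() computes the pairwise Pearson correlation by default.
cor = data.corr()
print(cor.round(2))
```

Passing this cor to sns.heatmap in the same way as above should give a plot with readable feature names on both axes.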
Here, we have first used StandardScaler to standardise the data so that each feature has a mean of zero and a standard deviation of 1. We are then using KMeans with n_clusters equal to 3 as the machine learning model to fit the data.
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
clt = KMeans(n_clusters=3)
model = clt.fit(X_std)
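To check that the scaling and the fit did what we expect, a quick sketch like the one below can help. The random_state and n_init arguments are assumptions added here so that the result is reproducible; the recipe itself does not set them.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X = datasets.load_iris().data
X_std = StandardScaler().fit_transform(X)

# After scaling, every feature should have mean ~0 and standard deviation ~1.
print(X_std.mean(axis=0).round(6))
print(X_std.std(axis=0).round(6))

# Assumption: random_state and n_init are fixed only to make the run repeatable.
clt = KMeans(n_clusters=3, n_init=10, random_state=0)
model = clt.fit(X_std)

# One centre per cluster, one coordinate per feature.
print(model.cluster_centers_.shape)
# Within-cluster sum of squared distances to the nearest centre.
print(model.inertia_)
```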
Now we have predicted the cluster of each data point by passing X_std to the fitted model, and we have stored the cluster labels in the dataframe.
clusters = pd.DataFrame(model.predict(X_std))
data["Cluster"] = clusters
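The fitted model also keeps the training labels in its labels_ attribute, so predicting on the same data is one of two ways to get them. The sketch below, which again assumes a fixed random_state for repeatability, compares the two and counts how many points fall in each cluster:

```python
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X = datasets.load_iris().data
X_std = StandardScaler().fit_transform(X)

# Assumption: random_state and n_init are fixed only to make the run repeatable.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)

# Labels stored during fitting versus labels predicted for the same data.
predicted = model.predict(X_std)
print(np.array_equal(model.labels_, predicted))

# Number of data points assigned to each cluster.
sizes = pd.Series(predicted).value_counts().sort_index()
print(sizes)
```

The cluster numbers themselves (0, 1, 2) are arbitrary and can change between runs when no random_state is set, so only the grouping is meaningful.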
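Here we have plotted the clusters so that the data points of a cluster have the same colour. The first two feature columns are used as the x and y axes.
fig = plt.figure(figsize=(12,10))
ax = fig.add_subplot(111)
# Assumption: plot feature columns 0 and 1; any other pair of features could be used instead
scatter = ax.scatter(data[0], data[1], c=data["Cluster"], s=50)
plt.show()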
Output comes as: