This recipe helps you create simulated data for clustering in Python
Have you ever wanted to generate a dataset directly in Python? Python libraries can generate different types of data for different purposes.
This recipe is a short example of how to create simulated data for clustering in Python.
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import pandas as pd
Here we have imported make_blobs from scikit-learn, pyplot from matplotlib, and pandas. Each one is explained where it is first used in the code.
For now, just take note of these imports.
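Here we use make_blobs to generate clustered data: 2000 samples with 10 features each, drawn around 5 centers with a standard deviation of 0.4 per cluster. The function returns the feature matrix and the cluster label of each sample, which we store in features and clusters.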
features, clusters = make_blobs(n_samples=2000,
                                n_features=10,
                                centers=5,
                                cluster_std=0.4,
                                shuffle=True)
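To see what make_blobs hands back, a minimal sketch along these lines can help. It repeats the call above and inspects the two returned arrays. The random_state=42 argument is an assumption added here for reproducibility; it is not part of the original recipe.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same call as in the recipe; random_state=42 is an added assumption
# so that repeated runs produce the same data.
features, clusters = make_blobs(n_samples=2000,
                                n_features=10,
                                centers=5,
                                cluster_std=0.4,
                                shuffle=True,
                                random_state=42)

# features holds one row per sample and one column per feature.
print(features.shape)         # (2000, 10)

# clusters holds one integer label per sample.
print(clusters.shape)         # (2000,)
print(np.unique(clusters))    # [0 1 2 3 4]

# With an integer n_samples, the samples are split evenly across centers.
print(np.bincount(clusters))  # [400 400 400 400 400]
```

The labels in clusters are what a clustering algorithm would later try to recover from features alone.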
We wrap the features in a pandas DataFrame and view the first five rows of the dataset.
print("Feature Matrix: ")
print(pd.DataFrame(features, columns=["Feature 1", "Feature 2", "Feature 3",
                                      "Feature 4", "Feature 5", "Feature 6",
                                      "Feature 7", "Feature 8", "Feature 9",
                                      "Feature 10"]).head())
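As a variation on the step above, the column names can be generated instead of typed out, and the cluster labels can be kept next to the features. This is only a sketch: the "Cluster" column name, the cluster_means variable, and random_state=0 are illustrative assumptions, not part of the original recipe.

```python
import pandas as pd
from sklearn.datasets import make_blobs

# random_state=0 is an added assumption for reproducibility.
features, clusters = make_blobs(n_samples=2000,
                                n_features=10,
                                centers=5,
                                cluster_std=0.4,
                                shuffle=True,
                                random_state=0)

# Build "Feature 1" ... "Feature 10" from the number of columns.
columns = [f"Feature {i}" for i in range(1, features.shape[1] + 1)]
df = pd.DataFrame(features, columns=columns)

# Keep the label of each sample alongside its features
# ("Cluster" is a hypothetical column name chosen for this sketch).
df["Cluster"] = clusters
print(df.head())

# Averaging the features per label approximates each cluster's center.
cluster_means = df.groupby("Cluster").mean()
print(cluster_means)
```

Because cluster_std is small (0.4), the per-cluster means should sit close to the centers that make_blobs drew the samples around.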
We draw a scatter plot of the first two features of the dataset.
plt.scatter(features[:,0], features[:,1])
plt.show()
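The scatter plot above draws every point in the same color. A possible refinement, sketched here, colors each point by its cluster label so the groups are easier to tell apart. The Agg backend, the output file name simulated_clusters.png, the viridis colormap, and random_state=0 are all assumptions made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# random_state=0 is an added assumption for reproducibility.
features, clusters = make_blobs(n_samples=2000,
                                n_features=10,
                                centers=5,
                                cluster_std=0.4,
                                shuffle=True,
                                random_state=0)

fig, ax = plt.subplots()

# c=clusters maps each integer label to a color from the colormap.
points = ax.scatter(features[:, 0], features[:, 1],
                    c=clusters, cmap="viridis", s=10)
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.set_title("Simulated clusters (first two features)")
fig.colorbar(points, ax=ax, label="Cluster label")

# Hypothetical file name; replace with plt.show() for an interactive window
# (and drop the matplotlib.use("Agg") line in that case).
fig.savefig("simulated_clusters.png")
```

Only two of the ten features are plotted, so clusters that look close together in this view may still be well separated in the other dimensions.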
The output looks like this (the exact values differ from run to run because no random_state is set):
Feature Matrix:
   Feature 1  Feature 2  Feature 3  Feature 4  Feature 5  Feature 6
0  -3.250833   8.562522   9.593569  -3.485778  -7.546606   5.552687
1   9.054550  -7.848605   6.113184  -1.216320   0.938390  -0.014400
2  -3.283226   8.265441   9.444884  -4.683565  -9.065774   5.621277
3   9.046466  -7.939761   5.010928  -0.324473   0.564307   0.236226
4  -5.023092   3.376868  -1.774365   0.098546  -0.511007   2.635681

   Feature 7  Feature 8  Feature 9  Feature 10
0  -2.705651  -5.992366  -1.286639    9.337890
1   4.675954   3.914470   2.751996    4.704688
2  -1.872878  -5.695557  -0.861680    9.692971
3   4.224936   4.444636   2.813714    4.280825
4  -3.561718  -4.892824   0.898923   -0.429435