Have you ever wanted to generate a dataset directly in Python? Python lets us generate different types of data for different purposes.
This recipe is a short example of how to create simulated data for clustering in Python.
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import pandas as pd
Here we have imported make_blobs from scikit-learn, pyplot from matplotlib, and pandas. We will see what each one is used for in the code snippets below.
For now, just take note of these imports.
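Here we use make_blobs to generate clustered data: 2000 samples with 10 features each, grouped around 5 centers, with a standard deviation of 0.4 within each cluster. The function returns the feature matrix and the cluster label of each sample, which we store in features and clusters.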
features, clusters = make_blobs(n_samples=2000,
                                n_features=10,
                                centers=5,
                                cluster_std=0.4,
                                shuffle=True)
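Because the call above does not fix a random seed, each run produces different values. The sketch below shows one way to make the data reproducible and to check what make_blobs returns. The random_state argument and the value 42 are additions for illustration and are not part of the original recipe.

```python
import numpy as np
from sklearn.datasets import make_blobs

# random_state is an assumption added here (the original call does not set it);
# fixing it makes repeated runs return the same data.
features, clusters = make_blobs(n_samples=2000,
                                n_features=10,
                                centers=5,
                                cluster_std=0.4,
                                shuffle=True,
                                random_state=42)

print(features.shape)       # (number of samples, number of features)
print(clusters.shape)       # one integer label per sample
print(np.unique(clusters))  # the distinct cluster labels
```

The feature matrix has one row per sample and one column per feature, and clusters holds the integer label of the center each sample was drawn around.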
We view the first five rows of the dataset.
print("Feature Matrix: ")
print(pd.DataFrame(features, columns=["Feature 1", "Feature 2", "Feature 3",
                                      "Feature 4", "Feature 5", "Feature 6",
                                      "Feature 7", "Feature 8", "Feature 9",
                                      "Feature 10"]).head())
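Typing the column names by hand works for ten features but gets tedious for more. As a minimal sketch, the names can be built from the width of the array, and the cluster labels can be kept next to the features. The variable names column_names and df and the column name "Cluster" are hypothetical choices for this example.

```python
import pandas as pd
from sklearn.datasets import make_blobs

features, clusters = make_blobs(n_samples=2000, n_features=10, centers=5,
                                cluster_std=0.4, shuffle=True)

# Build "Feature 1" ... "Feature 10" from the number of columns in the array.
column_names = [f"Feature {i + 1}" for i in range(features.shape[1])]
df = pd.DataFrame(features, columns=column_names)

# "Cluster" is a hypothetical column name holding each sample's label.
df["Cluster"] = clusters

print(df.head())
# Number of samples assigned to each cluster label.
print(df["Cluster"].value_counts().sort_index())
```

Keeping the labels in the same DataFrame makes it easy to check how many samples belong to each cluster or to filter the rows of a single cluster.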
We draw a scatter plot of the first two features of the dataset.
plt.scatter(features[:,0], features[:,1])
plt.show()
The output looks like this. Since no random_state is set, the exact values will differ from run to run:
Feature Matrix:
   Feature 1  Feature 2  Feature 3  Feature 4  Feature 5  Feature 6
0  -3.250833   8.562522   9.593569  -3.485778  -7.546606   5.552687
1   9.054550  -7.848605   6.113184  -1.216320   0.938390  -0.014400
2  -3.283226   8.265441   9.444884  -4.683565  -9.065774   5.621277
3   9.046466  -7.939761   5.010928  -0.324473   0.564307   0.236226
4  -5.023092   3.376868  -1.774365   0.098546  -0.511007   2.635681

   Feature 7  Feature 8  Feature 9  Feature 10
0  -2.705651  -5.992366  -1.286639    9.337890
1   4.675954   3.914470   2.751996    4.704688
2  -1.872878  -5.695557  -0.861680    9.692971
3   4.224936   4.444636   2.813714    4.280825
4  -3.561718  -4.892824   0.898923   -0.429435
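The scatter plot in this recipe draws every point in the same color, so the groups are only visible by position. A possible variation, sketched below, colors each point by its cluster label. The colormap, marker size, axis labels, and title are illustrative choices, not part of the original recipe.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

features, clusters = make_blobs(n_samples=2000, n_features=10, centers=5,
                                cluster_std=0.4, shuffle=True)

fig, ax = plt.subplots()
# c=clusters maps each integer label to a color from the chosen colormap.
points = ax.scatter(features[:, 0], features[:, 1],
                    c=clusters, cmap="viridis", s=10)
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.set_title("Simulated clusters (first two features)")
fig.colorbar(points, ax=ax, label="Cluster label")
plt.show()
```

Only two of the ten features are plotted, so clusters that are well separated in the full feature space may still appear close together or overlapping in this view.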