This recipe helps you create simulated data for clustering in Python


Recipe Objective

Have you ever wanted to generate a dataset from within Python for your own use? Python lets us generate different types of data for different purposes.

So this recipe is a short example of how we can create simulated data for clustering in Python.

Step 1 - Import the library

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import pandas as pd

Here we have imported make_blobs, matplotlib.pyplot and pandas from different libraries. We will see how each of them is used in the code snippets below.
For now, just take a look at these imports.

Step 2 - Generating the data

Here we are using make_blobs to generate clustered data. It returns the feature matrix and the cluster label of each sample, which we have stored in features and clusters.

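  • n_samples: It signifies the number of samples (rows) we want in our dataset. By default it is set to 100.
  • n_features: It signifies the number of features (columns) we want in our dataset. By default it is set to 2.
  • centers: It signifies the number of cluster centers we want in the final dataset. By default it is None, in which case 3 centers are generated when n_samples is an integer.
  • cluster_std: It signifies the standard deviation of the clusters. By default it is set to 1.0.
  • shuffle: It signifies whether the samples are shuffled. By default it is set to True.
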
features, clusters = make_blobs(n_samples = 2000, n_features = 10, centers = 5, cluster_std = 0.4, shuffle = True)
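
To check that the parameters behave as described, the call above can be followed by a quick look at the shapes and labels of the returned arrays. This is a minimal sketch: the random_state argument is an assumption added here for reproducibility and is not part of the original recipe.

```python
import numpy as np
from sklearn.datasets import make_blobs

# random_state=0 is an assumption added so that repeated runs give the same data
features, clusters = make_blobs(n_samples=2000, n_features=10, centers=5,
                                cluster_std=0.4, shuffle=True, random_state=0)

print(features.shape)         # (2000, 10): one row per sample, one column per feature
print(clusters.shape)         # (2000,): one cluster label per sample
print(np.unique(clusters))    # [0 1 2 3 4]: labels run from 0 to centers - 1
print(np.bincount(clusters))  # [400 400 400 400 400]: samples are split evenly
```

With an integer n_samples, the samples are divided equally among the centers, which is why each of the 5 clusters receives 400 points.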

Step 3 - Viewing the dataset

We are viewing the first five rows of the dataset.

print("Feature Matrix: ")
print(pd.DataFrame(features, columns=["Feature 1", "Feature 2", "Feature 3", "Feature 4", "Feature 5", "Feature 6", "Feature 7", "Feature 8", "Feature 9", "Feature 10"]).head())
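
Typing out every column name gets tedious when n_features changes. As a possible alternative, the names can be built from the shape of the feature matrix. In this sketch the names column_names and df, and the extra "Cluster" column, are illustrative choices that do not appear in the original recipe.

```python
import pandas as pd
from sklearn.datasets import make_blobs

features, clusters = make_blobs(n_samples=2000, n_features=10, centers=5,
                                cluster_std=0.4, shuffle=True)

# Build "Feature 1" ... "Feature N" from the number of columns in the matrix
column_names = [f"Feature {i}" for i in range(1, features.shape[1] + 1)]
df = pd.DataFrame(features, columns=column_names)

# Illustrative addition: keep the cluster label next to the features
df["Cluster"] = clusters

print("Feature Matrix: ")
print(df.head())
```

Keeping the label in the same DataFrame makes it easy to inspect a single cluster, for example with df[df["Cluster"] == 0].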

Step 4 - Plotting the dataset

We are plotting a scatter plot of the first two features of the dataset.

plt.scatter(features[:,0], features[:,1])
plt.show()

The printed output is as follows:

Feature Matrix: 
   Feature 1  Feature 2  Feature 3  Feature 4  Feature 5  Feature 6  
0  -3.250833   8.562522   9.593569  -3.485778  -7.546606   5.552687   
1   9.054550  -7.848605   6.113184  -1.216320   0.938390  -0.014400   
2  -3.283226   8.265441   9.444884  -4.683565  -9.065774   5.621277   
3   9.046466  -7.939761   5.010928  -0.324473   0.564307   0.236226   
4  -5.023092   3.376868  -1.774365   0.098546  -0.511007   2.635681   

   Feature 7  Feature 8  Feature 9  Feature 10  
0  -2.705651  -5.992366  -1.286639    9.337890  
1   4.675954   3.914470   2.751996    4.704688  
2  -1.872878  -5.695557  -0.861680    9.692971  
3   4.224936   4.444636   2.813714    4.280825  
4  -3.561718  -4.892824   0.898923   -0.429435 
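
The scatter plot in this recipe draws every point in the same colour. A possible variation is to colour each point by its cluster label so the groups are easier to tell apart. This is only a sketch: the colour map, marker size, axis labels and the Agg backend line are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption); remove in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

features, clusters = make_blobs(n_samples=2000, n_features=10, centers=5,
                                cluster_std=0.4, shuffle=True)

fig, ax = plt.subplots()
# c=clusters maps each cluster label to a colour from the chosen colour map
scatter = ax.scatter(features[:, 0], features[:, 1], c=clusters, cmap="viridis", s=10)
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
# One legend entry per distinct cluster label
ax.legend(*scatter.legend_elements(), title="Cluster")
```

Only the first two of the ten features are drawn, so clusters that are well separated in the full feature space may still overlap in this two-dimensional view. In an interactive session, call plt.show() afterwards to display the figure.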

