How to Create Simulated Data for Clustering in Python?

This recipe helps you create simulated data for clustering in Python.

Recipe Objective

Have you ever wanted to generate a dataset directly in Python? Python libraries can generate many kinds of synthetic data for different purposes.

This recipe is a short example of how to create simulated data for clustering in Python.

Step 1 - Import the libraries

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import pandas as pd

Here we have imported make_blobs from sklearn.datasets, matplotlib.pyplot for plotting and pandas for displaying the data as a table. We will see how each one is used in the code snippets that follow.
For now, just have a look at these imports.

Step 2 - Generating the data

Here we use make_blobs to generate clustered data. It returns the features and the cluster label (target) of every sample. Its main parameters are:

  • n_samples: the number of samples (rows) in the dataset. By default it is set to 100.
  • n_features: the number of features (columns) in the dataset. By default it is set to 2.
  • centers: the number of cluster centers to generate, or an array of fixed center locations. If it is left as None and n_samples is an integer, 3 centers are generated.
  • cluster_std: the standard deviation of the clusters. By default it is set to 1.0.
  • shuffle: whether to shuffle the samples. By default it is set to True.
features, clusters = make_blobs(n_samples = 2000, n_features = 10, centers = 5, cluster_std = 0.4, shuffle = True)
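To see what make_blobs actually returns, here is a minimal sketch that repeats the call above with two additions that are not in the original recipe: random_state=42, so every run gives the same data, and return_centers=True (available in scikit-learn 0.23 and later), which also returns the generated center of each cluster. The loop then checks that each cluster's spread is close to cluster_std.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same settings as the recipe; random_state and return_centers are additions
# made only for this illustration.
features, clusters, centers = make_blobs(
    n_samples=2000,
    n_features=10,
    centers=5,
    cluster_std=0.4,
    shuffle=True,
    random_state=42,
    return_centers=True,  # also return the generated cluster centers
)

print(features.shape)       # (2000, 10): one row per sample, one column per feature
print(clusters.shape)       # (2000,): one integer cluster label per sample
print(np.unique(clusters))  # [0 1 2 3 4]: labels run from 0 to centers - 1
print(centers.shape)        # (5, 10): one 10-dimensional center per cluster

for label in np.unique(clusters):
    members = features[clusters == label]
    # Average per-feature standard deviation inside the cluster (about cluster_std).
    spread = members.std(axis=0).mean()
    # Distance between the cluster's sample mean and its generated center.
    offset = np.linalg.norm(members.mean(axis=0) - centers[label])
    print(f"cluster {label}: {len(members)} samples, "
          f"spread {spread:.2f}, center offset {offset:.3f}")
```

Because n_samples is an integer, make_blobs splits it evenly across the centers, so each of the 5 clusters here gets 400 samples.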

Step 3 - Viewing the dataset

We view the first five rows of the feature matrix.
print("Feature Matrix: ")
print(pd.DataFrame(features, columns=[f"Feature {i}" for i in range(1, 11)]).head())
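The feature matrix on its own does not show which cluster each row belongs to. A minimal sketch, assuming you want the ground-truth labels next to the features: it attaches the clusters array as an extra column (the column name "Cluster" and random_state=0 are choices made only for this example) and summarizes the data per cluster with pandas.

```python
import pandas as pd
from sklearn.datasets import make_blobs

# Regenerate the recipe's data; random_state=0 is added only for repeatability.
features, clusters = make_blobs(n_samples=2000, n_features=10, centers=5,
                                cluster_std=0.4, shuffle=True, random_state=0)

# Same DataFrame as in the recipe, with column names built in a loop.
df = pd.DataFrame(features, columns=[f"Feature {i}" for i in range(1, 11)])

# Attach the cluster label that make_blobs returned for each row.
df["Cluster"] = clusters

print(df.head())

# Number of rows per cluster label.
print(df["Cluster"].value_counts().sort_index())

# Mean of every feature within each cluster, which approximates the cluster centers.
print(df.groupby("Cluster").mean().round(2))
```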

Step 4 - Plotting the dataset

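We draw a scatter plot of the first two features of the dataset.
plt.scatter(features[:, 0], features[:, 1])
plt.show()

The printed feature matrix looks like this. The values change on every run, because no random_state is passed to make_blobs: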
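A plain scatter plot draws every point in the same color, so the clusters are hard to tell apart. Here is a minimal sketch of one alternative, assuming a non-interactive setting: it colors each point by its cluster label, adds a legend, and saves the figure to a file. The Agg backend, the random_state, and the file name blobs_scatter.png are choices made for this illustration, not part of the recipe.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

features, clusters = make_blobs(n_samples=2000, n_features=10, centers=5,
                                cluster_std=0.4, shuffle=True, random_state=0)

fig, ax = plt.subplots(figsize=(6, 5))
# Color each point by the cluster label that make_blobs generated.
scatter = ax.scatter(features[:, 0], features[:, 1], c=clusters,
                     cmap="viridis", s=10)
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.set_title("make_blobs: first two of 10 features")
# legend_elements() builds one legend entry per distinct label.
ax.legend(*scatter.legend_elements(), title="Cluster")
fig.savefig("blobs_scatter.png", dpi=100)
plt.close(fig)
```

Keep in mind that this shows only two of the ten features. Clusters that are well separated in 10 dimensions can still overlap in this 2D view.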

Feature Matrix: 
   Feature 1  Feature 2  Feature 3  Feature 4  Feature 5  Feature 6  
0  -3.250833   8.562522   9.593569  -3.485778  -7.546606   5.552687   
1   9.054550  -7.848605   6.113184  -1.216320   0.938390  -0.014400   
2  -3.283226   8.265441   9.444884  -4.683565  -9.065774   5.621277   
3   9.046466  -7.939761   5.010928  -0.324473   0.564307   0.236226   
4  -5.023092   3.376868  -1.774365   0.098546  -0.511007   2.635681   

   Feature 7  Feature 8  Feature 9  Feature 10  
0  -2.705651  -5.992366  -1.286639    9.337890  
1   4.675954   3.914470   2.751996    4.704688  
2  -1.872878  -5.695557  -0.861680    9.692971  
3   4.224936   4.444636   2.813714    4.280825  
4  -3.561718  -4.892824   0.898923   -0.429435 
