How to create simulated data for clustering in Python?

This recipe helps you create simulated data for clustering in Python.


Recipe Objective

Have you ever wanted to generate a dataset from within Python itself? Python libraries can generate different types of data for different purposes.

So this recipe is a short example of how we can create simulated data for clustering in Python.

Step 1 - Import the library

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import pandas as pd

Here we have imported make_blobs from sklearn.datasets, pyplot from matplotlib (as plt), and pandas (as pd). We will see what each one does when it is used in the code snippets below.

Step 2 - Generating the data

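Here we use make_blobs to generate clustered data in the form of isotropic Gaussian blobs. It returns the feature matrix and the cluster label of each sample, which we store in features and clusters.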

  • n_samples: The number of samples (rows) we want in our dataset. By default it is set to 100.
  • n_features: The number of features (columns) we want in our dataset. By default it is set to 2.
  • centers: The number of cluster centers we want in the final dataset. By default it is None, which gives 3 centers when n_samples is an integer.
  • cluster_std: The standard deviation of the clusters. By default it is set to 1.0.
  • shuffle: Whether to shuffle the samples. By default it is set to True.
features, clusters = make_blobs(n_samples = 2000, n_features = 10, centers = 5, cluster_std = 0.4, shuffle = True)
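To check what the call above returns, the following is a minimal sketch that repeats it and inspects the shapes and labels. The random_state argument is an assumption added here so that repeated runs give the same data; the recipe's own call does not set it, so its values differ from run to run.

```python
from sklearn.datasets import make_blobs

# random_state is an assumption added for repeatability;
# the recipe's original call leaves it unset.
features, clusters = make_blobs(
    n_samples=2000,
    n_features=10,
    centers=5,
    cluster_std=0.4,
    shuffle=True,
    random_state=42,
)

# features holds one row per sample and one column per feature.
print(features.shape)

# clusters holds one integer label per sample, numbered from 0.
print(clusters.shape)
print(sorted(set(clusters.tolist())))
```

With these arguments, features has 2000 rows and 10 columns, and clusters contains the labels 0 to 4.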

Step 3 - Viewing the dataset

We are viewing the first five rows of the dataset.
print("Feature Matrix: ")
print(pd.DataFrame(features, columns=["Feature 1", "Feature 2", "Feature 3", "Feature 4", "Feature 5", "Feature 6", "Feature 7", "Feature 8", "Feature 9", "Feature 10"]).head())
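Typing ten column names by hand works, but the list can also be built from the shape of the feature matrix. The sketch below is one possible way to do that, and it also attaches the cluster labels so they can be inspected next to the features. The column name "Cluster" and the random_state value are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import make_blobs

# Same call as in the recipe, plus random_state (an assumption) for repeatability.
features, clusters = make_blobs(
    n_samples=2000, n_features=10, centers=5, cluster_std=0.4, random_state=42
)

# Build "Feature 1" ... "Feature N" from the number of columns.
columns = [f"Feature {i}" for i in range(1, features.shape[1] + 1)]
df = pd.DataFrame(features, columns=columns)

# "Cluster" is a hypothetical column name chosen for this sketch.
df["Cluster"] = clusters

print(df.head())

# Number of samples generated for each cluster label.
print(df["Cluster"].value_counts().sort_index())
```

Because 2000 samples are split evenly over 5 centers, each cluster label appears 400 times.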

Step 4 - Plotting the dataset

We are drawing a scatter plot of the first two features of the dataset.
plt.scatter(features[:,0], features[:,1])
plt.show()
The printed output is:

Feature Matrix: 
   Feature 1  Feature 2  Feature 3  Feature 4  Feature 5  Feature 6  
0  -3.250833   8.562522   9.593569  -3.485778  -7.546606   5.552687   
1   9.054550  -7.848605   6.113184  -1.216320   0.938390  -0.014400   
2  -3.283226   8.265441   9.444884  -4.683565  -9.065774   5.621277   
3   9.046466  -7.939761   5.010928  -0.324473   0.564307   0.236226   
4  -5.023092   3.376868  -1.774365   0.098546  -0.511007   2.635681   

   Feature 7  Feature 8  Feature 9  Feature 10  
0  -2.705651  -5.992366  -1.286639    9.337890  
1   4.675954   3.914470   2.751996    4.704688  
2  -1.872878  -5.695557  -0.861680    9.692971  
3   4.224936   4.444636   2.813714    4.280825  
4  -3.561718  -4.892824   0.898923   -0.429435 
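The scatter plot in the plotting step draws every point in the same color, so the clusters can only be told apart by their position. A possible variation, sketched below, colors each point by its cluster label. The non-interactive "Agg" backend, the file name blobs.png, the colormap, and random_state are all assumptions made so that the sketch runs as a plain script without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Same call as in the recipe, plus random_state (an assumption) for repeatability.
features, clusters = make_blobs(
    n_samples=2000, n_features=10, centers=5, cluster_std=0.4, random_state=42
)

fig, ax = plt.subplots()

# Plot the first two features and color each point by its cluster label.
scatter = ax.scatter(features[:, 0], features[:, 1], c=clusters, cmap="viridis", s=5)
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")

# One legend entry per cluster label.
ax.legend(*scatter.legend_elements(), title="Cluster")

# "blobs.png" is a hypothetical output file name.
fig.savefig("blobs.png")
```

Only two of the ten features are shown, so clusters that are well separated in the full feature space may still overlap in this two-dimensional view.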
