How to create simulated data for clustering in Python?

This recipe helps you create simulated data for clustering in Python.


Recipe Objective

Have you ever wanted to generate a dataset from within Python itself? Python libraries let us generate different types of data for different purposes, such as regression, classification or clustering.

So this recipe is a short example of how we can create simulated data for clustering in Python.

Step 1 - Import the library

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import pandas as pd

Here we have imported the function make_blobs from sklearn.datasets, together with the modules matplotlib.pyplot and pandas. We will see what each of them is used for in the code snippets below.
For now, just have a look at these imports.

Step 2 - Generating the data

Here we are using make_blobs to generate clustered data. It returns two arrays: the feature matrix and the cluster label of each sample, which we store in features and clusters.

  • n_samples: It signifies the number of samples (rows) we want in our dataset. By default it is set to 100.
  • n_features: It signifies the number of features (columns) we want in our dataset. By default it is set to 2.
  • centers: It signifies the number of cluster centers we want in the final dataset. When it is left unset and n_samples is an integer, 3 centers are generated.
  • cluster_std: It signifies the standard deviation of the clusters. By default it is set to 1.0.
  • shuffle: It signifies whether the samples are shuffled. By default it is set to True.

features, clusters = make_blobs(n_samples = 2000, n_features = 10, centers = 5, cluster_std = 0.4, shuffle = True)
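
The parameters listed above can be checked directly on the generated arrays. The sketch below is only an illustration: it adds random_state=42 (an assumption, not used in the recipe) so that the result is repeatable, and the variable names counts and spread are made up for this example. It confirms the shape of the data, that the 2000 samples are split evenly over the 5 centers, and that the spread inside one cluster is close to cluster_std.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same call as in the recipe, plus random_state (an assumption added here)
# so that repeated runs return identical data.
features, clusters = make_blobs(
    n_samples=2000,
    n_features=10,
    centers=5,
    cluster_std=0.4,
    shuffle=True,
    random_state=42,
)

print(features.shape)  # rows = n_samples, columns = n_features
print(clusters.shape)  # one integer label per row

# Number of samples assigned to each of the 5 centers.
counts = np.bincount(clusters)
print(counts)

# Per-feature standard deviation inside cluster 0; each value should be
# close to cluster_std = 0.4.
spread = features[clusters == 0].std(axis=0)
print(spread.round(2))
```

Without random_state, every call draws new cluster centers and new points, so the numbers printed later in this recipe will differ from run to run.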

Step 3 - Viewing the dataset

We are viewing the first five rows of the dataset.

print("Feature Matrix: ")
print(pd.DataFrame(features, columns=["Feature " + str(i) for i in range(1, 11)]).head())
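
It can also help to look at the features and the cluster labels together. The following is a minimal sketch, assuming the same make_blobs settings as above with random_state=0 added for repeatability; the column name "Cluster" and the variables df, cluster_sizes and cluster_means are illustrative names, not part of the original recipe.

```python
import pandas as pd
from sklearn.datasets import make_blobs

# random_state is an assumption added here so the example is repeatable.
features, clusters = make_blobs(
    n_samples=2000, n_features=10, centers=5, cluster_std=0.4, random_state=0
)

# Put the features in a DataFrame and attach the cluster label as a column.
columns = ["Feature " + str(i) for i in range(1, 11)]
df = pd.DataFrame(features, columns=columns)
df["Cluster"] = clusters

print(df.head())

# How many samples belong to each cluster.
cluster_sizes = df["Cluster"].value_counts().sort_index()
print(cluster_sizes)

# Mean of every feature within each cluster; each row approximates
# the center that make_blobs drew for that cluster.
cluster_means = df.groupby("Cluster").mean()
print(cluster_means.round(2))
```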

Step 4 - Plotting the dataset

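We are drawing a scatter plot of the first two features of the dataset.

plt.scatter(features[:, 0], features[:, 1])
plt.show()

The printed output looks like this (the exact values differ between runs because no random_state is set):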

Feature Matrix: 
   Feature 1  Feature 2  Feature 3  Feature 4  Feature 5  Feature 6  
0  -3.250833   8.562522   9.593569  -3.485778  -7.546606   5.552687   
1   9.054550  -7.848605   6.113184  -1.216320   0.938390  -0.014400   
2  -3.283226   8.265441   9.444884  -4.683565  -9.065774   5.621277   
3   9.046466  -7.939761   5.010928  -0.324473   0.564307   0.236226   
4  -5.023092   3.376868  -1.774365   0.098546  -0.511007   2.635681   

   Feature 7  Feature 8  Feature 9  Feature 10  
0  -2.705651  -5.992366  -1.286639    9.337890  
1   4.675954   3.914470   2.751996    4.704688  
2  -1.872878  -5.695557  -0.861680    9.692971  
3   4.224936   4.444636   2.813714    4.280825  
4  -3.561718  -4.892824   0.898923   -0.429435 
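
The scatter plot above draws every point in the same color, so the clusters can only be told apart by position. A possible variation, sketched below, passes the cluster labels to the c argument of plt.scatter so that each cluster gets its own color. The random_state value, the Agg backend, and the file name blobs.png are assumptions made for this example; remove the matplotlib.use line and call plt.show() instead to open an interactive window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# random_state is an assumption added here so the picture is repeatable.
features, clusters = make_blobs(
    n_samples=2000, n_features=10, centers=5, cluster_std=0.4, random_state=0
)

fig, ax = plt.subplots()

# Color each point by its cluster label; only the first two of the
# ten features are shown, so some clusters may overlap in this view.
scatter = ax.scatter(features[:, 0], features[:, 1], c=clusters, s=10)
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.set_title("Simulated clusters from make_blobs")

fig.savefig("blobs.png")  # hypothetical output file name
```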
