How to Create simulated data for classification in Python?

This recipe helps you create simulated data for classification in Python.

Recipe Objective

Have you ever wanted to generate a dataset directly in Python? Python libraries can generate different types of data for different purposes.

This recipe is a short example of how to create simulated data for classification in Python.

Step 1 - Import the libraries

from sklearn.datasets import make_classification
import pandas as pd

Here we have imported make_classification from scikit-learn and the pandas library. We will see how each is used later in the code snippets.
For now, just take a look at these imports.

Step 2 - Generating the data

Here we use make_classification to generate classification data. It returns a feature matrix and the target labels, which we store in features and output. The parameters used are:

  • n_samples: The number of samples (rows) in the dataset. By default it is set to 100.
  • n_features: The total number of features (columns) in the dataset. By default it is set to 20.
  • n_informative: The number of informative features, i.e. the features that actually carry information about the class. By default it is set to 2.
  • n_redundant: The number of redundant features, which are generated as random linear combinations of the informative features. By default it is set to 2.
  • n_classes: The number of classes in the target. By default it is set to 2.
  • weights: The proportion of samples assigned to each class. By default it is None, which gives balanced classes. Keep the weights summing to 1; otherwise the generated class proportions cannot match them.
features, output = make_classification(n_samples=50, n_features=5, n_informative=5, n_redundant=0, n_classes=3, weights=[.2, .3, .5])
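To see what n_redundant means in practice, here is a minimal sketch with different, illustrative settings (2 informative, 2 redundant and 2 noise features). The arguments shuffle=False and random_state=0 are choices made only for this illustration and are not part of the recipe: shuffle=False keeps the columns in the order informative, redundant, noise, and random_state makes the run repeatable. Because the redundant columns are linear combinations of the informative ones, the first four columns should only have rank 2.

```python
import numpy as np
from sklearn.datasets import make_classification

# Illustrative settings (assumed for this sketch): 2 informative, 2 redundant
# and 2 noise ("useless") features. shuffle=False keeps the column order
# informative -> redundant -> noise; random_state makes the run repeatable.
X, y = make_classification(
    n_samples=200,
    n_features=6,
    n_informative=2,
    n_redundant=2,
    random_state=0,
    shuffle=False,
)

# Columns 0-1 are informative; columns 2-3 are linear combinations of them,
# so together the first four columns only span a 2-dimensional space.
rank_informative_plus_redundant = np.linalg.matrix_rank(X[:, :4])

# The two independent noise columns raise the rank of the full matrix to 4.
rank_all_features = np.linalg.matrix_rank(X)

print(X.shape)                          # (200, 6)
print(rank_informative_plus_redundant)  # 2
print(rank_all_features)                # 4
```

Redundant features therefore add columns without adding new information, which is why the recipe sets n_redundant=0 when every feature should be informative.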

Step 3 - Viewing the dataset

We view the first 5 observations of the features:
print("Feature Matrix: ")
print(pd.DataFrame(features, columns=["Feature 1", "Feature 2", "Feature 3", "Feature 4", "Feature 5"]).head())
We then view the first 5 observations of the target:
print()
print("Target Class: ")
print(pd.DataFrame(output, columns=["TargetClass"]).head())
The output looks like the following. Because no random_state is passed to make_classification, the exact values change on every run.

Feature Matrix: 
   Feature 1  Feature 2  Feature 3  Feature 4  Feature 5
0   0.833135  -1.107635  -0.728420   0.101483   1.793259
1   1.120892  -1.856847  -2.490347   1.247622   1.594469
2  -0.980409  -3.042990  -0.482548   4.075172  -1.058840
3   0.827502   2.839329   2.943324  -2.449732   0.303014
4   1.173058  -0.519413   1.240518  -2.643039   2.406873

Target Class: 
   TargetClass
0            2
1            2
2            1
3            0
4            2
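It can also help to keep the features and the target in one DataFrame and check how many rows landed in each class, which shows the effect of the weights parameter. The sketch below uses the Step 2 parameters with weights=[0.2, 0.3, 0.5] (summing to 1) and adds random_state=42, an assumption made only so the result is repeatable. Class proportions are approximate because make_classification also flips a small fraction of labels at random (its flip_y parameter, 0.01 by default).

```python
import pandas as pd
from sklearn.datasets import make_classification

# Step 2 parameters, with weights summing to 1 and a fixed random_state
# (added here only for repeatability).
features, output = make_classification(
    n_samples=50,
    n_features=5,
    n_informative=5,
    n_redundant=0,
    n_classes=3,
    weights=[0.2, 0.3, 0.5],
    random_state=42,
)

# Put the features and the target side by side in one DataFrame.
df = pd.DataFrame(features, columns=[f"Feature {i}" for i in range(1, 6)])
df["TargetClass"] = output
print(df.head())

# Count the rows per class; with these weights class 2 should be the
# largest group (about half the rows), up to the small label noise.
class_counts = df["TargetClass"].value_counts().sort_index()
print(class_counts)
```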


