This recipe helps you create simulated data for regression in Python.


Recipe Objective

We often need a dataset to practice on or to test a model. Instead of searching for one, we can generate a simulated dataset directly in Python.

This recipe shows how to create simulated data for regression in Python.

Step 1 - Import the library

import pandas as pd
from sklearn import datasets

We have imported pandas and the datasets module from sklearn. Both are required for this recipe.

Step 2 - Creating the Simulated Data

We can create a dataset for regression by calling datasets.make_regression with parameters such as n_samples, n_features, n_informative, n_targets and noise. Because we pass coef = True, the function returns three values: the feature matrix, the target values, and the true coefficients of the underlying linear model.

features, output, coef = datasets.make_regression(n_samples = 80, n_features = 4, n_informative = 4, n_targets = 1, noise = 0.0, coef = True)
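To make the three return values more concrete, the following minimal sketch inspects their shapes and checks how they relate to each other. The random_state argument is an assumption added here so that the run is reproducible; it is not part of the original call. With noise = 0.0 and the bias left at its default of 0.0, the target should be the features multiplied by the true coefficients.

```python
import numpy as np
from sklearn import datasets

# random_state=0 is an assumption added for reproducibility;
# the original recipe does not set it.
features, output, coef = datasets.make_regression(
    n_samples=80,
    n_features=4,
    n_informative=4,
    n_targets=1,
    noise=0.0,
    coef=True,
    random_state=0,
)

print(features.shape)  # (80, 4): one row per sample, one column per feature
print(output.shape)    # (80,): one target value per sample
print(coef.shape)      # (4,): one true coefficient per feature

# With no noise and zero bias, the target is an exact linear
# combination of the features weighted by the true coefficients.
reconstructed = features @ coef
print(np.allclose(reconstructed, output))
```

Raising noise above 0.0 adds Gaussian noise to the target, so this check would then no longer hold exactly.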

Step 3 - Printing the Dataset

Here we print the different components of the dataset, i.e. the features, the output and the true coefficients.

print(pd.DataFrame(features, columns=['Feature_1', 'Feature_2', 'Feature_3', 'Feature_4']).head())
print(pd.DataFrame(output, columns=['Target']).head())
print(pd.DataFrame(coef, columns=['True Coefficient Values']))

The output looks like this. Because no random_state is set, the exact values differ from run to run.

   Feature_1  Feature_2  Feature_3  Feature_4
0  -0.061616   0.322765   1.329021  -0.975053
1   0.489019  -0.838662   0.445058  -0.244990
2   0.324046   0.656792  -0.034017  -1.445877
3   0.227775  -0.174360   0.652398  -0.336352
4   0.837811  -2.410269  -0.368019  -1.066476

       Target
0  -68.619492
1  -16.114323
2 -122.108491
3  -18.132927
4 -124.770731

   True Coefficient Values
0                26.722153
1                15.494463
2                17.067228
3                97.078600
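One practical use of the true coefficients is to check whether a model recovers them. The sketch below is an illustration and not part of the original recipe: it assumes a small amount of noise (noise = 1.0) and a fixed random_state, fits sklearn's LinearRegression to the simulated data, and prints the estimated coefficients next to the true ones.

```python
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# noise=1.0 and random_state=0 are assumptions chosen for this illustration.
features, output, coef = datasets.make_regression(
    n_samples=80,
    n_features=4,
    n_informative=4,
    n_targets=1,
    noise=1.0,
    coef=True,
    random_state=0,
)

# Fit an ordinary least squares model to the simulated data.
model = LinearRegression()
model.fit(features, output)

# Compare the estimated coefficients with the true ones side by side.
comparison = pd.DataFrame({
    'True Coefficient Values': coef,
    'Estimated Coefficient Values': model.coef_,
})
print(comparison)
```

With low noise the two columns should be close to each other; increasing the noise or reducing n_samples would be expected to widen the gap.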

