This recipe helps you standardise IRIS Data in Python


Recipe Objective

A raw dataset rarely follows a specific distribution exactly, and its features are often measured on very different scales. Most datasets therefore need some form of standardisation before they are used for modelling.

This recipe shows how to standardise the Iris data in Python.

Step 1 - Import the library

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

We import only what we need: datasets, train_test_split and StandardScaler.

Step 2 - Setting up the Data

We load the built-in Iris dataset so that we have data to split with train_test_split. The feature data is stored in X and the target labels in y.
iris = datasets.load_iris()
X = iris.data
y = iris.target
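To see what load_iris returns before splitting it, a quick inspection like the following may help. It is a small sketch; the shapes in the comments are those of the standard copy of the dataset that ships with scikit-learn.

```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data    # feature matrix: one row per flower, one column per measurement
y = iris.target  # class labels encoded as 0, 1 and 2

print(X.shape)             # (150, 4)
print(y.shape)             # (150,)
print(iris.feature_names)  # names of the four measurements (sepal/petal length and width, in cm)
print(iris.target_names)   # the three species the labels 0, 1, 2 stand for
```

So X holds 150 samples with 4 numeric features each, which is what StandardScaler will later rescale column by column.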

Step 3 - Splitting the Data

Now we use train_test_split to split the data. We pass test_size=0.33, which puts 33% of the data in the test set and the rest in the training set. The split is random; the optional random_state parameter seeds that shuffle so the split is reproducible. It is not set here, so each run produces a different split and therefore different printed values in Step 4.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
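The sketch below illustrates both points about the split: how test_size=0.33 translates into row counts for the 150 Iris samples, and how fixing random_state makes the split repeatable. The seed value 42 is an arbitrary assumption, not part of the recipe.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# With a float test_size, scikit-learn rounds the test count up:
# ceil(0.33 * 150) = 50 test rows, leaving 100 rows for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape)  # (100, 4) (50, 4)

# Fixing random_state (42 is arbitrary) makes the shuffle, and so the split, repeatable.
split_a = train_test_split(X, y, test_size=0.33, random_state=42)
split_b = train_test_split(X, y, test_size=0.33, random_state=42)
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)  # True
```

Without random_state, the two calls would normally give different rows in each part, which is why the numbers printed in Step 4 change from run to run.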

Step 4 - Using StandardScaler

StandardScaler rescales each feature so that it has a mean of 0 and a standard deviation of 1. It does not remove outliers; because it relies on the mean and standard deviation, it is in fact sensitive to them. We create an object std_slc to use StandardScaler.
We fit the scaler on the training data only and then use it to transform both the training and the test data, so the test set is scaled with the training mean and standard deviation and no information from it leaks into preprocessing. Finally we print the first five rows of the training data, the scaled training data, the test data and the scaled test data.
std_slc = StandardScaler()
X_train_std = std_slc.fit_transform(X_train)
X_test_std = std_slc.transform(X_test)
print(X_train[0:5])
print(X_train_std[0:5])
print(X_test[0:5])
print(X_test_std[0:5])
The output is:

[[6.7 3.3 5.7 2.1]
 [5.  2.3 3.3 1. ]
 [6.  2.9 4.5 1.5]
 [6.7 3.1 5.6 2.4]
 [5.  3.6 1.4 0.2]]

[[ 1.16345928  0.47610991  1.22532919  1.30349721]
 [-0.97534073 -1.7757613  -0.15378182 -0.1696332 ]
 [ 0.28277692 -0.42463857  0.53577368  0.49997153]
 [ 1.16345928  0.02573567  1.16786623  1.70526004]
 [-0.97534073  1.15167128 -1.24557804 -1.24100076]]

[[5.1 3.8 1.9 0.4]
 [6.6 2.9 4.6 1.3]
 [5.5 2.4 3.7 1. ]
 [6.3 2.3 4.4 1.3]
 [7.7 2.6 6.9 2.3]]

[[-0.84952897  1.60204552 -0.95826324 -0.97315887]
 [ 1.03764751 -0.42463857  0.59323664  0.23212964]
 [-0.34628191 -1.55057418  0.07607001 -0.1696332 ]
 [ 0.66021222 -1.7757613   0.47831072  0.23212964]
 [ 2.42157693 -1.10019993  1.91488469  1.5713391 ]]
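StandardScaler applies z = (x - mean) / std to every column, where the mean and the standard deviation (population version, ddof=0) are taken from the training data it was fitted on. The sketch below reproduces the transform by hand with NumPy and compares it with the scaler. It assumes random_state=0 purely so the check is repeatable, so its numbers will not match the output above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
# random_state=0 is an arbitrary assumption so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

std_slc = StandardScaler()
X_train_std = std_slc.fit_transform(X_train)  # learn mean/std from training data, then scale it
X_test_std = std_slc.transform(X_test)        # reuse the training statistics on the test data

# Manual version: per-column mean and population standard deviation of the training set.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # NumPy's default ddof=0 matches StandardScaler
manual_train = (X_train - mu) / sigma
manual_test = (X_test - mu) / sigma

print(np.allclose(manual_train, X_train_std))  # True
print(np.allclose(manual_test, X_test_std))    # True
print(np.allclose(std_slc.mean_, mu), np.allclose(std_slc.scale_, sigma))  # True True

# Training columns now have mean 0 and standard deviation 1;
# test columns are only close to that, since they were scaled with training statistics.
print(X_train_std.mean(axis=0).round(6))
print(X_train_std.std(axis=0).round(6))
print(X_test_std.mean(axis=0).round(2))
```

The fitted statistics are available afterwards as std_slc.mean_ and std_slc.scale_, which is a convenient way to check what the scaler learned from the training split.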

