How to standardise IRIS Data in Python?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to standardise IRIS Data in Python?

How to standardise IRIS Data in Python?

This recipe helps you standardise IRIS Data in Python

Recipe Objective

It is very rare to find a raw dataset which perfectly follows certain specific distribution. Usually every dataset needs to be standarize by any means.

So this is the recipe on how we can standarise IRIS Data in Python.

Step 1 - Import the library

from sklearn import datasets from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler

We have only imported datasets, train_test_split and standardscaler which is needed.

Step 2 - Setting up the Data

We have imported an inbuilt iris dataset to use test_train_split. We have stored data in X and target in y. iris = datasets.load_iris() X = iris.data y = iris.target

Step 3 - Splitting the Data

So now we are using test_train_split to split the data. We have passed test_size as 0.33 which means 33% of data will be in the test part and rest will be in train part. Parameter random_state signifies the random splitting of data into the two parts. X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

Step 4 - Using StandardScaler

StandardScaler is used to remove the outliners and scale the data by making the mean of the data 0 and standard deviation as 1. So we are creating an object std_scl to use standardScaler.
We have fitted the train data and transformed train and test data form standard scaler. Finally we have printed first five elements of test, train, scaled train and scaled test. std_slc = StandardScaler() std_slc.fit(X_train) X_train_std = std_slc.transform(X_train) X_test_std = std_slc.transform(X_test) print(X_train[0:5]) print(X_train_std[0:5]) print(X_test[0:5]) print(X_test_std[0:5]) As an output we get

[[6.7 3.3 5.7 2.1]
 [5.  2.3 3.3 1. ]
 [6.  2.9 4.5 1.5]
 [6.7 3.1 5.6 2.4]
 [5.  3.6 1.4 0.2]]

[[ 1.16345928  0.47610991  1.22532919  1.30349721]
 [-0.97534073 -1.7757613  -0.15378182 -0.1696332 ]
 [ 0.28277692 -0.42463857  0.53577368  0.49997153]
 [ 1.16345928  0.02573567  1.16786623  1.70526004]
 [-0.97534073  1.15167128 -1.24557804 -1.24100076]]

[[5.1 3.8 1.9 0.4]
 [6.6 2.9 4.6 1.3]
 [5.5 2.4 3.7 1. ]
 [6.3 2.3 4.4 1.3]
 [7.7 2.6 6.9 2.3]]

[[-0.84952897  1.60204552 -0.95826324 -0.97315887]
 [ 1.03764751 -0.42463857  0.59323664  0.23212964]
 [-0.34628191 -1.55057418  0.07607001 -0.1696332 ]
 [ 0.66021222 -1.7757613   0.47831072  0.23212964]
 [ 2.42157693 -1.10019993  1.91488469  1.5713391 ]]

Download Materials

Relevant Projects

Forecasting Business KPI's with Tensorflow and Python
In this machine learning project, you will use the video clip of an IPL match played between CSK and RCB to forecast key performance indicators like the number of appearances of a brand logo, the frames, and the shortest and longest area percentage in the video.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Build a Music Recommendation Algorithm using KKBox's Dataset
Music Recommendation Project using Machine Learning - Use the KKBox dataset to predict the chances of a user listening to a song again after their very first noticeable listening event.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Medical Image Segmentation Deep Learning Project
In this deep learning project, you will learn to implement Unet++ models for medical image segmentation to detect and classify colorectal polyps.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

Inventory Demand Forecasting using Machine Learning in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.