How to standardise IRIS Data in Python?

This recipe helps you standardise IRIS Data in Python
Last Updated: 26 Dec 2022

Recipe Objective

It is rare to find a raw dataset that follows a specific distribution exactly. Most datasets therefore need to be standardised in some way before they are used.

This recipe shows how to standardise the IRIS data in Python.

Step 1 - Import the library

    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

We have imported only what is needed: datasets, train_test_split and StandardScaler.

Step 2 - Setting up the Data

We load the built-in iris dataset, which we will later split with train_test_split. The features are stored in X and the target in y.

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
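To check what was loaded, the shapes of X and y can be inspected. This is a small sketch that is not part of the original recipe; it relies only on attributes of the object returned by load_iris().

```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

print(X.shape)             # (150, 4): 150 flowers, 4 measurements each
print(y.shape)             # (150,): one class label per flower
print(iris.feature_names)  # names of the four feature columns
print(iris.target_names)   # names of the three classes
```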

Step 3 - Splitting the Data

Now we use train_test_split to split the data. We pass test_size=0.33, which means that 33% of the data goes to the test set and the rest to the training set. The optional random_state parameter fixes the seed used to shuffle the data, which makes the split reproducible; it is not passed here, so the rows in each part change from run to run.

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
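The effect of random_state can be seen by splitting twice with the same seed. This is a sketch rather than part of the original recipe; the seed value 42 is an arbitrary choice made for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# random_state=42 is an arbitrary seed chosen for this illustration.
split_a = train_test_split(X, y, test_size=0.33, random_state=42)
split_b = train_test_split(X, y, test_size=0.33, random_state=42)

X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape)  # (100, 4) (50, 4)

# The same seed produces exactly the same training rows.
same_split = np.array_equal(split_a[0], split_b[0])
print(same_split)  # True
```

Without random_state, two calls would normally return different rows, which is why the numbers printed in Step 4 will not match on your machine.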

Step 4 - Using StandardScaler

StandardScaler rescales each feature so that it has a mean of 0 and a standard deviation of 1. It does not remove outliers; outliers still affect the mean and standard deviation that it estimates. We create a StandardScaler object named std_slc, fit it on the training data only, and then use it to transform both the training and the test data. Finally we print the first five rows of the training data, the scaled training data, the test data and the scaled test data.

    std_slc = StandardScaler()
    std_slc.fit(X_train)
    X_train_std = std_slc.transform(X_train)
    X_test_std = std_slc.transform(X_test)

    print(X_train[0:5])
    print(X_train_std[0:5])
    print(X_test[0:5])
    print(X_test_std[0:5])

As an output we get:

[[6.7 3.3 5.7 2.1]
 [5.  2.3 3.3 1. ]
 [6.  2.9 4.5 1.5]
 [6.7 3.1 5.6 2.4]
 [5.  3.6 1.4 0.2]]

[[ 1.16345928  0.47610991  1.22532919  1.30349721]
 [-0.97534073 -1.7757613  -0.15378182 -0.1696332 ]
 [ 0.28277692 -0.42463857  0.53577368  0.49997153]
 [ 1.16345928  0.02573567  1.16786623  1.70526004]
 [-0.97534073  1.15167128 -1.24557804 -1.24100076]]

[[5.1 3.8 1.9 0.4]
 [6.6 2.9 4.6 1.3]
 [5.5 2.4 3.7 1. ]
 [6.3 2.3 4.4 1.3]
 [7.7 2.6 6.9 2.3]]

[[-0.84952897  1.60204552 -0.95826324 -0.97315887]
 [ 1.03764751 -0.42463857  0.59323664  0.23212964]
 [-0.34628191 -1.55057418  0.07607001 -0.1696332 ]
 [ 0.66021222 -1.7757613   0.47831072  0.23212964]
 [ 2.42157693 -1.10019993  1.91488469  1.5713391 ]]
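The scaled values can be reproduced by hand with z = (x - mean) / std, using the mean and standard deviation of each training column. The following is a minimal sketch that is not part of the original recipe; random_state=0 is an arbitrary seed added so that the run is repeatable, so the rows differ from the output shown above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
# random_state=0 is an arbitrary seed used only for this illustration.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=0)

std_slc = StandardScaler().fit(X_train)
X_train_std = std_slc.transform(X_train)
X_test_std = std_slc.transform(X_test)

# Per-feature statistics of the training data.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# Apply the training statistics to both parts.
manual_train = (X_train - mu) / sigma
manual_test = (X_test - mu) / sigma

print(np.allclose(manual_train, X_train_std))      # manual result matches
print(np.allclose(manual_test, X_test_std))        # manual result matches
print(np.allclose(X_train_std.mean(axis=0), 0.0))  # training mean is 0
print(np.allclose(X_train_std.std(axis=0), 1.0))   # training std is 1
print(X_test_std.mean(axis=0))  # near 0, but usually not exactly 0
```

The test data is scaled with the training statistics, so its mean and standard deviation are only approximately 0 and 1. Fitting on the training data alone keeps information about the test set out of the preprocessing step.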
