How to standardise IRIS Data in Python?

This recipe helps you standardise IRIS Data in Python

Recipe Objective

It is very rare to find a raw dataset that perfectly follows a specific distribution. Usually a dataset needs to be standardised in some way before it is used.

So this is the recipe on how we can standardise IRIS Data in Python.

Step 1 - Import the library

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

We have imported only datasets, train_test_split and StandardScaler, which are all we need.

Step 2 - Setting up the Data

We have loaded the inbuilt iris dataset, which we will later split with train_test_split. We have stored the data in X and the target in y.

iris = datasets.load_iris()
X = iris.data
y = iris.target
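Before going further, it can help to look at what X and y actually contain. The following is a minimal sketch, not part of the original recipe, that prints the shapes, the feature names and the per-feature mean and standard deviation of the raw data. The four features sit on different scales, which is what standardising is meant to even out.

```python
from sklearn import datasets

# Load the inbuilt iris dataset, as in the recipe.
iris = datasets.load_iris()
X = iris.data
y = iris.target

print(X.shape)             # (150, 4): 150 flowers, 4 measurements each
print(y.shape)             # (150,): one class label per flower
print(iris.feature_names)  # names of the four measurement columns

# Per-feature mean and standard deviation of the raw, unscaled data.
print(X.mean(axis=0))
print(X.std(axis=0))
```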

Step 3 - Splitting the Data

So now we are using train_test_split to split the data. We have passed test_size as 0.33, which means 33% of the data will be in the test part and the rest will be in the train part. The optional parameter random_state seeds the random shuffling done before the split; we have not passed it here, so the rows that land in each part change from run to run.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
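To illustrate what random_state does, here is a small sketch that splits the data twice with the same seed and checks that both splits are identical. The value 42 is an arbitrary choice made for this example and is not part of the recipe. Any fixed integer would make the split repeatable.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Two splits with the same (arbitrary) seed.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.33, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# With 150 rows and test_size=0.33, 50 rows go to test and 100 to train.
print(X_train_a.shape, X_test_a.shape)  # (100, 4) (50, 4)

# Same seed, same shuffle, so the two test parts match row for row.
print(np.array_equal(X_test_a, X_test_b))  # True
```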

Step 4 - Using StandardScaler

StandardScaler scales each feature so that it has a mean of 0 and a standard deviation of 1. It does not remove outliers; extreme values still influence the mean and standard deviation it computes. So we are creating an object std_slc to use StandardScaler.
We have fitted the scaler on the train data only and then transformed both the train and test data with it. Finally we have printed the first five rows of train, scaled train, test and scaled test.

std_slc = StandardScaler()
std_slc.fit(X_train)
X_train_std = std_slc.transform(X_train)
X_test_std = std_slc.transform(X_test)
print(X_train[0:5])
print(X_train_std[0:5])
print(X_test[0:5])
print(X_test_std[0:5])

As an output we get (the exact rows depend on the random split):

[[6.7 3.3 5.7 2.1]
 [5.  2.3 3.3 1. ]
 [6.  2.9 4.5 1.5]
 [6.7 3.1 5.6 2.4]
 [5.  3.6 1.4 0.2]]

[[ 1.16345928  0.47610991  1.22532919  1.30349721]
 [-0.97534073 -1.7757613  -0.15378182 -0.1696332 ]
 [ 0.28277692 -0.42463857  0.53577368  0.49997153]
 [ 1.16345928  0.02573567  1.16786623  1.70526004]
 [-0.97534073  1.15167128 -1.24557804 -1.24100076]]

[[5.1 3.8 1.9 0.4]
 [6.6 2.9 4.6 1.3]
 [5.5 2.4 3.7 1. ]
 [6.3 2.3 4.4 1.3]
 [7.7 2.6 6.9 2.3]]

[[-0.84952897  1.60204552 -0.95826324 -0.97315887]
 [ 1.03764751 -0.42463857  0.59323664  0.23212964]
 [-0.34628191 -1.55057418  0.07607001 -0.1696332 ]
 [ 0.66021222 -1.7757613   0.47831072  0.23212964]
 [ 2.42157693 -1.10019993  1.91488469  1.5713391 ]]
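As a sanity check on the scaled arrays, the sketch below confirms that each column of the scaled train data has mean 0 and standard deviation 1, and that transform is the same as subtracting the fitted mean_ and dividing by the fitted scale_. The random_state=0 argument is an assumption added here only so the check is repeatable; the recipe itself does not set it.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=0  # seed is an assumption
)

# Fit on the train part only, then transform both parts.
std_slc = StandardScaler()
std_slc.fit(X_train)
X_train_std = std_slc.transform(X_train)
X_test_std = std_slc.transform(X_test)

# Each scaled train column should have mean ~0 and standard deviation ~1.
print(X_train_std.mean(axis=0).round(6))
print(X_train_std.std(axis=0).round(6))

# transform() is (x - mean_) / scale_, using statistics learned from X_train.
manual = (X_test - std_slc.mean_) / std_slc.scale_
print(np.allclose(manual, X_test_std))  # True

# The test part reuses the train statistics, so its own column means
# are generally close to, but not exactly, 0.
print(X_test_std.mean(axis=0).round(3))
```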
