How to standardise IRIS Data in Python?

This recipe helps you standardise IRIS Data in Python

Recipe Objective

It is very rare to find a raw dataset that neatly follows a specific distribution, and its features are often measured on different scales. So most datasets need some form of standardisation before they are used for modelling.

So this is the recipe on how we can standardise the IRIS data in Python.

Step 1 - Import the library

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

We have imported only what we need: datasets, train_test_split and StandardScaler.

Step 2 - Setting up the Data

We load the built-in iris dataset, which we will later split with train_test_split. We store the feature data in X and the target in y.

iris = datasets.load_iris()
X = iris.data
y = iris.target
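Before splitting, it can help to check what was actually loaded. The snippet below is a small sketch of such a check; it only uses attributes that load_iris() provides (data, target, feature_names, target_names).

```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

# 150 samples with 4 numeric features each, and one class label per sample
print(X.shape)             # (150, 4)
print(y.shape)             # (150,)

# Names of the four measured features and the three iris species
print(iris.feature_names)
print(iris.target_names)
```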

Step 3 - Splitting the Data

So now we are using train_test_split to split the data. We have passed test_size as 0.33, which means 33% of the data will be in the test part and the rest will be in the train part. The parameter random_state sets the seed for the random shuffling done before the split; it is not passed here, so the split will differ from run to run.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
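If a repeatable split is wanted, random_state can be given an integer. The following is a minimal sketch of that variation; the seed value 42 is an arbitrary choice for illustration and is not part of the original recipe.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# random_state=42 is an arbitrary seed; any fixed integer gives a repeatable split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# With 150 samples and test_size=0.33, 50 rows go to test and 100 to train
print(X_train.shape, X_test.shape)  # (100, 4) (50, 4)

# Repeating the call with the same seed returns the same rows
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.33, random_state=42
)
print((X_test == X_test2).all())    # True
```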

Step 4 - Using StandardScaler

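StandardScaler standardises each feature by subtracting its mean and dividing by its standard deviation, so that the feature ends up with a mean of 0 and a standard deviation of 1. It does not remove outliers; they are rescaled along with every other value. So we are creating an object std_slc to use StandardScaler.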
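To make the "mean 0, standard deviation 1" idea concrete, here is a rough sketch of the same per-feature calculation in plain Python. The helper name standardise and the five sample values are made up for illustration. The sketch uses the population standard deviation (dividing by n), which is what StandardScaler uses, and it does not handle a constant column.

```python
from statistics import fmean, pstdev

def standardise(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # divides by n, not n - 1
    return [(x - mu) / sigma for x in values]

# Illustrative sample of five sepal-length style values (hypothetical subset)
sepal_length = [6.7, 5.0, 6.0, 6.7, 5.0]
z = standardise(sepal_length)

print([round(v, 3) for v in z])

# The standardised values have mean 0 and standard deviation 1,
# up to floating-point rounding
print(abs(fmean(z)) < 1e-9, abs(pstdev(z) - 1) < 1e-9)  # True True
```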
We have fitted the scaler on the train data only and used it to transform both the train and test data, so that no information from the test data is used when computing the mean and standard deviation. Finally we have printed the first five rows of train, scaled train, test and scaled test.

std_slc = StandardScaler()
std_slc.fit(X_train)
X_train_std = std_slc.transform(X_train)
X_test_std = std_slc.transform(X_test)
print(X_train[0:5])
print(X_train_std[0:5])
print(X_test[0:5])
print(X_test_std[0:5])

As an output we get:

[[6.7 3.3 5.7 2.1]
 [5.  2.3 3.3 1. ]
 [6.  2.9 4.5 1.5]
 [6.7 3.1 5.6 2.4]
 [5.  3.6 1.4 0.2]]

[[ 1.16345928  0.47610991  1.22532919  1.30349721]
 [-0.97534073 -1.7757613  -0.15378182 -0.1696332 ]
 [ 0.28277692 -0.42463857  0.53577368  0.49997153]
 [ 1.16345928  0.02573567  1.16786623  1.70526004]
 [-0.97534073  1.15167128 -1.24557804 -1.24100076]]

[[5.1 3.8 1.9 0.4]
 [6.6 2.9 4.6 1.3]
 [5.5 2.4 3.7 1. ]
 [6.3 2.3 4.4 1.3]
 [7.7 2.6 6.9 2.3]]

[[-0.84952897  1.60204552 -0.95826324 -0.97315887]
 [ 1.03764751 -0.42463857  0.59323664  0.23212964]
 [-0.34628191 -1.55057418  0.07607001 -0.1696332 ]
 [ 0.66021222 -1.7757613   0.47831072  0.23212964]
 [ 2.42157693 -1.10019993  1.91488469  1.5713391 ]]
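The printed rows show individual scaled values, but not whether each column really ends up with mean 0 and standard deviation 1. The sketch below is one way to check that. It assumes a fixed random_state=0 (an arbitrary seed, not in the original recipe) and relies on the fitted scaler's mean_ and scale_ attributes.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=0  # arbitrary seed
)

std_slc = StandardScaler()
std_slc.fit(X_train)
X_train_std = std_slc.transform(X_train)
X_test_std = std_slc.transform(X_test)

# Train columns: mean ~0 and std ~1, up to floating-point error
print(X_train_std.mean(axis=0).round(6))
print(X_train_std.std(axis=0).round(6))

# Test columns: usually close to, but not exactly, 0 and 1,
# because the scaler only saw the train data
print(X_test_std.mean(axis=0).round(3))
print(X_test_std.std(axis=0).round(3))

# transform() applies (x - mean_) / scale_ with the statistics learned in fit()
manual = (X_test - std_slc.mean_) / std_slc.scale_
print(np.allclose(manual, X_test_std))  # True
```

The test-set means and standard deviations are not expected to be exactly 0 and 1; that is a consequence of fitting on the train data only, not a sign of a mistake.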
