How to reduce dimentionality using PCA in Python?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to reduce dimentionality using PCA in Python?

How to reduce dimentionality using PCA in Python?

This recipe helps you reduce dimentionality using PCA in Python

0

Recipe Objective

In many datasets we find that number of features are very large and if we want to train the model it take more computational cost. To decrease the number of features we can use Principal component analysis (PCA). PCA decrease the number of features by selecting dimension of features which have most of the variance.

So this recipe is a short example of how can reduce dimentionality using PCA in Python.

Step 1 - Import the library

from sklearn import datasets from sklearn.preprocessing import StandardScaler from sklearn.decomposition import PCA

Here we have imported various modules like PCA, datasets and StandardScale from differnt libraries. We will understand the use of these later while using it in the in the code snipet.
For now just have a look on these imports.

Step 2 - Setup the Data

Here we have used datasets to load the inbuilt digits dataset. digits = datasets.load_digits()

Step 3 - Using StandardScaler

StandardScaler is used to remove the outliners and scale the data by making the mean of the data 0 and standard deviation as 1. X = StandardScaler().fit_transform(digits.data) print(); print(X)

Step 4 - Using PCA

We are also using Principal Component Analysis(PCA) which will reduce the dimension of features by creating new features which have most of the varience of the original data. We have passed the parameter n_components as 0.85 which is the percentage of feature in final dataset. We have also printed shape of intial and final dataset. pca = PCA(n_components=0.85, whiten=True) X_pca = pca.fit_transform(X) print(X_pca) print("Original number of features:", X.shape[1]) print("Reduced number of features:", X_pca.shape[1]) Foe better understanding we are applying PCA again. Now We have passed the parameter n_components as 0.85 which is the percentage of feature in final dataset. We have also printed shape of intial and final dataset. pca = PCA(n_components=2, whiten=True) X_pca = pca.fit_transform(X) print(X_pca) print("Original number of features:", X.shape[1]) print("Reduced number of features:", X_pca.shape[1]) As an output we get:

[[ 0.         -0.33501649 -0.04308102 ... -1.14664746 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  0.54856067 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  1.56568555  1.6951369
  -0.19600752]
 ...
 [ 0.         -0.33501649 -0.88456568 ... -0.12952258 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -0.67419451 ...  0.8876023  -0.5056698
  -0.19600752]
 [ 0.         -0.33501649  1.00877481 ...  0.8876023  -0.26113572
  -0.19600752]]

[[ 0.70631939 -0.39512814 -1.73816236 ...  0.60320435 -0.94455291
  -0.60204272]
 [ 0.21732591  0.38276482  1.72878893 ... -0.56722002  0.61131544
   1.02457999]
 [ 0.4804351  -0.13130437  1.33172761 ... -1.51284419 -0.48470912
  -0.52826811]
 ...
 [ 0.37732433 -0.0612296   1.0879821  ...  0.04925597  0.29271531
  -0.33891255]
 [ 0.39705007 -0.15768102 -1.08160094 ...  1.31785641  0.38883981
  -1.21854835]
 [-0.46407544 -0.92213976  0.12493334 ... -1.27242756 -0.34190284
  -1.17852306]]
Original number of features: 64
Reduced number of features: 25

[[ 0.70634542 -0.39504744]
 [ 0.21730901  0.38270788]
 [ 0.48044955 -0.13126596]
 ...
 [ 0.37733004 -0.06120936]
 [ 0.39703595 -0.15774013]
 [-0.46406594 -0.92210953]]
Original number of features: 64
Reduced number of features: 2

Relevant Projects

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Data Science Project - Instacart Market Basket Analysis
Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

Sequence Classification with LSTM RNN in Python with Keras
In this project, we are going to work on Sequence to Sequence Prediction using IMDB Movie Review Dataset​ using Keras in Python.

Music Recommendation System Project using Python and R
Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.