How to reduce dimensionality using PCA in Python?

This recipe helps you reduce dimensionality using PCA in Python

Recipe Objective

In many datasets the number of features is very large, and training a model on all of them is computationally expensive. To decrease the number of features we can use Principal Component Analysis (PCA). PCA reduces the number of features by projecting the data onto new dimensions (principal components) that capture most of the variance of the original data.

So this recipe is a short example of how to reduce dimensionality using PCA in Python.


Step 1 - Import the library

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

Here we have imported the modules we need: datasets, StandardScaler and PCA, all from different scikit-learn submodules. We will understand the use of each of them while using them in the code snippets below.
For now just have a look at these imports.

Step 2 - Setup the Data

Here we have used datasets to load the built-in digits dataset.

digits = datasets.load_digits()
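
As a quick sanity check, you can look at the shape of what was loaded (a minimal sketch, not part of the original recipe; the digits set bundled with scikit-learn contains 1,797 samples of 8x8 images flattened into 64 pixel features):

# Inspect the loaded dataset
print(digits.data.shape)   # (1797, 64)
print(digits.target.shape) # (1797,)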

Step 3 - Using StandardScaler

StandardScaler is used to standardize the data by making the mean of each feature 0 and its standard deviation 1. Note that it does not remove outliers; it puts all features on a comparable scale, which reduces the influence of differently scaled features.

X = StandardScaler().fit_transform(digits.data)
print(X)
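
If you want to verify the effect of the scaling, a short check like the one below (a sketch, not part of the original recipe) prints the per-feature mean and standard deviation. After StandardScaler they should be approximately 0 and 1, except for the constant all-zero border pixels of the digits images, whose standard deviation stays 0:

import numpy as np
print(np.round(X.mean(axis=0), 2))  # approximately 0 for every feature
print(np.round(X.std(axis=0), 2))   # approximately 1, except 0 for constant pixels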

Step 4 - Using PCA

We now apply Principal Component Analysis (PCA), which reduces the dimensionality by creating new features (principal components) that capture most of the variance of the original data. Here we pass n_components=0.85, which tells PCA to keep just enough components to explain 85% of the variance, rather than a fixed number of components. We also print the number of features in the initial and final datasets.

pca = PCA(n_components=0.85, whiten=True)
X_pca = pca.fit_transform(X)
print(X_pca)
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])

For better understanding we apply PCA again, this time passing n_components=2, which asks for exactly two components in the final dataset. Again we print the number of features in the initial and final datasets.

pca = PCA(n_components=2, whiten=True)
X_pca = pca.fit_transform(X)
print(X_pca)
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])

As an output we get:

[[ 0.         -0.33501649 -0.04308102 ... -1.14664746 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  0.54856067 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  1.56568555  1.6951369
  -0.19600752]
 ...
 [ 0.         -0.33501649 -0.88456568 ... -0.12952258 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -0.67419451 ...  0.8876023  -0.5056698
  -0.19600752]
 [ 0.         -0.33501649  1.00877481 ...  0.8876023  -0.26113572
  -0.19600752]]

[[ 0.70631939 -0.39512814 -1.73816236 ...  0.60320435 -0.94455291
  -0.60204272]
 [ 0.21732591  0.38276482  1.72878893 ... -0.56722002  0.61131544
   1.02457999]
 [ 0.4804351  -0.13130437  1.33172761 ... -1.51284419 -0.48470912
  -0.52826811]
 ...
 [ 0.37732433 -0.0612296   1.0879821  ...  0.04925597  0.29271531
  -0.33891255]
 [ 0.39705007 -0.15768102 -1.08160094 ...  1.31785641  0.38883981
  -1.21854835]
 [-0.46407544 -0.92213976  0.12493334 ... -1.27242756 -0.34190284
  -1.17852306]]
Original number of features: 64
Reduced number of features: 25

[[ 0.70634542 -0.39504744]
 [ 0.21730901  0.38270788]
 [ 0.48044955 -0.13126596]
 ...
 [ 0.37733004 -0.06120936]
 [ 0.39703595 -0.15774013]
 [-0.46406594 -0.92210953]]
Original number of features: 64
Reduced number of features: 2
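
If you want to see why n_components=0.85 ends up keeping 25 components, you can inspect the fitted model's explained_variance_ratio_ (a short illustrative check, reusing the same standardized data X as above):

import numpy as np
pca = PCA(n_components=0.85, whiten=True).fit(X)
print(pca.n_components_)  # 25 components were needed
print(np.cumsum(pca.explained_variance_ratio_)[-1])  # cumulative variance of kept components, at least 0.85

PCA stops adding components as soon as the cumulative explained variance reaches the requested fraction, which is why a float between 0 and 1 behaves differently from an integer n_components.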

