How to reduce dimensionality using PCA in Python?

This recipe helps you reduce dimensionality using PCA in Python.
Last Updated: 08 Jul 2022


Recipe Objective

Many datasets have a very large number of features, which makes training a model computationally expensive. To reduce the number of features we can use Principal Component Analysis (PCA). PCA reduces dimensionality by projecting the data onto new axes (principal components), ordered by how much of the data's variance each one captures, and keeping only the leading ones.

This recipe is a short example of how to reduce dimensionality using PCA in Python.
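
To make the idea of "directions of greatest variance" concrete, here is a minimal NumPy sketch of what PCA does under the hood. The toy data and all variable names are assumptions made for illustration; the rest of the recipe uses scikit-learn's PCA class instead.

```python
import numpy as np

# Toy data (an assumption for illustration): 200 points in 2-D that
# vary mostly along a single diagonal direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([t, 2.0 * t + 0.1 * rng.normal(size=200)])

# Center the data, then eigen-decompose its covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # returned in ascending order

# Sort the components from highest to lowest variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Fraction of the total variance carried by each component.
variance_ratio = eigenvalues / eigenvalues.sum()

# Project onto the first component only: 2 features become 1.
projected = centered @ eigenvectors[:, :1]

print(variance_ratio)
print(projected.shape)
```

Because almost all of the spread in this toy data lies along one direction, the first component carries nearly all of the variance and the second can be dropped with little loss.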


Step 1 - Import the library

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

Here we have imported datasets, StandardScaler and PCA from different modules of scikit-learn. Each one is explained at the point where it is used in the code snippets below.

Step 2 - Setup the Data

Here we use datasets to load the built-in digits dataset.

digits = datasets.load_digits()
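
Before reducing anything, it can help to look at what was loaded. The following is a small sketch, not part of the original recipe, that inspects the dataset's shape:

```python
from sklearn import datasets

digits = datasets.load_digits()

# Each row of digits.data is one 8x8 grayscale image flattened
# into 64 pixel features.
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)

# digits.target holds the class label (0-9) of each image.
print(digits.target[:10])
```

The 64 columns of digits.data are the features that PCA will reduce in the later steps.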

Step 3 - Using StandardScaler

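StandardScaler standardizes each feature by subtracting its mean and dividing by its standard deviation, so that every feature has mean 0 and standard deviation 1. It does not remove outliers. Scaling matters here because PCA is sensitive to the scale of the features.

X = StandardScaler().fit_transform(digits.data)
print()
print(X)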
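
As a quick check of what the scaler did, the sketch below recomputes the per-feature mean and standard deviation of the scaled data. The variable names means, stds and constant are introduced here for illustration. One caveat it demonstrates: a feature that is constant in every sample (a pixel that is always 0) cannot be scaled to unit variance and simply stays at 0.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)

means = X.mean(axis=0)
stds = X.std(axis=0)

# Every column should now have a mean of approximately 0.
print(np.allclose(means, 0.0))

# Columns that were constant in the raw data (zero standard deviation)
# are left at 0; all other columns have standard deviation 1.
constant = digits.data.std(axis=0) == 0
print(np.allclose(stds[~constant], 1.0))
print(np.allclose(X[:, constant], 0.0))
```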

Step 4 - Using PCA

Principal Component Analysis (PCA) reduces the number of features by creating new features (principal components) that capture most of the variance of the original data. Here we pass n_components=0.85; a value between 0 and 1 tells PCA to keep the smallest number of components that together explain at least 85% of the variance. Setting whiten=True rescales the resulting components to unit variance. We also print the number of features before and after the reduction.

pca = PCA(n_components=0.85, whiten=True)
X_pca = pca.fit_transform(X)
print(X_pca)
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])

For better understanding we apply PCA again. This time we pass n_components=2, an integer, which keeps exactly two components. We again print the number of features before and after the reduction.

pca = PCA(n_components=2, whiten=True)
X_pca = pca.fit_transform(X)
print(X_pca)
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])

As an output we get:

[[ 0.         -0.33501649 -0.04308102 ... -1.14664746 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  0.54856067 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  1.56568555  1.6951369
  -0.19600752]
 ...
 [ 0.         -0.33501649 -0.88456568 ... -0.12952258 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -0.67419451 ...  0.8876023  -0.5056698
  -0.19600752]
 [ 0.         -0.33501649  1.00877481 ...  0.8876023  -0.26113572
  -0.19600752]]

[[ 0.70631939 -0.39512814 -1.73816236 ...  0.60320435 -0.94455291
  -0.60204272]
 [ 0.21732591  0.38276482  1.72878893 ... -0.56722002  0.61131544
   1.02457999]
 [ 0.4804351  -0.13130437  1.33172761 ... -1.51284419 -0.48470912
  -0.52826811]
 ...
 [ 0.37732433 -0.0612296   1.0879821  ...  0.04925597  0.29271531
  -0.33891255]
 [ 0.39705007 -0.15768102 -1.08160094 ...  1.31785641  0.38883981
  -1.21854835]
 [-0.46407544 -0.92213976  0.12493334 ... -1.27242756 -0.34190284
  -1.17852306]]
Original number of features: 64
Reduced number of features: 25

[[ 0.70634542 -0.39504744]
 [ 0.21730901  0.38270788]
 [ 0.48044955 -0.13126596]
 ...
 [ 0.37733004 -0.06120936]
 [ 0.39703595 -0.15774013]
 [-0.46406594 -0.92210953]]
Original number of features: 64
Reduced number of features: 2
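
The number of components kept in the first run follows from the 85% variance threshold. The sketch below is an illustration rather than part of the original recipe: it fits PCA with all components on the same scaled data, accumulates explained_variance_ratio_, and compares the resulting count with what PCA(n_components=0.85) selects. The names full_pca, cumulative, needed and pca_85 are assumptions chosen for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = StandardScaler().fit_transform(datasets.load_digits().data)

# Fit PCA with all components to see how the variance accumulates.
full_pca = PCA().fit(X)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 85%.
needed = int(np.searchsorted(cumulative, 0.85) + 1)
print("Components needed for 85% variance:", needed)

# PCA(n_components=0.85) should select the same number of components.
pca_85 = PCA(n_components=0.85).fit(X)
print("Components kept by PCA(n_components=0.85):", pca_85.n_components_)
print("Variance retained:", pca_85.explained_variance_ratio_.sum())
```

Reducing to two components, as in the second run, is convenient for plotting but retains far less of the variance than the 85% setting.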


