How to reduce the dimensionality of a sparse matrix in Python?

This recipe helps you reduce the dimensionality of a sparse matrix in Python
Last Updated: 06 Jul 2022

Recipe Objective

When a large dataset with many features is stored as a sparse matrix, training a model on it is computationally expensive, and managing and visualizing the matrix is difficult. Reducing the number of dimensions of the matrix addresses both problems.

This recipe is a short example of how to reduce the dimensionality of a sparse matrix in Python.

Step 1 - Import the libraries

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import csr_matrix
from sklearn import datasets

Here we have imported StandardScaler, TruncatedSVD, csr_matrix and datasets from their respective libraries. Each one is explained where it is used in the code snippets below.

Step 2 - Setup the Data

Here we use datasets to load the built-in digits dataset. We use StandardScaler to scale the data so that each feature has a mean of 0 and a standard deviation of 1. We then convert the scaled data to a sparse matrix with the function csr_matrix.

digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)
print(X)
X_sparse = csr_matrix(X)
print(X_sparse)
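To make the role of csr_matrix more concrete, here is a minimal sketch on a small hand-made array. The array and the names dense, sparse and density are assumptions chosen for illustration and are not part of the recipe. It shows how many entries the CSR format actually stores.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small, mostly-zero array used purely for illustration.
dense = np.array([
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 2.0],
])

# CSR keeps only the non-zero values and their positions.
sparse = csr_matrix(dense)

# nnz is the number of stored entries.
print("Shape:", sparse.shape)
print("Stored entries:", sparse.nnz)

# Density is the fraction of entries that are stored.
density = sparse.nnz / (sparse.shape[0] * sparse.shape[1])
print("Density:", density)

# Converting back to a dense array recovers the original values.
assert np.array_equal(sparse.toarray(), dense)
```

Note that standardizing subtracts each column's mean, which generally turns zeros into non-zero values. The scaled digits matrix therefore stores most of its entries, and the CSR conversion in this recipe mainly demonstrates that TruncatedSVD accepts sparse input.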

Step 3 - Using TruncatedSVD

We can truncate the matrix, that is, reduce its number of dimensions, with TruncatedSVD. Its parameter n_components sets the final number of features we want. We fit the model on the sparse matrix and then transform the same matrix to get the truncated matrix.

tsvd = TruncatedSVD(n_components=10)
X_sparse_tsvd = tsvd.fit(X_sparse).transform(X_sparse)
print()
print(X_sparse_tsvd)
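The same step can be sketched on a matrix that really is sparse. The random data, the shapes and the names X_demo, tsvd_demo and X_demo_reduced below are assumptions made for illustration only. The sketch uses fit_transform, which fits the model and projects the data in one call, and fixes random_state so the randomized solver gives repeatable results.

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Hypothetical data: 100 samples, 50 features, about 5% non-zero entries.
X_demo = sparse_random(100, 50, density=0.05, format="csr", random_state=0)

# Keep 5 components; random_state makes the result reproducible.
tsvd_demo = TruncatedSVD(n_components=5, random_state=0)

# Fit the model and project the data in a single call.
X_demo_reduced = tsvd_demo.fit_transform(X_demo)

print("Input shape:", X_demo.shape)
print("Reduced shape:", X_demo_reduced.shape)
print("Components shape:", tsvd_demo.components_.shape)
```

The input stays in sparse form throughout, while the reduced matrix comes back as a dense NumPy array with one column per component.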

Step 4 - Printing Results

Now we use print statements to show the results: the number of features before and after the reduction, and the share of the variance explained by the first six components.

print("Original number of features:", X_sparse.shape[1])
print("Reduced number of features:", X_sparse_tsvd.shape[1])
print()
print(tsvd.explained_variance_ratio_[0:6].sum())

As an output we get:

[[ 0.         -0.33501649 -0.04308102 ... -1.14664746 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  0.54856067 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  1.56568555  1.6951369
  -0.19600752]
 ...
 [ 0.         -0.33501649 -0.88456568 ... -0.12952258 -0.5056698
  -0.19600752]
 [ 0.         -0.33501649 -0.67419451 ...  0.8876023  -0.5056698
  -0.19600752]
 [ 0.         -0.33501649  1.00877481 ...  0.8876023  -0.26113572
  -0.19600752]]

  (0, 1)	-0.3350164872543856
  (0, 2)	-0.04308101770538793
  (0, 3)	0.2740715207154218
  (0, 4)	-0.6644775126361527
  (0, 5)	-0.8441293865949171
  (0, 6)	-0.40972392088346243
  (0, 7)	-0.1250229232970408
  (0, 8)	-0.05907755711884675
  (0, 9)	-0.6240092623290964
  (0, 10)	0.4829744992519545
  (0, 11)	0.7596224512649244
  (0, 12)	-0.05842586308220443
  (0, 13)	1.1277211297338117
  (0, 14)	0.8795830595483867
  (0, 15)	-0.13043338063115095
  (0, 16)	-0.04462507326885248
  (0, 17)	0.11144272449970435
  (0, 18)	0.8958804382797294
  (0, 19)	-0.8606663175537699
  (0, 20)	-1.1496484601880896
  (0, 21)	0.5154718747277965
  (0, 22)	1.905963466976408
  (0, 23)	-0.11422184388584329
  (0, 24)	-0.03337972630405602
  (0, 25)	0.48648927722411006
  :	:
  (1796, 38)	-0.8226945146290309
  (1796, 40)	-0.061343668908253476
  (1796, 41)	0.8105536026095989
  (1796, 42)	1.3950951873625397
  (1796, 43)	-0.19072005925701047
  (1796, 44)	-0.5868275383619802
  (1796, 45)	1.3634658076459107
  (1796, 46)	0.5874903313016945
  (1796, 47)	-0.08874161717060432
  (1796, 48)	-0.035433262605025426
  (1796, 49)	4.179200682513991
  (1796, 50)	1.505078217025183
  (1796, 51)	0.0881769306516768
  (1796, 52)	-0.26718796251356636
  (1796, 53)	1.2010187221077009
  (1796, 54)	0.8692294429227895
  (1796, 55)	-0.2097851269640334
  (1796, 56)	-0.023596458909150665
  (1796, 57)	0.7715345500122912
  (1796, 58)	0.47875261517372414
  (1796, 59)	-0.020358468129093202
  (1796, 60)	0.4441643511677691
  (1796, 61)	0.8876022965425754
  (1796, 62)	-0.26113572420685327
  (1796, 63)	-0.1960075186604789

[[ 1.91421562 -0.95449937 -3.94604425 ...  1.4963196   0.1160377
  -0.80839011]
 [ 0.58898173  0.9246434   3.92476559 ...  0.55743317  1.08360629
   0.07914133]
 [ 1.30203646 -0.31719139  3.02334129 ...  1.15547162  0.78332798
  -1.12203121]
 ...
 [ 1.02259528 -0.14791152  2.46997819 ...  0.52912028  2.04799351
  -2.0550423 ]
 [ 1.07605482 -0.38090797 -2.45549106 ...  0.76221796  1.07481616
  -0.33991093]
 [-1.25770756 -2.22760395  0.28362814 ... -1.20258084  0.80783614
  -1.84480729]]

Original number of features: 64
Reduced number of features: 10

0.4561203224142434
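The explained variance ratio can also guide the choice of n_components. The following is a minimal sketch, assuming an upper limit of 40 components and a target of 80% explained variance. Both values, and the names cumulative, target and k, are picked only for illustration and do not come from the recipe.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Same data preparation as in the recipe.
digits = datasets.load_digits()
X_sparse = csr_matrix(StandardScaler().fit_transform(digits.data))

# Fit with the largest number of components we are willing to consider.
tsvd = TruncatedSVD(n_components=40, random_state=0)
tsvd.fit(X_sparse)

# Running total of the variance explained by the first k components.
cumulative = np.cumsum(tsvd.explained_variance_ratio_)

target = 0.80  # assumed threshold, chosen only for illustration
if cumulative[-1] >= target:
    # argmax returns the first index where the condition holds.
    k = int(np.argmax(cumulative >= target)) + 1
    print(f"{k} components explain at least {target:.0%} of the variance")
else:
    print(f"40 components explain only {cumulative[-1]:.1%} of the variance")
```

If the target is not reached, the upper limit on n_components can be raised and the model refitted.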
