How to use SciPy Sparse matrix in Python?

This recipe explains how to use SciPy sparse matrices in Python.

Sparse matrices are an essential tool in data analysis, machine learning, and scientific computing. They efficiently store and manipulate matrices with a substantial number of zero or insignificant elements, saving memory and computation time. In this guide, we will explore how to create, manipulate, and perform various operations with sparse matrices in Python.


Understanding SciPy Sparse Matrix in Python

A sparse matrix is a data structure designed to store and manipulate matrices with a large number of zero values efficiently. In contrast to traditional dense matrices, sparse matrices only store the non-zero elements, which significantly reduces memory usage and computational complexity. In Python, sparse matrices are provided by SciPy's scipy.sparse module, which builds on NumPy arrays.
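To make the memory saving concrete, here is a minimal sketch that compares the bytes used by a dense array with the bytes used by the three arrays backing a CSR matrix. The matrix size, the random seed, and the variable names are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import sparse

# Hypothetical example: a 1000 x 1000 matrix with at most 1000 non-zero entries
rng = np.random.default_rng(0)
dense = np.zeros((1000, 1000))
rows = rng.integers(0, 1000, size=1000)
cols = rng.integers(0, 1000, size=1000)
dense[rows, cols] = 1.0

csr = sparse.csr_matrix(dense)

# Memory used by the dense array versus the arrays that back the CSR matrix
dense_bytes = dense.nbytes
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print("Stored non-zero entries:", csr.nnz)
print("Dense bytes: ", dense_bytes)
print("Sparse bytes:", sparse_bytes)
```

The saving depends entirely on how many entries are zero; for a matrix with few zeros the sparse form can be larger than the dense one.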

How to create a Sparse Matrix in Python?

Building a sparse matrix by hand is tedious, but SciPy can do the work for us. There are two broad kinds of matrices: dense and sparse. Sparse matrices consist mostly of zero values. In machine learning projects, many learning algorithms require the training data to be held in memory. If the data (for example, a dataframe) does not fit in RAM, such an algorithm cannot run. When most of the values are zero, converting the dense matrix into a sparse matrix can shrink it enough to fit in RAM.

In this guide, we will walk you through creating sparse matrices using SciPy and explore different formats. We will create a dense matrix and then convert it into various formats of sparse matrices using SciPy.

Step 1: Import the Library

import numpy as np

from scipy import sparse

We have imported the necessary libraries to work with sparse matrices.

Step 2: Setting Up the Matrix

Next, we will create a dense matrix that we will use to create sparse matrices. Here's the original matrix:

matrix = np.array([[9, 8, 7],

                   [6, 5, 4],

                   [3, 2, 1]])

print("Original Matrix:\n", matrix)

This step sets up our original dense matrix.

Step 3: Creating Sparse Matrices

Now, we will create various formats of sparse matrices using the original dense matrix. Here are the different formats supported by SciPy:

  • Dictionary Of Keys based sparse matrix (DOK)

  • Block Sparse Row matrix (BSR)

  • Coordinate list matrix (COO)

  • Compressed Sparse Column matrix (CSC)

  • Compressed Sparse Row matrix (CSR)

  • Sparse matrix with DIAgonal storage (DIA)

  • Row-based linked list sparse matrix (LIL)

Let's create these sparse matrices:

Creating Dictionary Of Keys based sparse matrix (DOK)

print(sparse.dok_matrix(matrix))

Creating Block Sparse Row matrix (BSR)

print(sparse.bsr_matrix(matrix))

Creating Coordinate list matrix (COO)

print(sparse.coo_matrix(matrix))

Creating Compressed Sparse Column matrix (CSC)

print(sparse.csc_matrix(matrix))

Creating Compressed Sparse Row matrix (CSR)

print(sparse.csr_matrix(matrix))

Creating Sparse matrix with DIAgonal storage (DIA)

print(sparse.dia_matrix(matrix))

Creating Row-based linked list sparse matrix (LIL)

print(sparse.lil_matrix(matrix))

These steps demonstrate how to create different sparse matrix formats from a dense matrix using SciPy.

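Running the code above prints the original matrix followed by each sparse matrix, in the order DOK, BSR, COO, CSC, CSR, DIA, LIL. Each line shows the (row, column) position of a stored value followed by the value itself. Newer SciPy releases may also print a summary header before each matrix.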

Original Matrix: 

 [[9 8 7]

 [6 5 4]

 [3 2 1]]

Sparse Matrices: 

  (0, 0) 9

  (0, 1) 8

  (0, 2) 7

  (1, 0) 6

  (1, 1) 5

  (1, 2) 4

  (2, 0) 3

  (2, 1) 2

  (2, 2) 1

 

  (0, 0) 9

  (0, 1) 8

  (0, 2) 7

  (1, 0) 6

  (1, 1) 5

  (1, 2) 4

  (2, 0) 3

  (2, 1) 2

  (2, 2) 1

 

  (0, 0) 9

  (0, 1) 8

  (0, 2) 7

  (1, 0) 6

  (1, 1) 5

  (1, 2) 4

  (2, 0) 3

  (2, 1) 2

  (2, 2) 1

 

  (0, 0) 9

  (1, 0) 6

  (2, 0) 3

  (0, 1) 8

  (1, 1) 5

  (2, 1) 2

  (0, 2) 7

  (1, 2) 4

  (2, 2) 1

 

  (0, 0) 9

  (0, 1) 8

  (0, 2) 7

  (1, 0) 6

  (1, 1) 5

  (1, 2) 4

  (2, 0) 3

  (2, 1) 2

  (2, 2) 1

 

  (2, 0) 3

  (1, 0) 6

  (2, 1) 2

  (0, 0) 9

  (1, 1) 5

  (2, 2) 1

  (0, 1) 8

  (1, 2) 4

  (0, 2) 7

 

  (0, 0) 9

  (0, 1) 8

  (0, 2) 7

  (1, 0) 6

  (1, 1) 5

  (1, 2) 4

  (2, 0) 3

  (2, 1) 2

  (2, 2) 1
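The 3 x 3 matrix used above contains no zeros, so every format stores all nine values. As a rough sketch of what happens with a genuinely sparse input, the example below uses a hypothetical 4 x 4 matrix (the name mostly_zero and its values are assumptions) and shows that only the non-zero entries are stored, and that one format can be converted to another directly.

```python
import numpy as np
from scipy import sparse

# Hypothetical 4 x 4 matrix in which most entries are zero
mostly_zero = np.array([[0, 0, 3, 0],
                        [0, 0, 0, 0],
                        [1, 0, 0, 0],
                        [0, 2, 0, 0]])

coo = sparse.coo_matrix(mostly_zero)

# Only the three non-zero entries are stored
print("Stored values:", coo.nnz)
print("Rows:", coo.row, "Cols:", coo.col, "Data:", coo.data)

# Convert between sparse formats without building a dense array
csr = coo.tocsr()
csc = coo.tocsc()
print(csr.format, csc.format)
```

COO is convenient for construction, while CSR and CSC are generally the better choice for arithmetic and for row or column slicing respectively.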

Converting a Sparse Matrix to a Full Matrix in Python

You can convert a sparse matrix to a dense (full) matrix using the .toarray() method. Conversely, you can convert a dense matrix to a sparse matrix, for example with csr_matrix, to save memory.

from scipy.sparse import csr_matrix

sparse_matrix = csr_matrix(matrix)  # dense array from Step 2 -> sparse (CSR)

dense_matrix = sparse_matrix.toarray()  # sparse -> dense NumPy array
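As a self-contained illustration of the round trip, the sketch below converts a small hypothetical array (its values are assumptions) to CSR and back, then checks that nothing was lost along the way.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

# Hypothetical dense input with mostly zero entries
dense = np.array([[0, 0, 5],
                  [0, 0, 0],
                  [7, 0, 0]])

as_sparse = csr_matrix(dense)   # dense -> sparse
back = as_sparse.toarray()      # sparse -> dense NumPy array

print(issparse(as_sparse))          # True: this is a SciPy sparse object
print(isinstance(back, np.ndarray)) # True: toarray() returns a NumPy array
print(np.array_equal(dense, back))  # True: the round trip preserves the values
```

Keep in mind that .toarray() allocates the full dense array, so it is only safe when the matrix is small enough to fit in memory.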

Eigenvalues of Sparse Matrix in Python

To find the eigenvalues of a sparse matrix, you can use libraries like SciPy, which provides functions like eigs for solving eigenvalue problems efficiently. Here's a guide to find the eigenvalues of a sparse matrix:

Step-1 Import the Libraries

We import the necessary libraries, including NumPy and SciPy.

import numpy as np

from scipy.sparse.linalg import eigs

from scipy.sparse import csc_matrix

Step-2 Create a Sparse Matrix

We create a sparse matrix using the csc_matrix constructor. You need to specify the data, row indices, column indices, and the shape of the matrix. The values are stored as floats because eigs works on floating-point (or complex) matrices.

# Create a sparse matrix

data = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)

row_indices = np.array([0, 0, 1, 1, 2, 2, 3, 3])

column_indices = np.array([0, 3, 1, 2, 0, 3, 1, 2])

sparse_matrix = csc_matrix((data, (row_indices, column_indices)), shape=(4, 4))

Step-3 Finding eigenvalues of a sparse matrix in Python

We use the eigs function from scipy.sparse.linalg to find the eigenvalues of the sparse matrix. The k parameter specifies the number of eigenvalues to compute. For an N x N sparse matrix, k must be smaller than N - 1, so a 4 x 4 matrix allows at most k=2.

# Find the 2 eigenvalues of largest magnitude (the default selection)

eigenvalues, _ = eigs(sparse_matrix, k=2)

Step-4 Print the Eigenvalues

Finally, we print the eigenvalues of the sparse matrix.

print("Eigenvalues of the sparse matrix:")

print(eigenvalues)

Make sure to adjust the data, row indices, column indices, and shape according to your specific sparse matrix.
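When the matrix is real and symmetric, scipy.sparse.linalg also offers eigsh, which returns real eigenvalues. The following is a minimal sketch under assumed inputs: the 50 x 50 tridiagonal matrix, its name, and the choice of k=3 are illustrative and not part of the recipe above. The dense solver is used only as a cross-check, which is affordable at this size.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

n = 50
# Symmetric tridiagonal matrix: 2 on the main diagonal, -1 on both neighbours
laplacian = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")

# Three algebraically largest eigenvalues ("LA"); eigenvectors are not requested
largest = eigsh(laplacian, k=3, which="LA", return_eigenvectors=False)

# Cross-check against the dense symmetric solver (returns ascending order)
dense_all = np.linalg.eigvalsh(laplacian.toarray())

print(np.sort(largest))
print(dense_all[-3:])
```

For large matrices the dense cross-check would defeat the purpose of using a sparse solver; it is included here only to show that the two approaches agree.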

Sparse Matrix Operations

Sparse matrices support various matrix operations, such as addition, subtraction, multiplication, and more. You can perform these operations using the standard arithmetic operators or specialized functions from libraries like SciPy.

from scipy.sparse import csr_matrix

# Create sparse matrices

sparse_matrix1 = csr_matrix(...)

sparse_matrix2 = csr_matrix(...)

# Element-wise sum of two sparse matrices

result = sparse_matrix1 + sparse_matrix2

# Matrix multiplication of two sparse matrices

result = sparse_matrix1.dot(sparse_matrix2)
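Because the snippet above leaves the matrix contents as placeholders, here is a runnable sketch with two small hypothetical matrices (a and b, with assumed values) standing in for them. It shows addition, matrix multiplication with the @ operator, element-wise multiplication with .multiply(), and transposition.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Two small hypothetical matrices standing in for the placeholders
a = csr_matrix(np.array([[1, 0, 0],
                         [0, 0, 2],
                         [0, 3, 0]]))
b = csr_matrix(np.array([[4, 0, 0],
                         [0, 0, 0],
                         [5, 0, 6]]))

total = a + b                # element-wise sum, result is still sparse
product = a @ b              # matrix product, same result as a.dot(b)
elementwise = a.multiply(b)  # element-wise (Hadamard) product
transposed = a.T             # transpose

print(total.toarray())
print(product.toarray())
print(elementwise.toarray())
print(transposed.toarray())
```

Using @ for matrix products and .multiply() for element-wise products keeps the intent explicit, since the meaning of * differs between SciPy's sparse matrix and sparse array classes.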

Saving and Loading Sparse Matrices

You can save a sparse matrix to a file and load it later with SciPy's scipy.sparse.save_npz and scipy.sparse.load_npz functions, which use the .npz format.

from scipy.sparse import save_npz, load_npz

# Save sparse matrix to a file

save_npz('sparse_matrix.npz', sparse_matrix)

# Load sparse matrix from a file

loaded_matrix = load_npz('sparse_matrix.npz')
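To check that a saved matrix comes back unchanged, a round trip can be sketched as follows. The matrix values and the file name example_matrix.npz are assumptions, and a temporary directory is used so the example leaves no files behind.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix, load_npz, save_npz

# Hypothetical matrix to save
original = csr_matrix(np.array([[0, 0, 1],
                                [2, 0, 0],
                                [0, 3, 0]]))

# Write to a temporary directory that is removed when the block exits
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_matrix.npz")  # hypothetical file name
    save_npz(path, original)
    restored = load_npz(path)

# The restored matrix keeps its sparse format and its values
print(restored.format)
print(np.array_equal(original.toarray(), restored.toarray()))
```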

Learn more about Sparse Matrices with ProjectPro!

Sparse matrices are a crucial tool for handling large-scale data and optimizing computational resources. In this guide, we've covered the basics of creating sparse matrices, converting between sparse and dense forms, computing eigenvalues, performing arithmetic operations, and saving and loading sparse matrices in Python. These skills are invaluable for data scientists, machine learning practitioners, and researchers working with substantial datasets. To further enhance your knowledge and practical experience in data analysis, consider exploring ProjectPro, which offers a wide range of data science and big data projects. Start your journey of learning and skill development with ProjectPro today.
