There are two popular kinds of matrices: dense and sparse. Sparse matrices have lots of 'zero' values. In machine learning projects, the learning algorithms require the data to be in-memory. If the data needed for the learning (dataframe) is not in the RAM, then the algorithm does not work. By converting a dense matrix into a sparse matrix it can be made to fit in the RAM.
There are many data structures that can be used to construct a sparse matrix in python. Python Scipy provides the following ways to represent a sparse matrix:
- Block Sparse Row matrix (BSR)
- Coordinate list matrix (COO)
- Compressed Sparse Column matrix (CSC)
- Compressed Sparse Row matrix (CSR)
- Sparse matrix with DIAgonal storage (DIA)
- Dictionary Of Keys based sparse matrix (DOK)
- Row-based linked list sparse matrix (LIL)
The recipe above takes a dense matrix and displays the various formats of sparse matrix that scipy supports.