This recipe helps you do PCA with Dask


Recipe Objective

PCA stands for **Principal Component Analysis**. It reduces the dimensionality of a dataset by using singular value decomposition (SVD) to project the data onto a lower-dimensional space.
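To make the SVD step concrete, here is a minimal NumPy sketch of the idea. It is not how dask-ml implements PCA internally; the function name `pca_via_svd` and the toy data are hypothetical and used only for illustration.

```python
import numpy as np

def pca_via_svd(data, n_components):
    """Project `data` onto its top principal directions using SVD."""
    # Centre each feature so the SVD describes variance rather than the mean.
    centered = data - data.mean(axis=0)
    # Thin SVD: the rows of vt are the principal directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projected = centered @ components.T
    # Share of the total variance captured by each direction.
    explained_ratio = (s ** 2) / np.sum(s ** 2)
    return projected, explained_ratio[:n_components]

# Hypothetical toy data: 100 samples, two perfectly correlated features
# plus one low-variance noise feature.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
data = np.hstack([base, 2 * base, rng.normal(scale=0.1, size=(100, 1))])

projected, ratio = pca_via_svd(data, n_components=2)
print(projected.shape)  # (100, 2)
print(ratio)
```

The projected array has the same number of rows as the input but only `n_components` columns, which is the sense in which PCA reduces dimensionality.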

Depending on the size of the input data, SVD can be much more memory efficient than a PCA, and it allows sparse input as well. This algorithm has constant memory complexity.

#!pip install dask_ml
#!pip install dask distributed --upgrade

Step 1- Importing Libraries.

Import PCA from dask_ml.decomposition, along with NumPy and dask.array.

import numpy as np
import dask.array as da
from dask_ml.decomposition import PCA

Step 2- Creating arrays.

We will create a two-dimensional NumPy array and wrap it in a Dask array.

x = np.array([[1, -6], [2, -5], [3, -4], [4, -3], [5, -2], [6, -1]])
X = da.from_array(x, chunks=x.shape)
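The `chunks` argument controls how Dask splits the array into blocks. The sketch below compares the single-chunk layout used in this recipe with a hypothetical row-wise split, which is the more typical layout once the data no longer fits comfortably in one block. The names `X_single` and `X_rows` are illustrative only.

```python
import numpy as np
import dask.array as da

x = np.array([[1, -6], [2, -5], [3, -4], [4, -3], [5, -2], [6, -1]])

# One chunk covering the whole array, as in the recipe.
X_single = da.from_array(x, chunks=x.shape)
print(X_single.chunks)  # ((6,), (2,))

# Hypothetical alternative: blocks of 3 rows, with both columns kept
# together inside every block.
X_rows = da.from_array(x, chunks=(3, 2))
print(X_rows.chunks)     # ((3, 3), (2,))
print(X_rows.numblocks)  # (2, 1)
```

Splitting along rows while keeping the columns in a single block suits tall-and-skinny data, which is the shape PCA is usually applied to. Check the dask-ml documentation for your version before chunking along columns.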

Step 3- Applying PCA to the arrays.

We will apply PCA to the array, keeping two components.

pca = PCA(n_components=2)
pca.fit(X)

Step 4- Printing explained variance ratio.

We will print the explained variance ratio to see how much of the total variance each principal component captures.


