How to do PCA with Dask?
PCA stands for **Principal Component Analysis**. It reduces the dimensionality of a dataset by using the singular value decomposition (SVD) of the data to project it onto a lower-dimensional space.
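The core idea can be sketched in plain NumPy on the same small two-column data used below. The helper name `pca_via_svd` is hypothetical and is not part of dask_ml. It centers each column, takes the SVD, and projects the samples onto the first k right singular vectors.

```python
import numpy as np

def pca_via_svd(data, k):
    """Hypothetical helper: PCA of `data` (rows = samples) via SVD."""
    centered = data - data.mean(axis=0)          # PCA works on mean-centered columns
    # Thin SVD: centered = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:k]                          # principal axes, one per row
    scores = centered @ components.T             # projection onto the first k axes
    ratio = S**2 / np.sum(S**2)                  # share of variance per component
    return scores, components, ratio[:k]

x = np.array([[1, -6], [2, -5], [3, -4], [4, -3], [5, -2], [6, -1]], dtype=float)
scores, components, ratio = pca_via_svd(x, k=1)
print(scores.ravel())   # one number per sample instead of two
print(ratio)            # fraction of variance kept by the first component
```

Up to the sign of each component (the SVD does not fix signs), this is the projection PCA computes. dask-ml performs the same kind of decomposition on Dask arrays instead of NumPy arrays.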
Because dask-ml's PCA operates on chunked Dask arrays, the SVD is computed block by block with Dask's parallel linear-algebra routines. This lets it scale to datasets that are too large to process in memory all at once.
#!pip install dask_ml
#!pip install dask distributed --upgrade
Import PCA from dask_ml.decomposition, along with NumPy and dask.array.
import numpy as np
import dask.array as da
from dask_ml.decomposition import PCA
We will create a two-dimensional NumPy array (six samples, two features) and wrap it in a Dask array.
x = np.array([[1, -6], [2, -5], [3, -4], [4, -3], [5, -2], [6, -1]])
X = da.from_array(x, chunks=x.shape)  # a single chunk holding the whole array
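A short sketch comparing the single-block conversion with a row-chunked one, assuming the same `x`. Splitting into blocks of three rows is an arbitrary illustrative choice; on real data the chunk size would be chosen to fit comfortably in memory.

```python
import numpy as np
import dask.array as da

x = np.array([[1, -6], [2, -5], [3, -4], [4, -3], [5, -2], [6, -1]])

X_single = da.from_array(x, chunks=x.shape)   # one block holding all 6 rows
X_rows = da.from_array(x, chunks=(3, 2))      # two blocks of 3 rows, all columns together

print(X_single.numblocks, X_rows.numblocks)   # (1, 1) (2, 1)
print(X_rows.chunks)                          # ((3, 3), (2,))

# Operations stay lazy until .compute(); the column means are computed
# blockwise and then combined.
print(X_rows.mean(axis=0).compute())
```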
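We now fit PCA to the Dask array. With only two features, n_components=2 keeps both components, so here the fit mainly shows how the variance is split between them.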
pca = PCA(n_components=2)  # keep two principal components
pca.fit(X)                 # compute the decomposition on the Dask array
Next, we print the explained variance ratio, which gives the fraction of the total variance captured by each component.