How to do PCA with Dask?

This recipe shows how to perform PCA with Dask.

Recipe Objective

PCA stands for **Principal Component Analysis**. It reduces the dimensionality of a dataset by using Singular Value Decomposition (SVD) to project the data onto a lower-dimensional space.
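To make the SVD projection concrete, here is a minimal NumPy sketch of the idea. It is not how dask_ml implements PCA internally; the function name `pca_via_svd` and the random sample data are assumptions made up for illustration.

```python
import numpy as np


def pca_via_svd(data, n_components):
    """Project `data` (n_samples x n_features) onto its top principal components."""
    # Center each feature so the SVD captures variance, not the mean offset.
    centered = data - data.mean(axis=0)
    # Thin SVD: rows of `vt` are the principal directions, ordered by singular value.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Coordinates of every sample in the lower-dimensional space.
    projected = centered @ components.T
    # Share of the total variance carried by each kept component.
    explained_variance_ratio = (s ** 2 / np.sum(s ** 2))[:n_components]
    return projected, components, explained_variance_ratio


rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 5))  # made-up data: 100 samples, 5 features
projected, components, ratio = pca_via_svd(samples, n_components=2)
print(projected.shape)  # (100, 2)
print(ratio)
```

The rows of `components` are orthonormal directions, and `ratio` plays the same role as the `explained_variance_ratio_` attribute printed later in this recipe.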

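Memory use depends on the size of the input data. Depending on that size, an incremental, batch-wise SVD can be much more memory efficient than a standard PCA, and it allows sparse input as well. Such an incremental algorithm has constant memory complexity with respect to the number of samples.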
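Dask approaches the memory question by splitting an array into chunks and building a lazy task graph over them. The sketch below illustrates this with `dask.array.linalg.svd` on a tall array chunked along its rows. The random data and the chunk size of 250 rows are arbitrary assumptions, and this is not necessarily what dask_ml's PCA does internally.

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 4))  # made-up tall-and-skinny data

# Split the rows into four blocks; each block holds all four columns.
X = da.from_array(data, chunks=(250, 4))

# Center the columns lazily, then take the SVD of the chunked array.
centered = X - X.mean(axis=0)
u, s, vt = da.linalg.svd(centered)

# Nothing has been computed yet; compute() runs the task graph.
singular_values = s.compute()
print(X.numblocks)  # (4, 1)
print(singular_values)
```

The singular values should agree with those NumPy returns for the same centered data; only the way the work is scheduled differs.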

#!pip install dask_ml
#!pip install dask distributed --upgrade

Step 1- Importing Libraries.

Import PCA from dask_ml.decomposition, along with NumPy and dask.array.

import numpy as np
import dask.array as da
from dask_ml.decomposition import PCA

Step 2- Creating arrays.

We will create a two-dimensional NumPy array (six samples, two features) and wrap it in a Dask array with a single chunk.

x = np.array([[1, -6], [2, -5], [3, -4], [4, -3], [5, -2], [6, -1]])
X = da.from_array(x, chunks=x.shape)

Step 3- Applying PCA to the arrays.

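We will fit PCA to the array. Note that with n_components=2 and only two input features, both components are kept, so this step finds the principal directions without yet dropping a dimension.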

pca = PCA(n_components=2)
pca.fit(X)

Step 4- Printing explained variance ratio.

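We will print the explained variance ratio, which shows the fraction of the total variance captured by each principal component. Because the sample points all lie on a single straight line, the first component should account for essentially all of the variance.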

print(pca.explained_variance_ratio_)
