What is Principal Component Analysis in the StatsModels library?

This recipe explains what Principal Component Analysis (PCA) is in the StatsModels library and how to run it.

Recipe Objective - What is Principal Component Analysis in the StatsModels library?

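PCA stands for Principal Component Analysis, a technique that reduces a dataset to a smaller number of uncorrelated components that capture most of its variance. In StatsModels it is implemented by the class statsmodels.multivariate.pca.PCA, which has the following signature:

statsmodels.multivariate.pca.PCA(data, ncomp=None, standardize=True, demean=True, normalize=True, gls=False, weights=None, method='svd', missing=None, tol=5e-08, max_iter=1000, tol_em=5e-08, max_em_iter=100, svd_full_matrices=False)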

Parameters:

data
Variables in columns, observations in rows.

ncomp
Number of components to return. If None, returns as many components as the smaller of the number of rows or columns in data.

standardize
Flag indicating whether to use standardized data with mean 0 and unit variance. Setting standardize to True implies demean. Using standardized data is equivalent to computing principal components from the correlation matrix of data.

demean
Flag indicating whether to demean data before computing principal components. demean is ignored if standardize is True. Demeaning data but not standardizing is equivalent to computing principal components from the covariance matrix of data.

normalize
Indicates whether to normalize the factors to have a unit inner product. If False, the loadings will have a unit inner product.

gls
Flag indicating whether to implement a two-step GLS estimator, wherein the first-step principal components are used to estimate residuals, and then the inverse residual variance is used as a set of weights to estimate the final principal components. Setting gls to True requires ncomp to be less than the minimum of the number of rows or columns.

weights
Series weights to use after transforming data according to standardize or demean when computing the principal components.

method
1. ‘svd’ uses a singular value decomposition (default).

2. ‘eig’ uses an eigenvalue decomposition of a quadratic form.

3. ‘nipals’ uses the NIPALS algorithm and can be faster than SVD when ncomp is small and nvars is large. See the notes in the StatsModels PCA documentation for additional changes that apply when using NIPALS.
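
The parameters above can be tried out on a small two-dimensional dataset. The following is a minimal sketch, not taken from the original recipe: the synthetic data, the random seed and the variable names (rng, pc_corr, pc_cov, corr_eigs) are assumptions made for illustration. It fits PCA once on standardized data and once on demeaned data, and checks that the standardized fit agrees with the eigenvalues of the correlation matrix.

```python
# Importing libraries
import numpy as np
from statsmodels.multivariate.pca import PCA

# Synthetic data (assumed for illustration): 100 observations in rows, 4 variables in columns
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 4))
data[:, 1] += 0.8 * data[:, 0]  # make two of the columns correlated

# Keep 2 components, computed from standardized data (correlation matrix)
pc_corr = PCA(data, ncomp=2, standardize=True)

# Keep 2 components, computed from demeaned data only (covariance matrix)
pc_cov = PCA(data, ncomp=2, standardize=False, demean=True)

print(pc_corr.factors.shape)   # (100, 2): nobs by ncomp
print(pc_cov.loadings.shape)   # (4, 2): nvar by ncomp

# Eigenvalues of the correlation matrix, largest first
corr_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

# The eigenvalues of the standardized fit, divided by the number of
# observations, should match the two largest correlation-matrix eigenvalues
print(np.allclose(pc_corr.eigenvals / data.shape[0], corr_eigs[:2]))
```

Setting ncomp keeps only the leading components, which is usually the point of PCA when the data has many columns.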

Attributes:

factors (array or DataFrame)
nobs by ncomp array of principal components (scores)

scores (array or DataFrame)
nobs by ncomp array of principal components - identical to factors
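
As a quick check of these attributes, the sketch below uses synthetic data and variable names (rng, pc, pc_raw) that are assumptions for illustration only. It confirms that factors and scores hold the same values, and shows the effect of the normalize parameter on the inner products of the factors and loadings. Each check should print True if the descriptions above hold.

```python
# Importing libraries
import numpy as np
from statsmodels.multivariate.pca import PCA

# Synthetic data (assumed for illustration): 50 observations, 3 variables
rng = np.random.default_rng(1)
data = rng.standard_normal((50, 3))

# Fitting pca model with 2 components (normalize=True is the default)
pc = PCA(data, ncomp=2)

# factors and scores hold the same values
print(np.array_equal(pc.factors, pc.scores))

# With normalize=True the factors have a unit inner product
print(np.allclose(pc.factors.T @ pc.factors, np.eye(2)))

# With normalize=False the loadings have a unit inner product
pc_raw = PCA(data, ncomp=2, normalize=False)
print(np.allclose(pc_raw.loadings.T @ pc_raw.loadings, np.eye(2)))
```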


Example:

# Example 1:
# Importing libraries
import numpy as np
from statsmodels.multivariate.pca import PCA

# Creating a 1-D array of 10 random numbers; as the 10 x 1 output shows,
# it is treated as 10 observations of a single variable
data = np.random.randn(10)

# Fitting pca model
pca_model = PCA(data)

# Factors
pca_model.factors

Output (the values differ on every run because the data is random) -
array([[-0.14246123],
       [ 0.3902405 ],
       [ 0.18353067],
       [ 0.30667022],
       [-0.56520834],
       [ 0.4737978 ],
       [-0.2789227 ],
       [-0.26372694],
       [-0.01327701],
       [-0.09064296]])

# Example 2:
# Importing libraries
import numpy as np
from statsmodels.multivariate.pca import PCA

# Creating a 1-D array of 10 random numbers (10 observations of a single variable)
data = np.random.randn(10)

# Fitting pca model
pca_model = PCA(data, method='eig')

# Factors
pca_model.factors

Output (the values differ on every run because the data is random) -
array([[-0.54885266],
       [-0.04136097],
       [ 0.20260935],
       [ 0.16259255],
       [-0.28626099],
       [ 0.37394827],
       [ 0.38848118],
       [-0.12744043],
       [ 0.27944004],
       [-0.40315635]])
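
Both examples above use a single column, so only one component can be returned. A sketch with several columns may be more representative. The one below is an illustration rather than part of the original recipe: the data shape, the seed and the names (rng, pca_svd, pca_eig, same_up_to_sign) are assumptions. It fits the same data with the 'svd' and 'eig' methods and compares the factors, which should agree up to the sign of each column.

```python
# Example 3 (illustrative):
# Importing libraries
import numpy as np
from statsmodels.multivariate.pca import PCA

# Creating a 2-D array: 10 observations in rows, 3 variables in columns
rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
data = rng.standard_normal((10, 3))

# Fitting pca models with two different methods
pca_svd = PCA(data, ncomp=2, method='svd')
pca_eig = PCA(data, ncomp=2, method='eig')

# Factors: one row per observation, one column per component
print(pca_svd.factors.shape)   # (10, 2)

# The sign of a component is arbitrary, so compare absolute values
same_up_to_sign = np.allclose(np.abs(pca_svd.factors), np.abs(pca_eig.factors))
print(same_up_to_sign)
```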

In this way, we can perform PCA with the StatsModels library.
