What is Principal Component Analysis in the StatsModels library?

This recipe explains what Principal Component Analysis is and how it is exposed in the StatsModels library.
Last Updated: 15 Sep 2022


Recipe Objective - What is Principal Component Analysis in the StatsModels library?

PCA stands for Principal Component Analysis. In StatsModels it is implemented by the class statsmodels.multivariate.pca.PCA, which has the following signature:

statsmodels.multivariate.pca.PCA(data, ncomp=None, standardize=True, demean=True, normalize=True, gls=False, weights=None, method='svd', missing=None, tol=5e-08, max_iter=1000, tol_em=5e-08, max_em_iter=100, svd_full_matrices=False)


Parameters:

data
Variables in columns, observations in rows.

ncomp
Number of components to return. If None, returns as many as the smaller of the number of rows or columns in data.

standardize
Flag indicating whether to use standardized data with mean 0 and unit variance. Setting standardize to True implies demean. Using standardized data is equivalent to computing principal components from the correlation matrix of data.

demean
Flag indicating whether to demean data before computing principal components. demean is ignored if standardize is True. Demeaning data but not standardizing is equivalent to computing principal components from the covariance matrix of data.

normalize
Indicates whether to normalize the factors to have a unit inner product. If False, the loadings will have a unit inner product.


gls
Flag indicating whether to use a two-step GLS estimator: the first-step principal components are used to estimate residuals, and the inverse residual variance is then used as a set of weights to estimate the final principal components. Setting gls to True requires ncomp to be less than the minimum of the number of rows or columns.

weights
Series weights to use after transforming data according to standardize or demean when computing the principal components.

method
1. ‘svd’ uses a singular value decomposition (default).

2. ‘eig’ uses an eigenvalue decomposition of a quadratic form.

3. ‘nipals’ uses the NIPALS algorithm and can be faster than SVD when ncomp is small and nvars is large. See notes about additional changes when using NIPALS.
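The equivalences described for standardize and demean can be checked numerically. The sketch below is illustrative only: the random data, seed and column scales are made up, and it relies on the eigenvals attribute of the fitted object, which is not listed in this recipe. It also assumes the eigenvalues are reported for the cross-product of the transformed data, so they are divided by the number of observations before being compared.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

# Illustrative data: 200 observations (rows) of 4 variables (columns),
# with deliberately different scales per column
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 4)) * np.array([1.0, 2.0, 5.0, 10.0])
nobs = data.shape[0]

# standardize=True: compare with eigenvalues of the correlation matrix
pca_corr = PCA(data, standardize=True)
corr_eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

# standardize=False, demean=True: compare with eigenvalues of the
# covariance matrix (bias=True divides by nobs rather than nobs - 1)
pca_cov = PCA(data, standardize=False, demean=True)
cov_eigvals = np.sort(np.linalg.eigvalsh(np.cov(data, rowvar=False, bias=True)))[::-1]

print(np.allclose(pca_corr.eigenvals / nobs, corr_eigvals))
print(np.allclose(pca_cov.eigenvals / nobs, cov_eigvals))
```

Because the columns have very different scales, the covariance-based components are dominated by the widest column, while the correlation-based components treat all columns equally.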

Attributes:

factors[array or DataFrame]
nobs by ncomp array of principal components (scores)

scores[array or DataFrame]
nobs by ncomp array of principal components - identical to factors
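As a small sketch of these two attributes, the snippet below fits a model on made-up two-dimensional data (the seed, shape and ncomp=2 are arbitrary choices for illustration) and checks that factors and scores hold the same values, and that with the default normalize=True the factors have a unit inner product.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

# Illustrative data: 50 observations of 5 variables
rng = np.random.default_rng(1)
data = rng.standard_normal((50, 5))

# Keep only the first two components; normalize=True is the default
pca_model = PCA(data, ncomp=2)

# factors has one row per observation and one column per component
print(pca_model.factors.shape)  # (50, 2)

# scores is documented as identical to factors
same = np.array_equal(pca_model.factors, pca_model.scores)

# With normalize=True the factors should satisfy F'F = I
gram = pca_model.factors.T @ pca_model.factors
print(same, np.allclose(gram, np.eye(2)))
```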

Example:

# Example 1:
# Importing libraries
import numpy as np
from statsmodels.multivariate.pca import PCA

# Creating array of random numbers
data = np.random.randn(10)

# Fitting pca model
pca_model = PCA(data)

# Factors
pca_model.factors

Output - 
array([[-0.14246123],
       [ 0.3902405 ],
       [ 0.18353067],
       [ 0.30667022],
       [-0.56520834],
       [ 0.4737978 ],
       [-0.2789227 ],
       [-0.26372694],
       [-0.01327701],
       [-0.09064296]])

# Example 2:
# Importing libraries
import numpy as np
from statsmodels.multivariate.pca import PCA

# Creating array of random numbers
data = np.random.randn(10)

# Fitting pca model using the eigenvalue decomposition
pca_model = PCA(data, method='eig')

# Factors
pca_model.factors

Output - 
array([[-0.54885266],
       [-0.04136097],
       [ 0.20260935],
       [ 0.16259255],
       [-0.28626099],
       [ 0.37394827],
       [ 0.38848118],
       [-0.12744043],
       [ 0.27944004],
       [-0.40315635]])
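The two outputs above differ mainly because each example draws a fresh set of random numbers, not because of the method argument. A hedged sketch of a like-for-like comparison follows: it uses the same seeded, made-up 30 by 3 array for both fits (the seed, shape and ncomp=2 are assumptions for illustration) and compares absolute values, since the sign of each principal component is arbitrary.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

# Same illustrative input for both methods: 30 observations, 3 variables
rng = np.random.default_rng(42)
data = rng.standard_normal((30, 3))

# Fit once with each decomposition method
factors_svd = PCA(data, ncomp=2, method='svd').factors
factors_eig = PCA(data, ncomp=2, method='eig').factors

# Components may differ by a sign flip per column, so compare magnitudes
agree = np.allclose(np.abs(factors_svd), np.abs(factors_eig), atol=1e-6)
print(agree)
```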

In this way, we can perform PCA with the StatsModels library.
