# How to reduce dimensionality using PCA in Python?

This recipe helps you reduce dimensionality using PCA in Python.

In many datasets the number of features is very large, and training a model on all of them is computationally expensive. To decrease the number of features we can use Principal Component Analysis (PCA). PCA reduces the number of features by projecting the data onto the directions (principal components) that capture most of its variance.

So this recipe is a short example of how to reduce dimensionality using PCA in Python.

```
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
```

Here we have imported PCA, StandardScaler, and the datasets module from scikit-learn. We will see how each is used later in the code snippets.

For now, just have a look at these imports.

Here we have used datasets to load the built-in digits dataset.
```
digits = datasets.load_digits()
```

StandardScaler is used to standardize the data so that each feature has a mean of 0 and a standard deviation of 1. Note that it rescales the data but does not remove outliers.
```
X = StandardScaler().fit_transform(digits.data)
print(X)
```

We are also using Principal Component Analysis (PCA), which reduces the number of features by creating new features (principal components) that capture most of the variance of the original data. We have passed n_components=0.85, which tells PCA to keep enough components to explain 85% of the variance. Setting whiten=True rescales the components to unit variance. We have also printed the number of features in the initial and final datasets.
```
pca = PCA(n_components=0.85, whiten=True)
X_pca = pca.fit_transform(X)
print(X_pca)
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])
```
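As a quick sanity check (not part of the original recipe), we can inspect `explained_variance_ratio_` to confirm that the kept components really do account for at least 85% of the variance, and see how many components were selected:

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Same preprocessing as above: load digits and standardize.
digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)

# n_components=0.85 keeps the smallest number of components whose
# cumulative explained variance is at least 85%.
pca = PCA(n_components=0.85, whiten=True)
pca.fit(X)

# Cumulative explained variance of the retained components.
print("Explained variance kept:", pca.explained_variance_ratio_.sum())
print("Components kept:", pca.n_components_)
```

The printed sum should be just above 0.85, with far fewer components than the original 64 features.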

For better understanding we are applying PCA again. This time we have passed n_components=2, which is the exact number of components to keep rather than a fraction of the variance. We have again printed the number of features in the initial and final datasets.
```
pca = PCA(n_components=2, whiten=True)
X_pca = pca.fit_transform(X)
print(X_pca)
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])
```
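To see what is lost by keeping only 2 components, we can (as an extra illustration, not part of the original recipe) map the reduced data back to the original 64-dimensional space with `inverse_transform` and measure the reconstruction error:

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Same preprocessing as above: load digits and standardize.
digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)

# Keep exactly 2 principal components.
pca = PCA(n_components=2, whiten=True)
X_pca = pca.fit_transform(X)

# inverse_transform projects the 2 components back into the original
# 64-dimensional feature space (it also undoes the whitening).
X_back = pca.inverse_transform(X_pca)

# Mean squared reconstruction error; large here because 2 components
# capture only a small part of the variance.
mse = np.mean((X - X_back) ** 2)
print("Reconstruction MSE with 2 components:", mse)
```

A higher n_components would lower this error, at the cost of less dimensionality reduction.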

As an output we get:

```
[[ 0.         -0.33501649 -0.04308102 ... -1.14664746 -0.5056698  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  0.54856067 -0.5056698  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  1.56568555  1.6951369  -0.19600752]
 ...
 [ 0.         -0.33501649 -0.88456568 ... -0.12952258 -0.5056698  -0.19600752]
 [ 0.         -0.33501649 -0.67419451 ...  0.8876023  -0.5056698  -0.19600752]
 [ 0.         -0.33501649  1.00877481 ...  0.8876023  -0.26113572 -0.19600752]]

[[ 0.70631939 -0.39512814 -1.73816236 ...  0.60320435 -0.94455291 -0.60204272]
 [ 0.21732591  0.38276482  1.72878893 ... -0.56722002  0.61131544  1.02457999]
 [ 0.4804351  -0.13130437  1.33172761 ... -1.51284419 -0.48470912 -0.52826811]
 ...
 [ 0.37732433 -0.0612296   1.0879821  ...  0.04925597  0.29271531 -0.33891255]
 [ 0.39705007 -0.15768102 -1.08160094 ...  1.31785641  0.38883981 -1.21854835]
 [-0.46407544 -0.92213976  0.12493334 ... -1.27242756 -0.34190284 -1.17852306]]
Original number of features: 64
Reduced number of features: 25

[[ 0.70634542 -0.39504744]
 [ 0.21730901  0.38270788]
 [ 0.48044955 -0.13126596]
 ...
 [ 0.37733004 -0.06120936]
 [ 0.39703595 -0.15774013]
 [-0.46406594 -0.92210953]]
Original number of features: 64
Reduced number of features: 2
```
