
# How to reduce dimensionality using PCA in Python?

This recipe shows how to reduce dimensionality using PCA in Python.

## Recipe Objective

In many datasets the number of features is very large, which makes training a model computationally expensive. To reduce the number of features we can use Principal Component Analysis (PCA). PCA reduces dimensionality by projecting the data onto a smaller set of new components that capture most of the variance of the original features.

So this recipe is a short example of how to reduce dimensionality using PCA in Python.

## Step 1 - Import the library

```
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
```

Here we have imported PCA, datasets and StandardScaler from their respective scikit-learn modules. We will see how each is used in the code snippets below; for now just have a look at these imports.

## Step 2 - Setup the Data

Here we have used `datasets` to load the built-in digits dataset.

```
digits = datasets.load_digits()
```
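To get a feel for the data before transforming it, we can check its shape. This is a short sketch (the `digits` variable matches the snippet above): `load_digits` returns 1797 samples, each an 8x8 image flattened into 64 pixel features.

```python
from sklearn import datasets

# load the built-in handwritten-digits dataset
digits = datasets.load_digits()

# 1797 samples, each an 8x8 image flattened into 64 pixel features
print(digits.data.shape)  # (1797, 64)
```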

## Step 3 - Using StandardScaler

StandardScaler standardizes the data so that each feature has mean 0 and standard deviation 1. (Note that it does not remove outliers; it only rescales the features, and standardization is in fact sensitive to outliers.)

```
X = StandardScaler().fit_transform(digits.data)
print(); print(X)
```
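As a quick sanity check, the scaled matrix should have approximately zero mean per feature. A minimal sketch, reusing the same calls as above:

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)

# every feature is centered to mean 0; note that constant features
# (e.g. the always-zero border pixels in digits) are left at 0 by
# StandardScaler rather than being scaled to unit variance
print(np.allclose(X.mean(axis=0), 0))  # True
```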

## Step 4 - Using PCA

We now apply Principal Component Analysis (PCA), which reduces the number of features by creating new components that retain most of the variance of the original data. Passing `n_components=0.85` tells PCA to keep enough components to explain 85% of the variance. We have also printed the shape of the initial and final datasets.

```
pca = PCA(n_components=0.85, whiten=True)
X_pca = pca.fit_transform(X)
print(X_pca)
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])
```

For better understanding we apply PCA again, this time passing `n_components=2`, which asks for exactly two components. Again we print the shape of the initial and final datasets.

```
pca = PCA(n_components=2, whiten=True)
X_pca = pca.fit_transform(X)
print(X_pca)
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])
```

As an output we get:

```
[[ 0.         -0.33501649 -0.04308102 ... -1.14664746 -0.5056698
-0.19600752]
[ 0.         -0.33501649 -1.09493684 ...  0.54856067 -0.5056698
-0.19600752]
[ 0.         -0.33501649 -1.09493684 ...  1.56568555  1.6951369
-0.19600752]
...
[ 0.         -0.33501649 -0.88456568 ... -0.12952258 -0.5056698
-0.19600752]
[ 0.         -0.33501649 -0.67419451 ...  0.8876023  -0.5056698
-0.19600752]
[ 0.         -0.33501649  1.00877481 ...  0.8876023  -0.26113572
-0.19600752]]

[[ 0.70631939 -0.39512814 -1.73816236 ...  0.60320435 -0.94455291
-0.60204272]
[ 0.21732591  0.38276482  1.72878893 ... -0.56722002  0.61131544
1.02457999]
[ 0.4804351  -0.13130437  1.33172761 ... -1.51284419 -0.48470912
-0.52826811]
...
[ 0.37732433 -0.0612296   1.0879821  ...  0.04925597  0.29271531
-0.33891255]
[ 0.39705007 -0.15768102 -1.08160094 ...  1.31785641  0.38883981
-1.21854835]
[-0.46407544 -0.92213976  0.12493334 ... -1.27242756 -0.34190284
-1.17852306]]
Original number of features: 64
Reduced number of features: 25

[[ 0.70634542 -0.39504744]
[ 0.21730901  0.38270788]
[ 0.48044955 -0.13126596]
...
[ 0.37733004 -0.06120936]
[ 0.39703595 -0.15774013]
[-0.46406594 -0.92210953]]
Original number of features: 64
Reduced number of features: 2
```
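To see how much variance the retained components actually capture, we can inspect the fitted model's `explained_variance_ratio_` attribute. A short sketch assuming the same pipeline as above:

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)

# keep enough components to explain at least 85% of the variance
pca = PCA(n_components=0.85, whiten=True)
X_pca = pca.fit_transform(X)

# cumulative variance explained by the retained components
print(pca.explained_variance_ratio_.sum())  # at least 0.85
print("Components kept:", X_pca.shape[1])
```

This confirms that the float form of `n_components` is a variance threshold, not a count of features.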
