How to extract features using PCA in Python?

How to extract features using PCA in Python?

How to extract features using PCA in Python?

This recipe helps you extract features using PCA in Python


Recipe Objective

In many datasets we find that number of features are very large and if we want to train the model it take more computational cost. To decrease the number of features we can use Principal component analysis (PCA). PCA decrease the number of features by selecting dimension of features which have most of the variance.

So this recipe is a short example of how can extract features using PCA in Python

Step 1 - Import the library

from sklearn import decomposition, datasets from sklearn.preprocessing import StandardScaler

Here we have imported various modules like decomposition, datasets and StandardScale from differnt libraries. We will understand the use of these later while using it in the in the code snipet.
For now just have a look on these imports.

Step 2 - Setup the Data

Here we have used datasets to load the inbuilt cancer dataset and we have created objects X and y to store the data and the target value respectively. dataset = datasets.load_breast_cancer() X = print(X.shape) print(X)

Step 3 - Using StandardScaler and PCA

StandardScaler is used to remove the outliners and scale the data by making the mean of the data 0 and standard deviation as 1. So we are creating an object std_scl to use standardScaler. std_slc = StandardScaler() X_std = std_slc.fit_transform(X) print(X_std.shape) print(X_std)

We are also using Principal Component Analysis(PCA) which will reduce the dimension of features by creating new features which have most of the varience of the original data. We have passed the parameter n_components as 4 which is the number of feature in final dataset. pca = decomposition.PCA(n_components=4) X_std_pca = pca.fit_transform(X_std) print(X_std_pca.shape) print(X_std_pca) As an output we get:

(569, 30)

[[1.799e+01 1.038e+01 1.228e+02 ... 2.654e-01 4.601e-01 1.189e-01]
 [2.057e+01 1.777e+01 1.329e+02 ... 1.860e-01 2.750e-01 8.902e-02]
 [1.969e+01 2.125e+01 1.300e+02 ... 2.430e-01 3.613e-01 8.758e-02]
 [1.660e+01 2.808e+01 1.083e+02 ... 1.418e-01 2.218e-01 7.820e-02]
 [2.060e+01 2.933e+01 1.401e+02 ... 2.650e-01 4.087e-01 1.240e-01]
 [7.760e+00 2.454e+01 4.792e+01 ... 0.000e+00 2.871e-01 7.039e-02]]

(569, 30)

[[ 1.09706398 -2.07333501  1.26993369 ...  2.29607613  2.75062224
 [ 1.82982061 -0.35363241  1.68595471 ...  1.0870843  -0.24388967
 [ 1.57988811  0.45618695  1.56650313 ...  1.95500035  1.152255
 [ 0.70228425  2.0455738   0.67267578 ...  0.41406869 -1.10454895
 [ 1.83834103  2.33645719  1.98252415 ...  2.28998549  1.91908301
 [-1.80840125  1.22179204 -1.81438851 ... -1.74506282 -0.04813821

(569, 4)

[[ 9.19283682  1.94858315 -1.12316659  3.63373524]
 [ 2.3878018  -3.76817178 -0.52929307  1.1182629 ]
 [ 5.73389628 -1.07517381 -0.55174687  0.91208083]
 [ 1.25617928 -1.90229673  0.56273054 -2.0892281 ]
 [10.37479406  1.67201009 -1.87702907 -2.35603254]
 [-5.4752433  -0.67063675  1.49044361 -2.29915639]]

Relevant Projects

Music Recommendation System Project using Python and R
Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.