How to do variance thresholding in Python for feature selection?

How to do variance thresholding in Python for feature selection?

How to do variance thresholding in Python for feature selection?

This recipe helps you do variance thresholding in Python for feature selection


Recipe Objective

To increse the score of the model we need the dataset that has high variance, so it will be good if we can select the features in the dataset which has variance more than a fix threshold.

This data science python source code does the following:
1. Uses Variance for selecting the best features.
2. Visualizes the final result

So this is the recipe on how we can do variance thresholding in Python for feature selection.

Step 1 - Import the library

from sklearn import datasets from sklearn.feature_selection import VarianceThreshold

We have only imported datasets to import the inbult dataset and VarienceThreshold.

Step 2 - Setting up the Data

We have imported inbuilt iris dataset and stored data in X and target in y. We have also used print statement to print first 8 rows of the dataset. iris = datasets.load_iris() X = print(X[0:7]) y = print(y[0:7])

Step 3 - Applying threshold on Variance

We have created an object for VarianceThreshold with parameter threshold in which we have to put the minimum value of variance we want in out dataset. Then we have used fit_transform to fit and transform the dataset. Finally we have printed the final dataset. thresholder = VarianceThreshold(threshold=.5) X_high_variance = thresholder.fit_transform(X) print(X_high_variance[0:7]) So in the output we can see that in final dataset we have 3 columns and in the initial dataset we have 4 columns which means the function have removed a column which has less variance.

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]]

[0 0 0 0 0 0 0]

[[5.1 1.4 0.2]
 [4.9 1.4 0.2]
 [4.7 1.3 0.2]
 [4.6 1.5 0.2]
 [5.  1.4 0.2]
 [5.4 1.7 0.4]
 [4.6 1.4 0.3]]

Relevant Projects

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Data Science Project - Instacart Market Basket Analysis
Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.