How to do variance thresholding in Python for feature selection?

How to do variance thresholding in Python for feature selection?

How to do variance thresholding in Python for feature selection?

This recipe helps you do variance thresholding in Python for feature selection


Recipe Objective

To increse the score of the model we need the dataset that has high variance, so it will be good if we can select the features in the dataset which has variance more than a fix threshold.

This data science python source code does the following:
1. Uses Variance for selecting the best features.
2. Visualizes the final result

So this is the recipe on how we can do variance thresholding in Python for feature selection.

Step 1 - Import the library

from sklearn import datasets from sklearn.feature_selection import VarianceThreshold

We have only imported datasets to import the inbult dataset and VarienceThreshold.

Step 2 - Setting up the Data

We have imported inbuilt iris dataset and stored data in X and target in y. We have also used print statement to print first 8 rows of the dataset. iris = datasets.load_iris() X = print(X[0:7]) y = print(y[0:7])

Step 3 - Applying threshold on Variance

We have created an object for VarianceThreshold with parameter threshold in which we have to put the minimum value of variance we want in out dataset. Then we have used fit_transform to fit and transform the dataset. Finally we have printed the final dataset. thresholder = VarianceThreshold(threshold=.5) X_high_variance = thresholder.fit_transform(X) print(X_high_variance[0:7]) So in the output we can see that in final dataset we have 3 columns and in the initial dataset we have 4 columns which means the function have removed a column which has less variance.

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]]

[0 0 0 0 0 0 0]

[[5.1 1.4 0.2]
 [4.9 1.4 0.2]
 [4.7 1.3 0.2]
 [4.6 1.5 0.2]
 [5.  1.4 0.2]
 [5.4 1.7 0.4]
 [4.6 1.4 0.3]]

Relevant Projects

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Data Science Project on Wine Quality Prediction in R
In this R data science project, we will explore wine dataset to assess red wine quality. The objective of this data science project is to explore which chemical properties will influence the quality of red wines.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Human Activity Recognition Using Multiclass Classification in Python
In this human activity recognition project, we use multiclass classification machine learning techniques to analyse fitness dataset from a smartphone tracker.

Demand prediction of driver availability using multistep time series analysis
In this supervised learning machine learning project, you will predict the availability of a driver in a specific area by using multi step time series analysis.