How to do variance thresholding in Python for feature selection?

This recipe helps you do variance thresholding in Python for feature selection

Recipe Objective

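A feature whose values barely change from one sample to the next carries little information that a model can use to tell the samples apart. A simple way to select features is therefore to keep only those whose variance exceeds a fixed threshold. This makes the dataset smaller and can sometimes improve the model's score.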
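The selection rule can be sketched with plain NumPy on a small made-up array: compute each column's variance and keep the columns whose variance is above the threshold. The sketch assumes the population variance (ddof=0) and a strict "greater than" comparison, which is how scikit-learn's VarianceThreshold behaves. It also cross-checks the result against VarianceThreshold.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Made-up toy data: 4 samples, 3 features
X = np.array([
    [1.0, 10.0, 0.5],
    [2.0, 10.0, 0.6],
    [3.0, 10.0, 0.4],
    [4.0, 10.0, 0.5],
])
threshold = 0.5  # illustrative value

# Population variance (ddof=0) of every column
variances = X.var(axis=0)

# Keep only the columns whose variance is strictly above the threshold
mask = variances > threshold
X_selected = X[:, mask]

print(variances)   # column 0 has variance 1.25, column 1 is constant
print(mask)
print(X_selected)

# Assumption check: the manual rule should match scikit-learn's selector
X_sklearn = VarianceThreshold(threshold=threshold).fit_transform(X)
assert np.array_equal(X_sklearn, X_selected)
```

Only the first column survives here, because the other two columns have a variance of 0 and 0.005.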

This data science Python source code does the following:
1. Uses variance to select the best features.
2. Displays the final result.

So this is the recipe on how we can do variance thresholding in Python for feature selection.

Step 1 - Import the library

from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

We have imported only two things: datasets, to load the built-in dataset, and VarianceThreshold.
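VarianceThreshold can also be used without any argument. Its threshold parameter defaults to 0.0, in which case it removes only constant (zero-variance) features. Here is a small sketch on a made-up array, where the values are purely illustrative:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Made-up toy data: the middle column is constant
X = np.array([
    [0.0, 7.0, 1.0],
    [1.0, 7.0, 0.0],
    [2.0, 7.0, 1.0],
])

# With no argument, threshold defaults to 0.0, so only
# zero-variance (constant) columns are removed
selector = VarianceThreshold()
X_reduced = selector.fit_transform(X)

print(selector.threshold)      # 0.0
print(selector.get_support())  # [ True False  True]
print(X_reduced.shape)         # (3, 2)
```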

Step 2 - Setting up the Data

We have loaded the built-in iris dataset and stored the data in X and the target in y. We have also used print statements to print the first 7 rows of each.

iris = datasets.load_iris()
X = iris.data
print(X[0:7])
y = iris.target
print(y[0:7])
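Before applying the threshold, it can help to check the size of X and what its columns mean. This short sketch uses the array's shape attribute and iris.feature_names, which gives the column order of the printed rows:

```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data

# 150 samples (rows) and 4 features (columns)
print(X.shape)  # (150, 4)

# Names of the 4 columns of X, in order
print(iris.feature_names)
```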

Step 3 - Applying threshold on Variance

We have created a VarianceThreshold object with the parameter threshold, which sets the variance a feature must exceed to stay in our dataset. Then we have used fit_transform to fit the selector and transform the dataset. Finally, we have printed the first 7 rows of the final dataset.

thresholder = VarianceThreshold(threshold=.5)
X_high_variance = thresholder.fit_transform(X)
print(X_high_variance[0:7])

In the output we can see that the final dataset has 3 columns while the initial dataset has 4. This means the selector has removed the column whose variance is below the threshold (the second column, sepal width).

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]]

[0 0 0 0 0 0 0]

[[5.1 1.4 0.2]
 [4.9 1.4 0.2]
 [4.7 1.3 0.2]
 [4.6 1.5 0.2]
 [5.  1.4 0.2]
 [5.4 1.7 0.4]
 [4.6 1.4 0.3]]
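To see which feature was dropped and why, you can inspect the fitted selector. Its variances_ attribute stores the variance of each feature computed during fit, and get_support() returns a boolean mask of the kept columns. This sketch repeats the steps above and pairs both with iris.feature_names:

```python
from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

iris = datasets.load_iris()
X = iris.data

thresholder = VarianceThreshold(threshold=.5)
X_high_variance = thresholder.fit_transform(X)

# variances_ holds the per-feature variance computed during fit
for name, var in zip(iris.feature_names, thresholder.variances_):
    print(f"{name}: {var:.3f}")

# get_support() is a boolean mask: True for kept columns
support = thresholder.get_support()
kept = [n for n, keep in zip(iris.feature_names, support) if keep]
removed = [n for n, keep in zip(iris.feature_names, support) if not keep]
print("kept:", kept)
print("removed:", removed)
```

Only sepal width falls below 0.5. This matches the output above, where the second column disappeared.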
