How to do variance thresholding in Python for feature selection?

This recipe helps you do variance thresholding in Python for feature selection
Last Updated: 22 Dec 2022


Recipe Objective

A feature whose values barely change across samples carries little information for a model. To improve the model it can therefore help to keep only the features in the dataset whose variance is above a fixed threshold.

This data science Python source code does the following:
1. Uses variance to select features.
2. Prints the resulting dataset.

So this is the recipe on how we can do variance thresholding in Python for feature selection.
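As a rough illustration of the idea before the step-by-step recipe, the sketch below computes the variance of each iris feature with NumPy and keeps only the columns above a threshold. The names threshold, variances, keep and X_manual are illustrative and not part of the original recipe; the threshold of 0.5 matches the value used later in Step 3.

```python
import numpy as np
from sklearn import datasets

# Load the built-in iris features (150 rows, 4 columns).
X = datasets.load_iris().data

threshold = 0.5  # illustrative value, same as used later in this recipe

# Population variance (ddof=0) of every column.
variances = np.var(X, axis=0)

# Boolean mask of the columns whose variance exceeds the threshold.
keep = variances > threshold

# Keep only the high-variance columns.
X_manual = X[:, keep]

print(variances)
print(X_manual.shape)
```

This is only a manual sketch of the selection rule; the recipe itself uses VarianceThreshold from scikit-learn, which wraps the same idea in a fit/transform interface.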


Step 1 - Import the library

from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

We have imported only datasets, to load the built-in dataset, and VarianceThreshold.

Step 2 - Setting up the Data

We have loaded the built-in iris dataset and stored the features in X and the target in y. We have also used print statements to show the first 7 rows of each.

iris = datasets.load_iris()
X = iris.data
print(X[0:7])
y = iris.target
print(y[0:7])


Step 3 - Applying threshold on Variance

We have created a VarianceThreshold object with the parameter threshold, which sets the minimum variance a feature must exceed to stay in our dataset. Then we have used fit_transform to fit the selector and transform the dataset. Finally we have printed the first 7 rows of the resulting dataset.

thresholder = VarianceThreshold(threshold=.5)
X_high_variance = thresholder.fit_transform(X)
print(X_high_variance[0:7])

In the output we can see that the final dataset has 3 columns while the initial dataset has 4, which means the selector has removed one column (the second) because its variance is below the threshold.

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]]

[0 0 0 0 0 0 0]

[[5.1 1.4 0.2]
 [4.9 1.4 0.2]
 [4.7 1.3 0.2]
 [4.6 1.5 0.2]
 [5.  1.4 0.2]
 [5.4 1.7 0.4]
 [4.6 1.4 0.3]]
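To check which feature was dropped and why, the fitted selector can be inspected. The sketch below assumes the same iris data and threshold as above; variances_ and get_support() are part of scikit-learn's VarianceThreshold, while mask and dropped are illustrative names.

```python
from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

iris = datasets.load_iris()
X = iris.data

# Fit the selector with the same threshold as in Step 3.
thresholder = VarianceThreshold(threshold=0.5)
thresholder.fit(X)

# Boolean mask: True for the features that are kept.
mask = thresholder.get_support()

# Show each feature name, its variance, and whether it was kept.
for name, var, kept in zip(iris.feature_names, thresholder.variances_, mask):
    print(f"{name}: variance={var:.3f}, kept={kept}")

# Names of the features that fell below the threshold.
dropped = [n for n, k in zip(iris.feature_names, mask) if not k]
print(dropped)
```

Note that variance depends on the scale of each feature, so a single threshold is easiest to interpret when the features are measured in comparable units, as they are in iris.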



