How to select features using chi squared in Python?
FEATURE EXTRACTION DATA CLEANING PYTHON DATA MUNGING MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to select features using chi squared in Python?

How to select features using chi squared in Python?

This recipe helps you select features using chi squared in Python

0

Recipe Objective

To increse the score of the model we need the dataset that has high chi-squared statistics, so it will be good if we can select the features in the dataset which has high chi-squared statistics.

This data science python source code does the following:
1.Selects features using Chi-Squared method
2. Selects the best features
3. Optimizes the final prediction results

So this is the recipe on how we can select features using chi-squared in python.

Step 1 - Import the library

from sklearn import datasets from sklearn.feature_selection import SelectKBest from sklearn.feature_selection import chi2

We have only imported datasets to import the datasets, SelectKBest and chi2.

Step 2 - Setting up the Data

We have imported inbuilt wine dataset and stored data in X and target in y. We have also used print statement to print rows of the dataset. wine = datasets.load_wine() X = wine.data print(X) y = wine.target print(y)

Step 3 - Selecting Features With high chi-square

We have used SelectKBest to select the features with best chi-square, we have passed two parameters one is the scoring metric that is chi2 and other is the value of K which signifies the number of features we want in final dataset.

We have used fit_transform to fit and transfrom the current dataset into the desired dataset. Finally we have printed the final dataset and the shape of initial and final dataset. chi2_selector = SelectKBest(chi2, k=2) X_kbest = chi2_selector.fit_transform(X, y) print(X_kbest) print('Original number of features:', X.shape) print('Reduced number of features:', X_kbest.shape) So the output comes as

[[1.423e+01 1.710e+00 2.430e+00 ... 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 ... 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 ... 1.030e+00 3.170e+00 1.185e+03]
 ...
 [1.327e+01 4.280e+00 2.260e+00 ... 5.900e-01 1.560e+00 8.350e+02]
 [1.317e+01 2.590e+00 2.370e+00 ... 6.000e-01 1.620e+00 8.400e+02]
 [1.413e+01 4.100e+00 2.740e+00 ... 6.100e-01 1.600e+00 5.600e+02]]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

[[5.640000e+00 1.065000e+03]
 [4.380000e+00 1.050000e+03]
 [5.680000e+00 1.185000e+03]
 [7.800000e+00 1.480000e+03]
 [4.320000e+00 7.350000e+02]
 [6.750000e+00 1.450000e+03]
 [5.250000e+00 1.290000e+03]
 [5.050000e+00 1.295000e+03]
 [5.200000e+00 1.045000e+03]
 [7.220000e+00 1.045000e+03]
 [5.750000e+00 1.510000e+03]
 [5.000000e+00 1.280000e+03]
 [5.600000e+00 1.320000e+03]
 [5.400000e+00 1.150000e+03]
 [7.500000e+00 1.547000e+03]
 [7.300000e+00 1.310000e+03]
 [6.200000e+00 1.280000e+03]
 [6.600000e+00 1.130000e+03]
 [8.700000e+00 1.680000e+03]
 [5.100000e+00 8.450000e+02]
 [5.650000e+00 7.800000e+02]
 [4.500000e+00 7.700000e+02]
 [3.800000e+00 1.035000e+03]
 [3.930000e+00 1.015000e+03]
 [3.520000e+00 8.450000e+02]
 [3.580000e+00 8.300000e+02]
 [4.800000e+00 1.195000e+03]
 [3.950000e+00 1.285000e+03]
 [4.500000e+00 9.150000e+02]
 [4.700000e+00 1.035000e+03]
 [5.700000e+00 1.285000e+03]
 [6.900000e+00 1.515000e+03]
 [3.840000e+00 9.900000e+02]
 [5.400000e+00 1.235000e+03]
 [4.200000e+00 1.095000e+03]
 [5.100000e+00 9.200000e+02]
 [4.600000e+00 8.800000e+02]
 [4.250000e+00 1.105000e+03]
 [3.700000e+00 1.020000e+03]
 [5.100000e+00 7.600000e+02]
 [6.130000e+00 7.950000e+02]
 [4.280000e+00 1.035000e+03]
 [5.430000e+00 1.095000e+03]
 [4.360000e+00 6.800000e+02]
 [5.040000e+00 8.850000e+02]
 [5.240000e+00 1.080000e+03]
 [4.900000e+00 1.065000e+03]
 [6.100000e+00 9.850000e+02]
 [6.200000e+00 1.060000e+03]
 [8.900000e+00 1.260000e+03]
 [7.200000e+00 1.150000e+03]
 [5.600000e+00 1.265000e+03]
 [7.050000e+00 1.190000e+03]
 [6.300000e+00 1.375000e+03]
 [5.850000e+00 1.060000e+03]
 [6.250000e+00 1.120000e+03]
 [6.380000e+00 9.700000e+02]
 [6.000000e+00 1.270000e+03]
 [6.800000e+00 1.285000e+03]
 [1.950000e+00 5.200000e+02]
 [3.270000e+00 6.800000e+02]
 [5.750000e+00 4.500000e+02]
 [3.800000e+00 6.300000e+02]
 [4.450000e+00 4.200000e+02]
 [2.950000e+00 3.550000e+02]
 [4.600000e+00 6.780000e+02]
 [5.300000e+00 5.020000e+02]
 [4.680000e+00 5.100000e+02]
 [3.170000e+00 7.500000e+02]
 [2.850000e+00 7.180000e+02]
 [3.050000e+00 8.700000e+02]
 [3.380000e+00 4.100000e+02]
 [3.740000e+00 4.720000e+02]
 [3.350000e+00 9.850000e+02]
 [3.210000e+00 8.860000e+02]
 [3.800000e+00 4.280000e+02]
 [4.600000e+00 3.920000e+02]
 [2.650000e+00 5.000000e+02]
 [3.400000e+00 7.500000e+02]
 [2.570000e+00 4.630000e+02]
 [2.500000e+00 2.780000e+02]
 [3.900000e+00 7.140000e+02]
 [2.200000e+00 6.300000e+02]
 [4.800000e+00 5.150000e+02]
 [3.050000e+00 5.200000e+02]
 [2.620000e+00 4.500000e+02]
 [2.450000e+00 4.950000e+02]
 [2.600000e+00 5.620000e+02]
 [2.800000e+00 6.800000e+02]
 [1.740000e+00 6.250000e+02]
 [2.400000e+00 4.800000e+02]
 [3.600000e+00 4.500000e+02]
 [3.050000e+00 4.950000e+02]
 [2.150000e+00 2.900000e+02]
 [3.250000e+00 3.450000e+02]
 [2.600000e+00 9.370000e+02]
 [2.500000e+00 6.250000e+02]
 [2.900000e+00 4.280000e+02]
 [4.500000e+00 6.600000e+02]
 [2.300000e+00 4.060000e+02]
 [3.300000e+00 7.100000e+02]
 [2.450000e+00 5.620000e+02]
 [2.800000e+00 4.380000e+02]
 [2.060000e+00 4.150000e+02]
 [2.940000e+00 6.720000e+02]
 [2.700000e+00 3.150000e+02]
 [3.400000e+00 5.100000e+02]
 [3.300000e+00 4.880000e+02]
 [2.700000e+00 3.120000e+02]
 [2.650000e+00 6.800000e+02]
 [2.900000e+00 5.620000e+02]
 [2.000000e+00 3.250000e+02]
 [3.800000e+00 6.070000e+02]
 [3.080000e+00 4.340000e+02]
 [2.900000e+00 3.850000e+02]
 [1.900000e+00 4.070000e+02]
 [1.950000e+00 4.950000e+02]
 [2.060000e+00 3.450000e+02]
 [3.400000e+00 3.720000e+02]
 [1.280000e+00 5.640000e+02]
 [3.250000e+00 6.250000e+02]
 [6.000000e+00 4.650000e+02]
 [2.080000e+00 3.650000e+02]
 [2.600000e+00 3.800000e+02]
 [2.800000e+00 3.800000e+02]
 [2.760000e+00 3.780000e+02]
 [3.940000e+00 3.520000e+02]
 [3.000000e+00 4.660000e+02]
 [2.120000e+00 3.420000e+02]
 [2.600000e+00 5.800000e+02]
 [4.100000e+00 6.300000e+02]
 [5.400000e+00 5.300000e+02]
 [5.700000e+00 5.600000e+02]
 [5.000000e+00 6.000000e+02]
 [5.450000e+00 6.500000e+02]
 [7.100000e+00 6.950000e+02]
 [3.850000e+00 7.200000e+02]
 [5.000000e+00 5.150000e+02]
 [5.700000e+00 5.800000e+02]
 [4.920000e+00 5.900000e+02]
 [4.600000e+00 6.000000e+02]
 [5.600000e+00 7.800000e+02]
 [4.350000e+00 5.200000e+02]
 [4.400000e+00 5.500000e+02]
 [8.210000e+00 8.550000e+02]
 [4.000000e+00 8.300000e+02]
 [4.900000e+00 4.150000e+02]
 [7.650000e+00 6.250000e+02]
 [8.420000e+00 6.500000e+02]
 [9.400000e+00 5.500000e+02]
 [8.600000e+00 5.000000e+02]
 [1.080000e+01 4.800000e+02]
 [7.100000e+00 4.250000e+02]
 [1.052000e+01 6.750000e+02]
 [7.600000e+00 6.400000e+02]
 [7.900000e+00 7.250000e+02]
 [9.010000e+00 4.800000e+02]
 [7.500000e+00 8.800000e+02]
 [1.300000e+01 6.600000e+02]
 [1.175000e+01 6.200000e+02]
 [7.650000e+00 5.200000e+02]
 [5.880000e+00 6.800000e+02]
 [5.580000e+00 5.700000e+02]
 [5.280000e+00 6.750000e+02]
 [9.580000e+00 6.150000e+02]
 [6.620000e+00 5.200000e+02]
 [1.068000e+01 6.950000e+02]
 [1.026000e+01 6.850000e+02]
 [8.660000e+00 7.500000e+02]
 [8.500000e+00 6.300000e+02]
 [5.500000e+00 5.100000e+02]
 [9.899999e+00 4.700000e+02]
 [9.700000e+00 6.600000e+02]
 [7.700000e+00 7.400000e+02]
 [7.300000e+00 7.500000e+02]
 [1.020000e+01 8.350000e+02]
 [9.300000e+00 8.400000e+02]
 [9.200000e+00 5.600000e+02]]

Original number of features: (178, 13)
Reduced number of features: (178, 2)

Relevant Projects

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Identifying Product Bundles from Sales Data Using R Language
In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.

Sequence Classification with LSTM RNN in Python with Keras
In this project, we are going to work on Sequence to Sequence Prediction using IMDB Movie Review Dataset​ using Keras in Python.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.