How to select features using chi squared in Python?

This recipe shows how to select features using the chi-squared test in Python.

Recipe Objective

Features that have a high chi-squared statistic with respect to the target are more likely to be informative for classification, so selecting the features in the dataset with the highest chi-squared statistics can help increase the score of the model.

This data science Python source code does the following:
1. Scores features using the chi-squared method
2. Selects the best features
3. Produces a reduced dataset intended to improve the final prediction results

So this is the recipe on how we can select features using chi-squared in Python.
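
To make "chi-squared statistic" concrete, the sketch below recomputes the per-feature scores by hand and compares them with scikit-learn's chi2. It assumes the wine dataset used later in this recipe; the variable names (Y, observed, expected, manual_scores) are illustrative and not part of the original recipe. The idea is that each feature's per-class sum is compared with the sum expected if the feature were independent of the class.

```python
import numpy as np
from sklearn import datasets
from sklearn.feature_selection import chi2

wine = datasets.load_wine()
X, y = wine.data, wine.target

# One-hot encode the class labels: shape (n_samples, n_classes).
Y = np.eye(len(np.unique(y)))[y]

# Observed: the sum of each feature within each class.
observed = Y.T @ X

# Expected under independence: class frequency times each feature's total.
class_prob = Y.mean(axis=0).reshape(-1, 1)
feature_total = X.sum(axis=0).reshape(1, -1)
expected = class_prob @ feature_total

# Chi-squared statistic per feature: sum over classes of (O - E)^2 / E.
manual_scores = ((observed - expected) ** 2 / expected).sum(axis=0)

# chi2 returns one score and one p-value per feature.
sklearn_scores, p_values = chi2(X, y)
print(np.allclose(manual_scores, sklearn_scores))
```

A higher score means the feature's class-wise totals deviate more from what independence would predict.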

Step 1 - Import the library

from sklearn import datasets
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

We have imported datasets to load the built-in dataset, along with SelectKBest and chi2 for feature selection.

Step 2 - Setting up the Data

We have loaded the built-in wine dataset and stored the features in X and the target in y. We have also used print statements to display both.

wine = datasets.load_wine()
X = wine.data
print(X)
y = wine.target
print(y)
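
Before scoring, it can help to confirm the data suits chi2, which expects non-negative feature values. The following is a minimal sketch of such a check on the wine dataset; the has_negative variable is an illustrative name, not part of the original recipe.

```python
import numpy as np
from sklearn import datasets

wine = datasets.load_wine()
X, y = wine.data, wine.target

# Basic shape and label overview.
print(X.shape)
print(wine.feature_names)
print(np.unique(y))

# chi2 expects non-negative feature values, so check before scoring.
has_negative = bool((X < 0).any())
print('Any negative values:', has_negative)
```

If a dataset did contain negative values, one option would be to rescale it to a non-negative range before applying chi2.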

Step 3 - Selecting Features with High Chi-Square

We have used SelectKBest to select the features with the highest chi-squared scores. We have passed two parameters: the scoring function, chi2, and k, the number of features we want in the final dataset. We have used fit_transform to fit the selector and transform the current dataset into the reduced dataset. Finally, we have printed the reduced dataset and the shapes of the initial and final datasets.

chi2_selector = SelectKBest(chi2, k=2)
X_kbest = chi2_selector.fit_transform(X, y)
print(X_kbest)
print('Original number of features:', X.shape)
print('Reduced number of features:', X_kbest.shape)

So the output comes as

[[1.423e+01 1.710e+00 2.430e+00 ... 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 ... 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 ... 1.030e+00 3.170e+00 1.185e+03]
 [1.327e+01 4.280e+00 2.260e+00 ... 5.900e-01 1.560e+00 8.350e+02]
 [1.317e+01 2.590e+00 2.370e+00 ... 6.000e-01 1.620e+00 8.400e+02]
 [1.413e+01 4.100e+00 2.740e+00 ... 6.100e-01 1.600e+00 5.600e+02]]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

[[5.640000e+00 1.065000e+03]
 [4.380000e+00 1.050000e+03]
 [5.680000e+00 1.185000e+03]
 [7.800000e+00 1.480000e+03]
 [4.320000e+00 7.350000e+02]
 [6.750000e+00 1.450000e+03]
 [5.250000e+00 1.290000e+03]
 [5.050000e+00 1.295000e+03]
 [5.200000e+00 1.045000e+03]
 [7.220000e+00 1.045000e+03]
 [5.750000e+00 1.510000e+03]
 [5.000000e+00 1.280000e+03]
 [5.600000e+00 1.320000e+03]
 [5.400000e+00 1.150000e+03]
 [7.500000e+00 1.547000e+03]
 [7.300000e+00 1.310000e+03]
 [6.200000e+00 1.280000e+03]
 [6.600000e+00 1.130000e+03]
 [8.700000e+00 1.680000e+03]
 [5.100000e+00 8.450000e+02]
 [5.650000e+00 7.800000e+02]
 [4.500000e+00 7.700000e+02]
 [3.800000e+00 1.035000e+03]
 [3.930000e+00 1.015000e+03]
 [3.520000e+00 8.450000e+02]
 [3.580000e+00 8.300000e+02]
 [4.800000e+00 1.195000e+03]
 [3.950000e+00 1.285000e+03]
 [4.500000e+00 9.150000e+02]
 [4.700000e+00 1.035000e+03]
 [5.700000e+00 1.285000e+03]
 [6.900000e+00 1.515000e+03]
 [3.840000e+00 9.900000e+02]
 [5.400000e+00 1.235000e+03]
 [4.200000e+00 1.095000e+03]
 [5.100000e+00 9.200000e+02]
 [4.600000e+00 8.800000e+02]
 [4.250000e+00 1.105000e+03]
 [3.700000e+00 1.020000e+03]
 [5.100000e+00 7.600000e+02]
 [6.130000e+00 7.950000e+02]
 [4.280000e+00 1.035000e+03]
 [5.430000e+00 1.095000e+03]
 [4.360000e+00 6.800000e+02]
 [5.040000e+00 8.850000e+02]
 [5.240000e+00 1.080000e+03]
 [4.900000e+00 1.065000e+03]
 [6.100000e+00 9.850000e+02]
 [6.200000e+00 1.060000e+03]
 [8.900000e+00 1.260000e+03]
 [7.200000e+00 1.150000e+03]
 [5.600000e+00 1.265000e+03]
 [7.050000e+00 1.190000e+03]
 [6.300000e+00 1.375000e+03]
 [5.850000e+00 1.060000e+03]
 [6.250000e+00 1.120000e+03]
 [6.380000e+00 9.700000e+02]
 [6.000000e+00 1.270000e+03]
 [6.800000e+00 1.285000e+03]
 [1.950000e+00 5.200000e+02]
 [3.270000e+00 6.800000e+02]
 [5.750000e+00 4.500000e+02]
 [3.800000e+00 6.300000e+02]
 [4.450000e+00 4.200000e+02]
 [2.950000e+00 3.550000e+02]
 [4.600000e+00 6.780000e+02]
 [5.300000e+00 5.020000e+02]
 [4.680000e+00 5.100000e+02]
 [3.170000e+00 7.500000e+02]
 [2.850000e+00 7.180000e+02]
 [3.050000e+00 8.700000e+02]
 [3.380000e+00 4.100000e+02]
 [3.740000e+00 4.720000e+02]
 [3.350000e+00 9.850000e+02]
 [3.210000e+00 8.860000e+02]
 [3.800000e+00 4.280000e+02]
 [4.600000e+00 3.920000e+02]
 [2.650000e+00 5.000000e+02]
 [3.400000e+00 7.500000e+02]
 [2.570000e+00 4.630000e+02]
 [2.500000e+00 2.780000e+02]
 [3.900000e+00 7.140000e+02]
 [2.200000e+00 6.300000e+02]
 [4.800000e+00 5.150000e+02]
 [3.050000e+00 5.200000e+02]
 [2.620000e+00 4.500000e+02]
 [2.450000e+00 4.950000e+02]
 [2.600000e+00 5.620000e+02]
 [2.800000e+00 6.800000e+02]
 [1.740000e+00 6.250000e+02]
 [2.400000e+00 4.800000e+02]
 [3.600000e+00 4.500000e+02]
 [3.050000e+00 4.950000e+02]
 [2.150000e+00 2.900000e+02]
 [3.250000e+00 3.450000e+02]
 [2.600000e+00 9.370000e+02]
 [2.500000e+00 6.250000e+02]
 [2.900000e+00 4.280000e+02]
 [4.500000e+00 6.600000e+02]
 [2.300000e+00 4.060000e+02]
 [3.300000e+00 7.100000e+02]
 [2.450000e+00 5.620000e+02]
 [2.800000e+00 4.380000e+02]
 [2.060000e+00 4.150000e+02]
 [2.940000e+00 6.720000e+02]
 [2.700000e+00 3.150000e+02]
 [3.400000e+00 5.100000e+02]
 [3.300000e+00 4.880000e+02]
 [2.700000e+00 3.120000e+02]
 [2.650000e+00 6.800000e+02]
 [2.900000e+00 5.620000e+02]
 [2.000000e+00 3.250000e+02]
 [3.800000e+00 6.070000e+02]
 [3.080000e+00 4.340000e+02]
 [2.900000e+00 3.850000e+02]
 [1.900000e+00 4.070000e+02]
 [1.950000e+00 4.950000e+02]
 [2.060000e+00 3.450000e+02]
 [3.400000e+00 3.720000e+02]
 [1.280000e+00 5.640000e+02]
 [3.250000e+00 6.250000e+02]
 [6.000000e+00 4.650000e+02]
 [2.080000e+00 3.650000e+02]
 [2.600000e+00 3.800000e+02]
 [2.800000e+00 3.800000e+02]
 [2.760000e+00 3.780000e+02]
 [3.940000e+00 3.520000e+02]
 [3.000000e+00 4.660000e+02]
 [2.120000e+00 3.420000e+02]
 [2.600000e+00 5.800000e+02]
 [4.100000e+00 6.300000e+02]
 [5.400000e+00 5.300000e+02]
 [5.700000e+00 5.600000e+02]
 [5.000000e+00 6.000000e+02]
 [5.450000e+00 6.500000e+02]
 [7.100000e+00 6.950000e+02]
 [3.850000e+00 7.200000e+02]
 [5.000000e+00 5.150000e+02]
 [5.700000e+00 5.800000e+02]
 [4.920000e+00 5.900000e+02]
 [4.600000e+00 6.000000e+02]
 [5.600000e+00 7.800000e+02]
 [4.350000e+00 5.200000e+02]
 [4.400000e+00 5.500000e+02]
 [8.210000e+00 8.550000e+02]
 [4.000000e+00 8.300000e+02]
 [4.900000e+00 4.150000e+02]
 [7.650000e+00 6.250000e+02]
 [8.420000e+00 6.500000e+02]
 [9.400000e+00 5.500000e+02]
 [8.600000e+00 5.000000e+02]
 [1.080000e+01 4.800000e+02]
 [7.100000e+00 4.250000e+02]
 [1.052000e+01 6.750000e+02]
 [7.600000e+00 6.400000e+02]
 [7.900000e+00 7.250000e+02]
 [9.010000e+00 4.800000e+02]
 [7.500000e+00 8.800000e+02]
 [1.300000e+01 6.600000e+02]
 [1.175000e+01 6.200000e+02]
 [7.650000e+00 5.200000e+02]
 [5.880000e+00 6.800000e+02]
 [5.580000e+00 5.700000e+02]
 [5.280000e+00 6.750000e+02]
 [9.580000e+00 6.150000e+02]
 [6.620000e+00 5.200000e+02]
 [1.068000e+01 6.950000e+02]
 [1.026000e+01 6.850000e+02]
 [8.660000e+00 7.500000e+02]
 [8.500000e+00 6.300000e+02]
 [5.500000e+00 5.100000e+02]
 [9.899999e+00 4.700000e+02]
 [9.700000e+00 6.600000e+02]
 [7.700000e+00 7.400000e+02]
 [7.300000e+00 7.500000e+02]
 [1.020000e+01 8.350000e+02]
 [9.300000e+00 8.400000e+02]
 [9.200000e+00 5.600000e+02]]

Original number of features: (178, 13)
Reduced number of features: (178, 2)
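
The reduced array does not say which columns were kept. As a minimal sketch, the fitted selector can be asked for the selected column indices with get_support and for the per-feature scores with scores_; the names selected_idx, selected_names and ranked below are illustrative, not part of the original recipe.

```python
from sklearn import datasets
from sklearn.feature_selection import SelectKBest, chi2

wine = datasets.load_wine()
X, y = wine.data, wine.target

chi2_selector = SelectKBest(chi2, k=2)
X_kbest = chi2_selector.fit_transform(X, y)

# Indices of the columns that were kept, mapped back to feature names.
selected_idx = chi2_selector.get_support(indices=True)
selected_names = [wine.feature_names[i] for i in selected_idx]
print('Selected features:', selected_names)

# Rank every feature by its chi-squared score, highest first.
ranked = sorted(
    zip(wine.feature_names, chi2_selector.scores_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f'{name}: {score:.2f}')
```

Looking at the full ranking can also help when choosing k, since it shows how quickly the scores drop off after the top features.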
