How to select features using chi squared in Python?

This recipe helps you select features using chi squared in Python
Last Updated: 20 Dec 2022


Recipe Objective

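To increase the score of a model, it helps to keep the features that are most strongly associated with the target. The chi-squared statistic measures that association for non-negative features and a categorical target, so a simple approach is to select the features in the dataset that have the highest chi-squared scores.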

This data science python source code does the following:
1. Selects features using the chi-squared method
2. Selects the best features
3. Optimizes the final prediction results

So this is the recipe on how we can select features using chi-squared in Python.
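To make the idea of a chi-squared score concrete, here is a minimal pure-Python sketch of one common way to compute it for non-negative features: compare the per-class sum of each feature with the sum expected if the feature were independent of the class. The function name chi2_scores and the toy data are made up for illustration; the sketch is meant to mirror the general approach behind scikit-learn's chi2, not to reproduce its implementation.

```python
def chi2_scores(X, y):
    """Chi-squared score per feature for non-negative features X and class labels y.

    Illustrative helper (hypothetical name). Assumes every feature has a
    positive total, otherwise the expected value would be zero.
    """
    classes = sorted(set(y))
    n_samples = len(X)
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        feature_total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            # Observed: sum of feature j over the samples in class c
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            # Expected under independence: class frequency times feature total
            class_prob = sum(1 for label in y if label == c) / n_samples
            expected = class_prob * feature_total
            score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


# Toy data (made up): feature 0 tracks the class, feature 1 does not
X_toy = [[9, 5], [8, 5], [1, 5], [2, 5]]
y_toy = [0, 0, 1, 1]

toy_scores = chi2_scores(X_toy, y_toy)
print(toy_scores)  # [9.8, 0.0]
```

On this toy data the first feature gets a high score because its values are concentrated in class 0, while the second feature scores 0 because it is identical across both classes.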


Step 1 - Import the library

from sklearn import datasets
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

We have imported the datasets module to load the data, SelectKBest to perform the selection, and chi2 to use as the scoring function.

Step 2 - Setting up the Data

We have imported the built-in wine dataset and stored the data in X and the target in y. We have also used print statements to print the dataset and the target.

wine = datasets.load_wine()
X = wine.data
print(X)
y = wine.target
print(y)
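Before scoring, it can be worth confirming that the data suits the chi-squared test, since scikit-learn's chi2 expects non-negative feature values. The following is a small sketch of such a check; the variable name all_non_negative is chosen here for illustration.

```python
from sklearn import datasets

wine = datasets.load_wine()
X, y = wine.data, wine.target

# chi2 is defined for non-negative features, so check before scoring
all_non_negative = bool((X >= 0).all())
print("All features non-negative:", all_non_negative)

# Feature names make it easier to interpret the selected columns later
feature_names = list(wine.feature_names)
print(feature_names)
print("Shape of X:", X.shape)
```

If a dataset contains negative values, the features would need to be shifted or rescaled to a non-negative range, or a different scoring function chosen, before using chi2.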


Step 3 - Selecting Features With high chi-square

We have used SelectKBest to select the features with the highest chi-squared scores. We have passed two parameters: the scoring function, which is chi2, and the value of k, which is the number of features we want in the final dataset. We have used fit_transform to fit the selector and transform the current dataset into the reduced dataset. Finally, we have printed the final dataset and the shapes of the initial and final datasets.

chi2_selector = SelectKBest(chi2, k=2)
X_kbest = chi2_selector.fit_transform(X, y)
print(X_kbest)
print('Original number of features:', X.shape)
print('Reduced number of features:', X_kbest.shape)

So the output comes as

[[1.423e+01 1.710e+00 2.430e+00 ... 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 ... 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 ... 1.030e+00 3.170e+00 1.185e+03]
 ...
 [1.327e+01 4.280e+00 2.260e+00 ... 5.900e-01 1.560e+00 8.350e+02]
 [1.317e+01 2.590e+00 2.370e+00 ... 6.000e-01 1.620e+00 8.400e+02]
 [1.413e+01 4.100e+00 2.740e+00 ... 6.100e-01 1.600e+00 5.600e+02]]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

[[5.640000e+00 1.065000e+03]
 [4.380000e+00 1.050000e+03]
 [5.680000e+00 1.185000e+03]
 [7.800000e+00 1.480000e+03]
 [4.320000e+00 7.350000e+02]
 [6.750000e+00 1.450000e+03]
 [5.250000e+00 1.290000e+03]
 [5.050000e+00 1.295000e+03]
 [5.200000e+00 1.045000e+03]
 [7.220000e+00 1.045000e+03]
 [5.750000e+00 1.510000e+03]
 [5.000000e+00 1.280000e+03]
 [5.600000e+00 1.320000e+03]
 [5.400000e+00 1.150000e+03]
 [7.500000e+00 1.547000e+03]
 [7.300000e+00 1.310000e+03]
 [6.200000e+00 1.280000e+03]
 [6.600000e+00 1.130000e+03]
 [8.700000e+00 1.680000e+03]
 [5.100000e+00 8.450000e+02]
 [5.650000e+00 7.800000e+02]
 [4.500000e+00 7.700000e+02]
 [3.800000e+00 1.035000e+03]
 [3.930000e+00 1.015000e+03]
 [3.520000e+00 8.450000e+02]
 [3.580000e+00 8.300000e+02]
 [4.800000e+00 1.195000e+03]
 [3.950000e+00 1.285000e+03]
 [4.500000e+00 9.150000e+02]
 [4.700000e+00 1.035000e+03]
 [5.700000e+00 1.285000e+03]
 [6.900000e+00 1.515000e+03]
 [3.840000e+00 9.900000e+02]
 [5.400000e+00 1.235000e+03]
 [4.200000e+00 1.095000e+03]
 [5.100000e+00 9.200000e+02]
 [4.600000e+00 8.800000e+02]
 [4.250000e+00 1.105000e+03]
 [3.700000e+00 1.020000e+03]
 [5.100000e+00 7.600000e+02]
 [6.130000e+00 7.950000e+02]
 [4.280000e+00 1.035000e+03]
 [5.430000e+00 1.095000e+03]
 [4.360000e+00 6.800000e+02]
 [5.040000e+00 8.850000e+02]
 [5.240000e+00 1.080000e+03]
 [4.900000e+00 1.065000e+03]
 [6.100000e+00 9.850000e+02]
 [6.200000e+00 1.060000e+03]
 [8.900000e+00 1.260000e+03]
 [7.200000e+00 1.150000e+03]
 [5.600000e+00 1.265000e+03]
 [7.050000e+00 1.190000e+03]
 [6.300000e+00 1.375000e+03]
 [5.850000e+00 1.060000e+03]
 [6.250000e+00 1.120000e+03]
 [6.380000e+00 9.700000e+02]
 [6.000000e+00 1.270000e+03]
 [6.800000e+00 1.285000e+03]
 [1.950000e+00 5.200000e+02]
 [3.270000e+00 6.800000e+02]
 [5.750000e+00 4.500000e+02]
 [3.800000e+00 6.300000e+02]
 [4.450000e+00 4.200000e+02]
 [2.950000e+00 3.550000e+02]
 [4.600000e+00 6.780000e+02]
 [5.300000e+00 5.020000e+02]
 [4.680000e+00 5.100000e+02]
 [3.170000e+00 7.500000e+02]
 [2.850000e+00 7.180000e+02]
 [3.050000e+00 8.700000e+02]
 [3.380000e+00 4.100000e+02]
 [3.740000e+00 4.720000e+02]
 [3.350000e+00 9.850000e+02]
 [3.210000e+00 8.860000e+02]
 [3.800000e+00 4.280000e+02]
 [4.600000e+00 3.920000e+02]
 [2.650000e+00 5.000000e+02]
 [3.400000e+00 7.500000e+02]
 [2.570000e+00 4.630000e+02]
 [2.500000e+00 2.780000e+02]
 [3.900000e+00 7.140000e+02]
 [2.200000e+00 6.300000e+02]
 [4.800000e+00 5.150000e+02]
 [3.050000e+00 5.200000e+02]
 [2.620000e+00 4.500000e+02]
 [2.450000e+00 4.950000e+02]
 [2.600000e+00 5.620000e+02]
 [2.800000e+00 6.800000e+02]
 [1.740000e+00 6.250000e+02]
 [2.400000e+00 4.800000e+02]
 [3.600000e+00 4.500000e+02]
 [3.050000e+00 4.950000e+02]
 [2.150000e+00 2.900000e+02]
 [3.250000e+00 3.450000e+02]
 [2.600000e+00 9.370000e+02]
 [2.500000e+00 6.250000e+02]
 [2.900000e+00 4.280000e+02]
 [4.500000e+00 6.600000e+02]
 [2.300000e+00 4.060000e+02]
 [3.300000e+00 7.100000e+02]
 [2.450000e+00 5.620000e+02]
 [2.800000e+00 4.380000e+02]
 [2.060000e+00 4.150000e+02]
 [2.940000e+00 6.720000e+02]
 [2.700000e+00 3.150000e+02]
 [3.400000e+00 5.100000e+02]
 [3.300000e+00 4.880000e+02]
 [2.700000e+00 3.120000e+02]
 [2.650000e+00 6.800000e+02]
 [2.900000e+00 5.620000e+02]
 [2.000000e+00 3.250000e+02]
 [3.800000e+00 6.070000e+02]
 [3.080000e+00 4.340000e+02]
 [2.900000e+00 3.850000e+02]
 [1.900000e+00 4.070000e+02]
 [1.950000e+00 4.950000e+02]
 [2.060000e+00 3.450000e+02]
 [3.400000e+00 3.720000e+02]
 [1.280000e+00 5.640000e+02]
 [3.250000e+00 6.250000e+02]
 [6.000000e+00 4.650000e+02]
 [2.080000e+00 3.650000e+02]
 [2.600000e+00 3.800000e+02]
 [2.800000e+00 3.800000e+02]
 [2.760000e+00 3.780000e+02]
 [3.940000e+00 3.520000e+02]
 [3.000000e+00 4.660000e+02]
 [2.120000e+00 3.420000e+02]
 [2.600000e+00 5.800000e+02]
 [4.100000e+00 6.300000e+02]
 [5.400000e+00 5.300000e+02]
 [5.700000e+00 5.600000e+02]
 [5.000000e+00 6.000000e+02]
 [5.450000e+00 6.500000e+02]
 [7.100000e+00 6.950000e+02]
 [3.850000e+00 7.200000e+02]
 [5.000000e+00 5.150000e+02]
 [5.700000e+00 5.800000e+02]
 [4.920000e+00 5.900000e+02]
 [4.600000e+00 6.000000e+02]
 [5.600000e+00 7.800000e+02]
 [4.350000e+00 5.200000e+02]
 [4.400000e+00 5.500000e+02]
 [8.210000e+00 8.550000e+02]
 [4.000000e+00 8.300000e+02]
 [4.900000e+00 4.150000e+02]
 [7.650000e+00 6.250000e+02]
 [8.420000e+00 6.500000e+02]
 [9.400000e+00 5.500000e+02]
 [8.600000e+00 5.000000e+02]
 [1.080000e+01 4.800000e+02]
 [7.100000e+00 4.250000e+02]
 [1.052000e+01 6.750000e+02]
 [7.600000e+00 6.400000e+02]
 [7.900000e+00 7.250000e+02]
 [9.010000e+00 4.800000e+02]
 [7.500000e+00 8.800000e+02]
 [1.300000e+01 6.600000e+02]
 [1.175000e+01 6.200000e+02]
 [7.650000e+00 5.200000e+02]
 [5.880000e+00 6.800000e+02]
 [5.580000e+00 5.700000e+02]
 [5.280000e+00 6.750000e+02]
 [9.580000e+00 6.150000e+02]
 [6.620000e+00 5.200000e+02]
 [1.068000e+01 6.950000e+02]
 [1.026000e+01 6.850000e+02]
 [8.660000e+00 7.500000e+02]
 [8.500000e+00 6.300000e+02]
 [5.500000e+00 5.100000e+02]
 [9.899999e+00 4.700000e+02]
 [9.700000e+00 6.600000e+02]
 [7.700000e+00 7.400000e+02]
 [7.300000e+00 7.500000e+02]
 [1.020000e+01 8.350000e+02]
 [9.300000e+00 8.400000e+02]
 [9.200000e+00 5.600000e+02]]

Original number of features: (178, 13)
Reduced number of features: (178, 2)
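The reduced array above no longer carries column names, so it is not obvious which two features were kept. A short sketch of how this could be inspected, using the fitted selector's get_support method and scores_ attribute; the variable names selected_names and ranked are chosen here for illustration.

```python
from sklearn import datasets
from sklearn.feature_selection import SelectKBest, chi2

wine = datasets.load_wine()
X, y = wine.data, wine.target

chi2_selector = SelectKBest(chi2, k=2)
X_kbest = chi2_selector.fit_transform(X, y)

# Boolean mask over the original 13 columns: True where a feature was kept
mask = chi2_selector.get_support()
selected_names = [name for name, keep in zip(wine.feature_names, mask) if keep]
print("Selected features:", selected_names)

# Rank all features by chi-squared score, highest first
ranked = sorted(
    zip(wine.feature_names, chi2_selector.scores_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

For the wine dataset with k=2, the kept columns should be color_intensity and proline, which matches the values in the reduced output above. Note that the chi-squared score grows with the magnitude of a feature's values, so features measured on a large scale, such as proline, tend to receive larger scores.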
