How to select features using chi squared in Python?

How to select features using chi squared in Python?

How to select features using chi squared in Python?

This recipe helps you select features using chi squared in Python


Recipe Objective

To increse the score of the model we need the dataset that has high chi-squared statistics, so it will be good if we can select the features in the dataset which has high chi-squared statistics.

This data science python source code does the following:
1.Selects features using Chi-Squared method
2. Selects the best features
3. Optimizes the final prediction results

So this is the recipe on how we can select features using chi-squared in python.

Step 1 - Import the library

from sklearn import datasets from sklearn.feature_selection import SelectKBest from sklearn.feature_selection import chi2

We have only imported datasets to import the datasets, SelectKBest and chi2.

Step 2 - Setting up the Data

We have imported inbuilt wine dataset and stored data in X and target in y. We have also used print statement to print rows of the dataset. wine = datasets.load_wine() X = print(X) y = print(y)

Step 3 - Selecting Features With high chi-square

We have used SelectKBest to select the features with best chi-square, we have passed two parameters one is the scoring metric that is chi2 and other is the value of K which signifies the number of features we want in final dataset.

We have used fit_transform to fit and transfrom the current dataset into the desired dataset. Finally we have printed the final dataset and the shape of initial and final dataset. chi2_selector = SelectKBest(chi2, k=2) X_kbest = chi2_selector.fit_transform(X, y) print(X_kbest) print('Original number of features:', X.shape) print('Reduced number of features:', X_kbest.shape) So the output comes as

[[1.423e+01 1.710e+00 2.430e+00 ... 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 ... 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 ... 1.030e+00 3.170e+00 1.185e+03]
 [1.327e+01 4.280e+00 2.260e+00 ... 5.900e-01 1.560e+00 8.350e+02]
 [1.317e+01 2.590e+00 2.370e+00 ... 6.000e-01 1.620e+00 8.400e+02]
 [1.413e+01 4.100e+00 2.740e+00 ... 6.100e-01 1.600e+00 5.600e+02]]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

[[5.640000e+00 1.065000e+03]
 [4.380000e+00 1.050000e+03]
 [5.680000e+00 1.185000e+03]
 [7.800000e+00 1.480000e+03]
 [4.320000e+00 7.350000e+02]
 [6.750000e+00 1.450000e+03]
 [5.250000e+00 1.290000e+03]
 [5.050000e+00 1.295000e+03]
 [5.200000e+00 1.045000e+03]
 [7.220000e+00 1.045000e+03]
 [5.750000e+00 1.510000e+03]
 [5.000000e+00 1.280000e+03]
 [5.600000e+00 1.320000e+03]
 [5.400000e+00 1.150000e+03]
 [7.500000e+00 1.547000e+03]
 [7.300000e+00 1.310000e+03]
 [6.200000e+00 1.280000e+03]
 [6.600000e+00 1.130000e+03]
 [8.700000e+00 1.680000e+03]
 [5.100000e+00 8.450000e+02]
 [5.650000e+00 7.800000e+02]
 [4.500000e+00 7.700000e+02]
 [3.800000e+00 1.035000e+03]
 [3.930000e+00 1.015000e+03]
 [3.520000e+00 8.450000e+02]
 [3.580000e+00 8.300000e+02]
 [4.800000e+00 1.195000e+03]
 [3.950000e+00 1.285000e+03]
 [4.500000e+00 9.150000e+02]
 [4.700000e+00 1.035000e+03]
 [5.700000e+00 1.285000e+03]
 [6.900000e+00 1.515000e+03]
 [3.840000e+00 9.900000e+02]
 [5.400000e+00 1.235000e+03]
 [4.200000e+00 1.095000e+03]
 [5.100000e+00 9.200000e+02]
 [4.600000e+00 8.800000e+02]
 [4.250000e+00 1.105000e+03]
 [3.700000e+00 1.020000e+03]
 [5.100000e+00 7.600000e+02]
 [6.130000e+00 7.950000e+02]
 [4.280000e+00 1.035000e+03]
 [5.430000e+00 1.095000e+03]
 [4.360000e+00 6.800000e+02]
 [5.040000e+00 8.850000e+02]
 [5.240000e+00 1.080000e+03]
 [4.900000e+00 1.065000e+03]
 [6.100000e+00 9.850000e+02]
 [6.200000e+00 1.060000e+03]
 [8.900000e+00 1.260000e+03]
 [7.200000e+00 1.150000e+03]
 [5.600000e+00 1.265000e+03]
 [7.050000e+00 1.190000e+03]
 [6.300000e+00 1.375000e+03]
 [5.850000e+00 1.060000e+03]
 [6.250000e+00 1.120000e+03]
 [6.380000e+00 9.700000e+02]
 [6.000000e+00 1.270000e+03]
 [6.800000e+00 1.285000e+03]
 [1.950000e+00 5.200000e+02]
 [3.270000e+00 6.800000e+02]
 [5.750000e+00 4.500000e+02]
 [3.800000e+00 6.300000e+02]
 [4.450000e+00 4.200000e+02]
 [2.950000e+00 3.550000e+02]
 [4.600000e+00 6.780000e+02]
 [5.300000e+00 5.020000e+02]
 [4.680000e+00 5.100000e+02]
 [3.170000e+00 7.500000e+02]
 [2.850000e+00 7.180000e+02]
 [3.050000e+00 8.700000e+02]
 [3.380000e+00 4.100000e+02]
 [3.740000e+00 4.720000e+02]
 [3.350000e+00 9.850000e+02]
 [3.210000e+00 8.860000e+02]
 [3.800000e+00 4.280000e+02]
 [4.600000e+00 3.920000e+02]
 [2.650000e+00 5.000000e+02]
 [3.400000e+00 7.500000e+02]
 [2.570000e+00 4.630000e+02]
 [2.500000e+00 2.780000e+02]
 [3.900000e+00 7.140000e+02]
 [2.200000e+00 6.300000e+02]
 [4.800000e+00 5.150000e+02]
 [3.050000e+00 5.200000e+02]
 [2.620000e+00 4.500000e+02]
 [2.450000e+00 4.950000e+02]
 [2.600000e+00 5.620000e+02]
 [2.800000e+00 6.800000e+02]
 [1.740000e+00 6.250000e+02]
 [2.400000e+00 4.800000e+02]
 [3.600000e+00 4.500000e+02]
 [3.050000e+00 4.950000e+02]
 [2.150000e+00 2.900000e+02]
 [3.250000e+00 3.450000e+02]
 [2.600000e+00 9.370000e+02]
 [2.500000e+00 6.250000e+02]
 [2.900000e+00 4.280000e+02]
 [4.500000e+00 6.600000e+02]
 [2.300000e+00 4.060000e+02]
 [3.300000e+00 7.100000e+02]
 [2.450000e+00 5.620000e+02]
 [2.800000e+00 4.380000e+02]
 [2.060000e+00 4.150000e+02]
 [2.940000e+00 6.720000e+02]
 [2.700000e+00 3.150000e+02]
 [3.400000e+00 5.100000e+02]
 [3.300000e+00 4.880000e+02]
 [2.700000e+00 3.120000e+02]
 [2.650000e+00 6.800000e+02]
 [2.900000e+00 5.620000e+02]
 [2.000000e+00 3.250000e+02]
 [3.800000e+00 6.070000e+02]
 [3.080000e+00 4.340000e+02]
 [2.900000e+00 3.850000e+02]
 [1.900000e+00 4.070000e+02]
 [1.950000e+00 4.950000e+02]
 [2.060000e+00 3.450000e+02]
 [3.400000e+00 3.720000e+02]
 [1.280000e+00 5.640000e+02]
 [3.250000e+00 6.250000e+02]
 [6.000000e+00 4.650000e+02]
 [2.080000e+00 3.650000e+02]
 [2.600000e+00 3.800000e+02]
 [2.800000e+00 3.800000e+02]
 [2.760000e+00 3.780000e+02]
 [3.940000e+00 3.520000e+02]
 [3.000000e+00 4.660000e+02]
 [2.120000e+00 3.420000e+02]
 [2.600000e+00 5.800000e+02]
 [4.100000e+00 6.300000e+02]
 [5.400000e+00 5.300000e+02]
 [5.700000e+00 5.600000e+02]
 [5.000000e+00 6.000000e+02]
 [5.450000e+00 6.500000e+02]
 [7.100000e+00 6.950000e+02]
 [3.850000e+00 7.200000e+02]
 [5.000000e+00 5.150000e+02]
 [5.700000e+00 5.800000e+02]
 [4.920000e+00 5.900000e+02]
 [4.600000e+00 6.000000e+02]
 [5.600000e+00 7.800000e+02]
 [4.350000e+00 5.200000e+02]
 [4.400000e+00 5.500000e+02]
 [8.210000e+00 8.550000e+02]
 [4.000000e+00 8.300000e+02]
 [4.900000e+00 4.150000e+02]
 [7.650000e+00 6.250000e+02]
 [8.420000e+00 6.500000e+02]
 [9.400000e+00 5.500000e+02]
 [8.600000e+00 5.000000e+02]
 [1.080000e+01 4.800000e+02]
 [7.100000e+00 4.250000e+02]
 [1.052000e+01 6.750000e+02]
 [7.600000e+00 6.400000e+02]
 [7.900000e+00 7.250000e+02]
 [9.010000e+00 4.800000e+02]
 [7.500000e+00 8.800000e+02]
 [1.300000e+01 6.600000e+02]
 [1.175000e+01 6.200000e+02]
 [7.650000e+00 5.200000e+02]
 [5.880000e+00 6.800000e+02]
 [5.580000e+00 5.700000e+02]
 [5.280000e+00 6.750000e+02]
 [9.580000e+00 6.150000e+02]
 [6.620000e+00 5.200000e+02]
 [1.068000e+01 6.950000e+02]
 [1.026000e+01 6.850000e+02]
 [8.660000e+00 7.500000e+02]
 [8.500000e+00 6.300000e+02]
 [5.500000e+00 5.100000e+02]
 [9.899999e+00 4.700000e+02]
 [9.700000e+00 6.600000e+02]
 [7.700000e+00 7.400000e+02]
 [7.300000e+00 7.500000e+02]
 [1.020000e+01 8.350000e+02]
 [9.300000e+00 8.400000e+02]
 [9.200000e+00 5.600000e+02]]

Original number of features: (178, 13)
Reduced number of features: (178, 2)

Relevant Projects

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Build a Collaborative Filtering Recommender System in Python
Use the Amazon Reviews/Ratings dataset of 2 Million records to build a recommender system using memory-based collaborative filtering in Python.