How to select features using chi squared in Python?
FEATURE EXTRACTION DATA CLEANING PYTHON DATA MUNGING MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to select features using chi squared in Python?

How to select features using chi squared in Python?

This recipe helps you select features using chi squared in Python

0

Recipe Objective

To increse the score of the model we need the dataset that has high chi-squared statistics, so it will be good if we can select the features in the dataset which has high chi-squared statistics.

This data science python source code does the following:
1.Selects features using Chi-Squared method
2. Selects the best features
3. Optimizes the final prediction results

So this is the recipe on how we can select features using chi-squared in python.

Step 1 - Import the library

from sklearn import datasets from sklearn.feature_selection import SelectKBest from sklearn.feature_selection import chi2

We have only imported datasets to import the datasets, SelectKBest and chi2.

Step 2 - Setting up the Data

We have imported inbuilt wine dataset and stored data in X and target in y. We have also used print statement to print rows of the dataset. wine = datasets.load_wine() X = wine.data print(X) y = wine.target print(y)

Step 3 - Selecting Features With high chi-square

We have used SelectKBest to select the features with best chi-square, we have passed two parameters one is the scoring metric that is chi2 and other is the value of K which signifies the number of features we want in final dataset.

We have used fit_transform to fit and transfrom the current dataset into the desired dataset. Finally we have printed the final dataset and the shape of initial and final dataset. chi2_selector = SelectKBest(chi2, k=2) X_kbest = chi2_selector.fit_transform(X, y) print(X_kbest) print('Original number of features:', X.shape) print('Reduced number of features:', X_kbest.shape) So the output comes as

[[1.423e+01 1.710e+00 2.430e+00 ... 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 ... 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 ... 1.030e+00 3.170e+00 1.185e+03]
 ...
 [1.327e+01 4.280e+00 2.260e+00 ... 5.900e-01 1.560e+00 8.350e+02]
 [1.317e+01 2.590e+00 2.370e+00 ... 6.000e-01 1.620e+00 8.400e+02]
 [1.413e+01 4.100e+00 2.740e+00 ... 6.100e-01 1.600e+00 5.600e+02]]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

[[5.640000e+00 1.065000e+03]
 [4.380000e+00 1.050000e+03]
 [5.680000e+00 1.185000e+03]
 [7.800000e+00 1.480000e+03]
 [4.320000e+00 7.350000e+02]
 [6.750000e+00 1.450000e+03]
 [5.250000e+00 1.290000e+03]
 [5.050000e+00 1.295000e+03]
 [5.200000e+00 1.045000e+03]
 [7.220000e+00 1.045000e+03]
 [5.750000e+00 1.510000e+03]
 [5.000000e+00 1.280000e+03]
 [5.600000e+00 1.320000e+03]
 [5.400000e+00 1.150000e+03]
 [7.500000e+00 1.547000e+03]
 [7.300000e+00 1.310000e+03]
 [6.200000e+00 1.280000e+03]
 [6.600000e+00 1.130000e+03]
 [8.700000e+00 1.680000e+03]
 [5.100000e+00 8.450000e+02]
 [5.650000e+00 7.800000e+02]
 [4.500000e+00 7.700000e+02]
 [3.800000e+00 1.035000e+03]
 [3.930000e+00 1.015000e+03]
 [3.520000e+00 8.450000e+02]
 [3.580000e+00 8.300000e+02]
 [4.800000e+00 1.195000e+03]
 [3.950000e+00 1.285000e+03]
 [4.500000e+00 9.150000e+02]
 [4.700000e+00 1.035000e+03]
 [5.700000e+00 1.285000e+03]
 [6.900000e+00 1.515000e+03]
 [3.840000e+00 9.900000e+02]
 [5.400000e+00 1.235000e+03]
 [4.200000e+00 1.095000e+03]
 [5.100000e+00 9.200000e+02]
 [4.600000e+00 8.800000e+02]
 [4.250000e+00 1.105000e+03]
 [3.700000e+00 1.020000e+03]
 [5.100000e+00 7.600000e+02]
 [6.130000e+00 7.950000e+02]
 [4.280000e+00 1.035000e+03]
 [5.430000e+00 1.095000e+03]
 [4.360000e+00 6.800000e+02]
 [5.040000e+00 8.850000e+02]
 [5.240000e+00 1.080000e+03]
 [4.900000e+00 1.065000e+03]
 [6.100000e+00 9.850000e+02]
 [6.200000e+00 1.060000e+03]
 [8.900000e+00 1.260000e+03]
 [7.200000e+00 1.150000e+03]
 [5.600000e+00 1.265000e+03]
 [7.050000e+00 1.190000e+03]
 [6.300000e+00 1.375000e+03]
 [5.850000e+00 1.060000e+03]
 [6.250000e+00 1.120000e+03]
 [6.380000e+00 9.700000e+02]
 [6.000000e+00 1.270000e+03]
 [6.800000e+00 1.285000e+03]
 [1.950000e+00 5.200000e+02]
 [3.270000e+00 6.800000e+02]
 [5.750000e+00 4.500000e+02]
 [3.800000e+00 6.300000e+02]
 [4.450000e+00 4.200000e+02]
 [2.950000e+00 3.550000e+02]
 [4.600000e+00 6.780000e+02]
 [5.300000e+00 5.020000e+02]
 [4.680000e+00 5.100000e+02]
 [3.170000e+00 7.500000e+02]
 [2.850000e+00 7.180000e+02]
 [3.050000e+00 8.700000e+02]
 [3.380000e+00 4.100000e+02]
 [3.740000e+00 4.720000e+02]
 [3.350000e+00 9.850000e+02]
 [3.210000e+00 8.860000e+02]
 [3.800000e+00 4.280000e+02]
 [4.600000e+00 3.920000e+02]
 [2.650000e+00 5.000000e+02]
 [3.400000e+00 7.500000e+02]
 [2.570000e+00 4.630000e+02]
 [2.500000e+00 2.780000e+02]
 [3.900000e+00 7.140000e+02]
 [2.200000e+00 6.300000e+02]
 [4.800000e+00 5.150000e+02]
 [3.050000e+00 5.200000e+02]
 [2.620000e+00 4.500000e+02]
 [2.450000e+00 4.950000e+02]
 [2.600000e+00 5.620000e+02]
 [2.800000e+00 6.800000e+02]
 [1.740000e+00 6.250000e+02]
 [2.400000e+00 4.800000e+02]
 [3.600000e+00 4.500000e+02]
 [3.050000e+00 4.950000e+02]
 [2.150000e+00 2.900000e+02]
 [3.250000e+00 3.450000e+02]
 [2.600000e+00 9.370000e+02]
 [2.500000e+00 6.250000e+02]
 [2.900000e+00 4.280000e+02]
 [4.500000e+00 6.600000e+02]
 [2.300000e+00 4.060000e+02]
 [3.300000e+00 7.100000e+02]
 [2.450000e+00 5.620000e+02]
 [2.800000e+00 4.380000e+02]
 [2.060000e+00 4.150000e+02]
 [2.940000e+00 6.720000e+02]
 [2.700000e+00 3.150000e+02]
 [3.400000e+00 5.100000e+02]
 [3.300000e+00 4.880000e+02]
 [2.700000e+00 3.120000e+02]
 [2.650000e+00 6.800000e+02]
 [2.900000e+00 5.620000e+02]
 [2.000000e+00 3.250000e+02]
 [3.800000e+00 6.070000e+02]
 [3.080000e+00 4.340000e+02]
 [2.900000e+00 3.850000e+02]
 [1.900000e+00 4.070000e+02]
 [1.950000e+00 4.950000e+02]
 [2.060000e+00 3.450000e+02]
 [3.400000e+00 3.720000e+02]
 [1.280000e+00 5.640000e+02]
 [3.250000e+00 6.250000e+02]
 [6.000000e+00 4.650000e+02]
 [2.080000e+00 3.650000e+02]
 [2.600000e+00 3.800000e+02]
 [2.800000e+00 3.800000e+02]
 [2.760000e+00 3.780000e+02]
 [3.940000e+00 3.520000e+02]
 [3.000000e+00 4.660000e+02]
 [2.120000e+00 3.420000e+02]
 [2.600000e+00 5.800000e+02]
 [4.100000e+00 6.300000e+02]
 [5.400000e+00 5.300000e+02]
 [5.700000e+00 5.600000e+02]
 [5.000000e+00 6.000000e+02]
 [5.450000e+00 6.500000e+02]
 [7.100000e+00 6.950000e+02]
 [3.850000e+00 7.200000e+02]
 [5.000000e+00 5.150000e+02]
 [5.700000e+00 5.800000e+02]
 [4.920000e+00 5.900000e+02]
 [4.600000e+00 6.000000e+02]
 [5.600000e+00 7.800000e+02]
 [4.350000e+00 5.200000e+02]
 [4.400000e+00 5.500000e+02]
 [8.210000e+00 8.550000e+02]
 [4.000000e+00 8.300000e+02]
 [4.900000e+00 4.150000e+02]
 [7.650000e+00 6.250000e+02]
 [8.420000e+00 6.500000e+02]
 [9.400000e+00 5.500000e+02]
 [8.600000e+00 5.000000e+02]
 [1.080000e+01 4.800000e+02]
 [7.100000e+00 4.250000e+02]
 [1.052000e+01 6.750000e+02]
 [7.600000e+00 6.400000e+02]
 [7.900000e+00 7.250000e+02]
 [9.010000e+00 4.800000e+02]
 [7.500000e+00 8.800000e+02]
 [1.300000e+01 6.600000e+02]
 [1.175000e+01 6.200000e+02]
 [7.650000e+00 5.200000e+02]
 [5.880000e+00 6.800000e+02]
 [5.580000e+00 5.700000e+02]
 [5.280000e+00 6.750000e+02]
 [9.580000e+00 6.150000e+02]
 [6.620000e+00 5.200000e+02]
 [1.068000e+01 6.950000e+02]
 [1.026000e+01 6.850000e+02]
 [8.660000e+00 7.500000e+02]
 [8.500000e+00 6.300000e+02]
 [5.500000e+00 5.100000e+02]
 [9.899999e+00 4.700000e+02]
 [9.700000e+00 6.600000e+02]
 [7.700000e+00 7.400000e+02]
 [7.300000e+00 7.500000e+02]
 [1.020000e+01 8.350000e+02]
 [9.300000e+00 8.400000e+02]
 [9.200000e+00 5.600000e+02]]

Original number of features: (178, 13)
Reduced number of features: (178, 2)

Relevant Projects

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Data Science Project on Wine Quality Prediction in R
In this R data science project, we will explore wine dataset to assess red wine quality. The objective of this data science project is to explore which chemical properties will influence the quality of red wines.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.