How to impute missing class labels using nearest neighbours in Python?

How to impute missing class labels using nearest neighbours in Python?

How to impute missing class labels using nearest neighbours in Python?

This recipe helps you impute missing class labels using nearest neighbours in Python


Recipe Objective

Have you ever tried to impute calss labels? We can impute class labels by K nearest neighbours by training it on known data and predicting the class labels.

So this is the recipe on how we can impute missing class labels using nearest neighbours in Python.

Step 1 - Import the library

import numpy as np from sklearn.neighbors import KNeighborsClassifier

We have imported numpy and KNeighborsClassifier which is needed.

Step 2 - Setting up the Data

We have created a feature matrix using array and we will use this to train the KNN model. X = np.array([[0, 2.10, 1.45], [2, 1.18, 1.33], [0, 1.22, 1.27], [1, 1.32, 1.97], [1, -0.21, -1.19]]) We have created a matrix with missing class labels. X_with_nan = np.array([[np.nan, 0.87, 1.31], [np.nan, 0.37, 1.91], [np.nan, 0.54, 1.27], [np.nan, -0.67, -0.22]])

Step 3 - Predicting the Class Labels

We are training the KNeighborsClassifier with parameters K equals to 3 and weights equals to distance. We have used the matrix X to train the model. clf = KNeighborsClassifier(3, weights="distance") trained_model =[:,1:], X[:,0]) We have predicted the class labels of matrix "X_with_nan". imputed_values = trained_model.predict(X_with_nan[:,1:]) print(imputed_values) So finally we have filled the null values with the predicted output of model. X_with_imputed = np.hstack((imputed_values.reshape(-1,1), X_with_nan[:,1:])) print(); print(X_with_imputed) So the output comes as

[2. 1. 2. 1.]

[[ 2.    0.87  1.31]
 [ 1.    0.37  1.91]
 [ 2.    0.54  1.27]
 [ 1.   -0.67 -0.22]]

Relevant Projects

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Machine Learning Project to Forecast Rossmann Store Sales
In this machine learning project you will work on creating a robust prediction model of Rossmann's daily sales using store, promotion, and competitor data.

Identifying Product Bundles from Sales Data Using R Language
In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.