How to impute missing class labels using nearest neighbours in Python?

This recipe helps you impute missing class labels using nearest neighbours in Python.

Recipe Objective

Have you ever tried to impute class labels? One way is to use K nearest neighbours: train a classifier on the rows whose labels are known, then use it to predict the labels of the rows where they are missing.
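To make "train on known data and predict" concrete, here is a minimal from-scratch sketch of a distance-weighted k-nearest-neighbour vote in NumPy. It illustrates the idea only and is not scikit-learn's implementation; the helper name knn_impute_labels is hypothetical, and it assumes no unlabelled row exactly duplicates a labelled one (a zero distance would make the 1/distance weight infinite).

```python
import numpy as np


def knn_impute_labels(X_known, y_known, X_missing, k=3):
    """Hypothetical helper: predict a label for each row of X_missing by a
    distance-weighted vote among its k nearest labelled rows.

    Assumes no row of X_missing exactly matches a row of X_known
    (a zero distance would make the 1/distance weight infinite).
    """
    predictions = []
    for row in X_missing:
        # Euclidean distance from this row to every labelled row
        dists = np.sqrt(((X_known - row) ** 2).sum(axis=1))
        # Indices of the k closest labelled rows
        nearest = np.argsort(dists)[:k]
        # Each neighbour adds 1/distance to the score of its own label
        scores = {}
        for idx in nearest:
            label = y_known[idx]
            scores[label] = scores.get(label, 0.0) + 1.0 / dists[idx]
        # The label with the highest total weight wins
        predictions.append(max(scores, key=scores.get))
    return np.array(predictions)


# Same data as in the recipe: first column is the label, the rest are features
X = np.array([[0, 2.10, 1.45], [2, 1.18, 1.33], [0, 1.22, 1.27],
              [1, 1.32, 1.97], [1, -0.21, -1.19]])
X_with_nan = np.array([[np.nan, 0.87, 1.31], [np.nan, 0.37, 1.91],
                       [np.nan, 0.54, 1.27], [np.nan, -0.67, -0.22]])

print(knn_impute_labels(X[:, 1:], X[:, 0], X_with_nan[:, 1:], k=3))  # [2. 1. 2. 1.]
```

On this data the hand-rolled vote gives the same labels as the scikit-learn model used below, which is what we would expect since both weight each neighbour by the inverse of its distance.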

This recipe shows how to impute missing class labels using nearest neighbours in Python.


Step 1 - Import the library

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

We have imported numpy to build the arrays and KNeighborsClassifier to build the nearest-neighbours model.

Step 2 - Setting up the Data

We have created a feature matrix using np.array and will use it to train the KNN model. The first column holds the class label and the remaining two columns hold the features.

X = np.array([[0, 2.10, 1.45],
              [2, 1.18, 1.33],
              [0, 1.22, 1.27],
              [1, 1.32, 1.97],
              [1, -0.21, -1.19]])

We have also created a matrix whose class labels (the first column) are missing, marked with np.nan.

X_with_nan = np.array([[np.nan, 0.87, 1.31],
                       [np.nan, 0.37, 1.91],
                       [np.nan, 0.54, 1.27],
                       [np.nan, -0.67, -0.22]])
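Here the labelled and unlabelled rows are kept in two separate arrays. If they arrive mixed together in one array, a boolean mask built with np.isnan on the label column can split them. A minimal sketch; the combined array data below is a made-up example, not part of the recipe.

```python
import numpy as np

# Hypothetical combined array: first column is the class label, NaN where unknown
data = np.array([[0, 2.10, 1.45],
                 [np.nan, 0.87, 1.31],
                 [2, 1.18, 1.33],
                 [1, -0.21, -1.19],
                 [np.nan, -0.67, -0.22]])

missing = np.isnan(data[:, 0])  # True for rows whose label is missing
X_known = data[~missing]        # rows usable for training
X_unknown = data[missing]       # rows whose label we want to impute

print(X_known.shape, X_unknown.shape)  # (3, 3) (2, 3)
```

X_known then plays the role of X and X_unknown the role of X_with_nan in the steps that follow.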

Step 3 - Predicting the Class Labels

We train KNeighborsClassifier with n_neighbors=3 and weights="distance", so each of the three nearest neighbours votes with a weight inversely proportional to its distance and closer rows count more. The model is fitted on the features of X (columns 1 onwards) with the first column as the target.

clf = KNeighborsClassifier(n_neighbors=3, weights="distance")
trained_model = clf.fit(X[:, 1:], X[:, 0])

We then predict the class labels of the rows in X_with_nan from their features.

imputed_values = trained_model.predict(X_with_nan[:, 1:])
print(imputed_values)

Finally, we fill in the missing labels by stacking the predictions back in front of the features.

X_with_imputed = np.hstack((imputed_values.reshape(-1, 1), X_with_nan[:, 1:]))
print()
print(X_with_imputed)

The output is:

[2. 1. 2. 1.]

[[ 2.    0.87  1.31]
 [ 1.    0.37  1.91]
 [ 2.    0.54  1.27]
 [ 1.   -0.67 -0.22]]
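To check why a row received its label, the fitted model can report its neighbours and how strongly each class was supported. This sketch uses KNeighborsClassifier.kneighbors, which returns the distances and training-row indices of each query's nearest neighbours, and predict_proba, whose columns follow clf.classes_. With weights="distance", those probabilities are the normalised 1/distance votes per class.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same data as in the recipe
X = np.array([[0, 2.10, 1.45], [2, 1.18, 1.33], [0, 1.22, 1.27],
              [1, 1.32, 1.97], [1, -0.21, -1.19]])
X_with_nan = np.array([[np.nan, 0.87, 1.31], [np.nan, 0.37, 1.91],
                       [np.nan, 0.54, 1.27], [np.nan, -0.67, -0.22]])

clf = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X[:, 1:], X[:, 0])

# Distances to, and row indices of, the 3 nearest labelled rows for each query
distances, indices = clf.kneighbors(X_with_nan[:, 1:])
# Share of the distance-weighted vote per class; columns follow clf.classes_
proba = clf.predict_proba(X_with_nan[:, 1:])

print("classes:", clf.classes_)  # classes: [0. 1. 2.]
for d, i, p in zip(distances, indices, proba):
    print("neighbour labels:", X[i, 0], "distances:", d.round(3),
          "class shares:", p.round(2))
```

A row whose top class share is only slightly ahead of the next one is a less reliable imputation than one dominated by a single class.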
