How to impute missing class labels using nearest neighbours in Python?

This recipe shows how to impute missing class labels using nearest neighbours in Python.

Recipe Objective

Have you ever tried to impute class labels? We can impute missing class labels with K nearest neighbours (KNN): train a KNN classifier on the rows whose labels are known, then use it to predict the labels of the rows where they are missing.

The steps below impute missing class labels using nearest neighbours in Python.

Step 1 - Import the library

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

We import NumPy to build the arrays and KNeighborsClassifier from scikit-learn to train the nearest-neighbours model.

Step 2 - Setting up the Data

We create a matrix with np.array and use it to train the KNN model. The first column holds the class label and the remaining two columns hold the features.
X = np.array([[0, 2.10, 1.45],
              [2, 1.18, 1.33],
              [0, 1.22, 1.27],
              [1, 1.32, 1.97],
              [1, -0.21, -1.19]])

We also create a matrix whose class labels are missing (np.nan in the first column).
X_with_nan = np.array([[np.nan, 0.87, 1.31],
                       [np.nan, 0.37, 1.91],
                       [np.nan, 0.54, 1.27],
                       [np.nan, -0.67, -0.22]])
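In practice, labelled and unlabelled rows often arrive in a single array rather than as two separate matrices. A minimal sketch, assuming (as in this recipe) that the label sits in column 0 and missing labels are stored as np.nan, is to split them with a boolean mask. The combined array `data` and the name `missing` are hypothetical and simply reuse the recipe's rows.

```python
import numpy as np

# Hypothetical combined dataset: column 0 is the class label (np.nan = missing),
# columns 1 and 2 are the features. The rows reuse the recipe's values.
data = np.array([[0, 2.10, 1.45],
                 [np.nan, 0.87, 1.31],
                 [2, 1.18, 1.33],
                 [np.nan, 0.37, 1.91],
                 [0, 1.22, 1.27],
                 [1, 1.32, 1.97],
                 [np.nan, 0.54, 1.27],
                 [1, -0.21, -1.19],
                 [np.nan, -0.67, -0.22]])

# True for every row whose label is missing
missing = np.isnan(data[:, 0])

X = data[~missing]          # rows with known labels (training data)
X_with_nan = data[missing]  # rows whose labels must be imputed

print(X.shape, X_with_nan.shape)  # (5, 3) (4, 3)
```

Boolean indexing keeps the original row order, so the resulting X has the same rows, in the same order, as the recipe's training matrix.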

Step 3 - Predicting the Class Labels

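We train a KNeighborsClassifier with K = 3 and weights="distance", so each neighbour's vote is weighted by the inverse of its distance and closer neighbours count more. The model is fitted on the feature columns of X (columns 1 and 2), with column 0 as the class label.
clf = KNeighborsClassifier(n_neighbors=3, weights="distance")
trained_model = clf.fit(X[:, 1:], X[:, 0])

Next, we predict the class labels for the rows of X_with_nan, using only their feature columns.
imputed_values = trained_model.predict(X_with_nan[:, 1:])
print(imputed_values)

Finally, we fill in the missing labels by placing the predicted labels, as a column, in front of the feature columns.
X_with_imputed = np.hstack((imputed_values.reshape(-1, 1), X_with_nan[:, 1:]))
print()
print(X_with_imputed)

Because the labels were stored in a float array, the predicted labels are floats too (for example 2. rather than 2). The output is: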
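To make the role of weights="distance" concrete, here is a from-scratch NumPy sketch of a distance-weighted k-nearest-neighbours vote. It assumes Euclidean distance (the default metric of KNeighborsClassifier is Minkowski with p=2, which is Euclidean) and adds 1/distance to the class of each of the k nearest neighbours; the class with the largest total wins. The helper `weighted_knn_predict` is hypothetical, not part of scikit-learn, and unlike scikit-learn it does not handle a query that exactly matches a training row (distance 0).

```python
import numpy as np

def weighted_knn_predict(train_X, train_y, query_X, k=3):
    """Hypothetical helper: distance-weighted k-NN vote (no zero-distance handling)."""
    predictions = []
    for q in query_X:
        # Euclidean distance from the query row to every training row
        dists = np.sqrt(((train_X - q) ** 2).sum(axis=1))
        # Indices of the k closest training rows
        nearest = np.argsort(dists)[:k]
        # Each neighbour votes for its class with weight 1 / distance
        votes = {}
        for i in nearest:
            label = train_y[i]
            votes[label] = votes.get(label, 0.0) + 1.0 / dists[i]
        # The class with the largest summed weight wins
        predictions.append(max(votes, key=votes.get))
    return np.array(predictions)

# Same data as in the recipe: column 0 is the label, columns 1-2 are features
X = np.array([[0, 2.10, 1.45], [2, 1.18, 1.33], [0, 1.22, 1.27],
              [1, 1.32, 1.97], [1, -0.21, -1.19]])
X_with_nan = np.array([[np.nan, 0.87, 1.31], [np.nan, 0.37, 1.91],
                       [np.nan, 0.54, 1.27], [np.nan, -0.67, -0.22]])

print(weighted_knn_predict(X[:, 1:], X[:, 0], X_with_nan[:, 1:]))  # [2. 1. 2. 1.]
```

On this data the manual vote gives the same labels as the scikit-learn model, which is a quick way to confirm what the weights parameter is doing.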

[2. 1. 2. 1.]

[[ 2.    0.87  1.31]
 [ 1.    0.37  1.91]
 [ 2.    0.54  1.27]
 [ 1.   -0.67 -0.22]]
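Imputed labels are predictions, not observations, so it can be worth checking how decisive each vote was. KNeighborsClassifier.predict_proba returns, for each row, the share of the (distance-weighted) vote that each class received, with columns in the order of clf.classes_. The sketch below refits the recipe's model and flags rows where the winning class received less than half of the vote; the 0.5 threshold and the name `low_confidence` are illustrative assumptions, not a standard rule.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same data and model settings as in the recipe
X = np.array([[0, 2.10, 1.45], [2, 1.18, 1.33], [0, 1.22, 1.27],
              [1, 1.32, 1.97], [1, -0.21, -1.19]])
X_with_nan = np.array([[np.nan, 0.87, 1.31], [np.nan, 0.37, 1.91],
                       [np.nan, 0.54, 1.27], [np.nan, -0.67, -0.22]])

clf = KNeighborsClassifier(n_neighbors=3, weights="distance")
clf.fit(X[:, 1:], X[:, 0])

# One row per imputed sample, one column per class in clf.classes_
probs = clf.predict_proba(X_with_nan[:, 1:])
print(clf.classes_)
print(np.round(probs, 2))

# Assumed rule: flag imputations whose winning class got < 50% of the vote
low_confidence = probs.max(axis=1) < 0.5
print(low_confidence)
```

Rows flagged True are those where the neighbours disagreed most, so they are natural candidates for manual review or for being left unlabelled.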
