How to use nearest neighbours for Classification in R

This recipe helps you use nearest neighbours for Classification in R

Recipe Objective

How to use nearest neighbors for classification in R?

KNN (k-nearest neighbours) is a supervised, non-linear learning algorithm. Unlike parametric supervised methods such as linear regression and logistic regression, it is non-parametric: it makes no assumptions about the underlying distribution of the data. The algorithm does not build an explicit model or create clusters. Instead, it stores the training data, and to classify a new data point it finds the k training points closest to it (here measured by Euclidean distance) and assigns the class that is most common among those neighbours. This recipe demonstrates an example of nearest neighbours for classification in R.
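The nearest-neighbour idea can be sketched in a few lines of base R: compute the Euclidean distance from a new point to every training point, keep the k closest, and predict the majority class among them. The toy data and the helper `knn_predict_one` below are made up for illustration; the recipe itself relies on `knn()` from the `class` package.

```r
# Toy training set: two numeric features and a class label (made-up values)
train_x <- matrix(c(1.0, 1.1,
                    1.2, 0.9,
                    3.0, 3.2,
                    3.1, 2.9,
                    2.9, 3.0),
                  ncol = 2, byrow = TRUE)
train_y <- factor(c("a", "a", "b", "b", "b"))

# Hypothetical helper: classify one new point by majority vote
# among its k nearest training points
knn_predict_one <- function(train_x, train_y, new_point, k = 3) {
  # Euclidean distance from new_point to every training row
  diffs <- sweep(train_x, 2, new_point)
  dists <- sqrt(rowSums(diffs^2))
  # Labels of the k closest training rows
  nearest <- train_y[order(dists)[seq_len(k)]]
  # Most frequent label among those neighbours
  votes <- table(nearest)
  names(votes)[which.max(votes)]
}

pred_near_a <- knn_predict_one(train_x, train_y, c(1.1, 1.0), k = 3)
pred_near_b <- knn_predict_one(train_x, train_y, c(3.0, 3.0), k = 3)
print(pred_near_a)
print(pred_near_b)
```

With k = 3, the first point has two "a" neighbours and one "b" neighbour, so the vote goes to "a"; the second point sits among the three "b" rows.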

Step 1 - Install necessary libraries

install.packages('caTools')
library(caTools)
library(class)

Step 2 - Read the data

Here, we are using a diabetes dataset. It contains several attributes that are used to predict whether a patient has diabetes. The Outcome column is the target: 1 means the patient has diabetes and 0 means the patient does not have diabetes.

data <- read.csv("/content/diabetes (1).csv")
head(data)
dim(data)

Step 3 - Split data into train and test

# Encode the target variable as a factor
data$Outcome = factor(data$Outcome , levels = c(0, 1))
set.seed(1)
split = sample.split(data$Outcome , SplitRatio = 0.70)
train_data = subset(data, split == TRUE)
dim(train_data)
test_data = subset(data, split == FALSE)
dim(test_data)
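`sample.split()` from `caTools` splits on the label vector so that the training and test parts keep roughly the same class ratio. The base R sketch below illustrates that idea on made-up labels; it is not the `caTools` implementation, and the variable names are chosen only for this example.

```r
set.seed(1)
# Made-up labels: 70 zeros and 30 ones
outcome <- factor(rep(c(0, 1), times = c(70, 30)))

# Within each class, mark about 70% of the rows for training
in_train <- logical(length(outcome))
for (lvl in levels(outcome)) {
  idx <- which(outcome == lvl)
  n_train <- round(0.70 * length(idx))
  in_train[sample(idx, n_train)] <- TRUE
}

# Class proportions in the training part and in the test part
train_props <- prop.table(table(outcome[in_train]))
test_props <- prop.table(table(outcome[!in_train]))
print(train_props)
print(test_props)
```

Because the sampling is done per class, both parts end up with the same share of each label, which matters when one class is much rarer than the other.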

Step 4 - Create knn models with varying k values

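# Note: [, -6] drops only the sixth column, so Outcome (assumed to be column 9, as the cl argument implies) stays among the features.
# To keep the label out of the predictors, use [, -9] instead; the accuracy figures reported in this recipe come from the code as written.
class_knn1 = knn(train = train_data[, -6], test = test_data[, -6], cl = train_data[, 9], k = 1, prob = TRUE)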
# Creating the Confusion Matrix
cm = table(test_data$Outcome, class_knn1)
cm
# Calculate out of Sample error
misClassError <- mean(class_knn1 != test_data$Outcome)
print(paste('Accuracy =', 1-misClassError))

The model achieved 68.26 % accuracy with k = 1.
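The accuracy printed by the code is one minus the misclassification rate, and the same number can be read off the confusion matrix as the diagonal total divided by the overall total. A small sketch with made-up labels (not the diabetes data) shows that the two calculations agree:

```r
# Made-up actual and predicted labels, for illustration only
actual <- factor(c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 0, 1, 0, 1, 0, 1, 0, 1, 0), levels = c(0, 1))

# Rows are actual classes, columns are predicted classes
cm_demo <- table(actual, predicted)
print(cm_demo)

# Accuracy: correctly classified cases (the diagonal) over all cases
accuracy_demo <- sum(diag(cm_demo)) / sum(cm_demo)

# Same value via the misclassification rate, as in the recipe
error_demo <- mean(predicted != actual)
print(accuracy_demo)
print(1 - error_demo)
```

The off-diagonal cells show which kind of mistake the model makes, which a single accuracy figure hides.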

class_knn2 = knn(train = train_data[, -6], test = test_data[, -6], cl = train_data[, 9], k = 7, prob = TRUE)
# Creating the Confusion Matrix
cm = table(test_data$Outcome, class_knn2)
cm
# Calculate out of Sample error
misClassError <- mean(class_knn2 != test_data$Outcome)
print(paste('Accuracy =', 1-misClassError))

The model achieved 70.43% accuracy with k = 7, which is more than with k = 1.

class_knn3 = knn(train = train_data[, -6], test = test_data[, -6], cl = train_data[, 9], k = 15, prob = TRUE)
# Creating the Confusion Matrix
cm = table(test_data$Outcome, class_knn3)
cm
# Calculate out of Sample error
misClassError <- mean(class_knn3 != test_data$Outcome)
print(paste('Accuracy =', 1-misClassError))

The model achieved 72.17% accuracy with k = 15, which is more than with k = 1 or k = 7.

class_knn4 = knn(train = train_data[, -6], test = test_data[, -6], cl = train_data[, 9], k = 20, prob = TRUE)
# Creating the Confusion Matrix
cm = table(test_data$Outcome, class_knn4)
cm
# Calculate out of Sample error
misClassError <- mean(class_knn4 != test_data$Outcome)
print(paste('Accuracy =', 1-misClassError))

The model achieved 72.60% accuracy with k = 20, which is more than with k = 1, 7, or 15. Increasing k beyond this does not change the accuracy significantly, so we conclude that the model gives good accuracy with k = 20.
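The four runs repeat the same code with a different k each time, so the comparison can also be written as a loop. The sketch below uses the built-in `iris` dataset as a stand-in, since the diabetes CSV is not bundled with R; the variable names are illustrative, and the resulting accuracies are not comparable with the figures reported in this recipe.

```r
library(class)  # provides knn(); normally ships with R

set.seed(1)
# iris is a built-in dataset, used here only as a stand-in
idx <- sample(nrow(iris), size = 105)   # 70% of the 150 rows for training
train_x <- iris[idx, 1:4]
test_x <- iris[-idx, 1:4]
train_y <- iris$Species[idx]
test_y <- iris$Species[-idx]

# Fit and score one model per candidate k
k_values <- c(1, 7, 15, 20)
accuracy_by_k <- sapply(k_values, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  mean(pred == test_y)
})

results <- data.frame(k = k_values, accuracy = accuracy_by_k)
print(results)
```

A comparison like this rests on a single train/test split, so small differences between neighbouring k values may not hold up on a different split.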
