How to use nearest neighbours for Classification in R

This recipe helps you use nearest neighbours for Classification in R
Last Updated: 22 Aug 2021



Recipe Objective

How to use nearest neighbors for classification in R?

KNN (k-nearest neighbours) is a supervised, non-linear learning algorithm. Unlike linear or logistic regression, it is non-parametric. It does not assume a fixed functional form for the relationship between the features and the target, and it does not assume a particular distribution for the data. KNN does not build clusters or fit parameters during training. It simply stores the training data. To classify a new data point, it finds the k training points closest to it and assigns the class that is most common among those neighbours. The `class::knn` function used in this recipe measures closeness by Euclidean distance and breaks voting ties at random. This recipe demonstrates nearest-neighbour classification in R.
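The classification rule described above can be written from scratch in a few lines. This sketch computes the Euclidean distance to every training point and then takes a majority vote among the k closest. The function name `knn_predict` and the toy data are hypothetical and only illustrate the idea. This is not how `class::knn` is implemented. Unlike `class::knn`, it does not break voting ties at random, and it does not include extra neighbours that are tied at the k-th distance.

```r
# Minimal k-nearest-neighbour classifier (illustrative sketch, not class::knn)
knn_predict <- function(train_x, train_y, test_x, k = 3) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  preds <- apply(test_x, 1, function(query) {
    # Euclidean distance from this test point to every training point
    d <- sqrt(colSums((t(train_x) - query)^2))
    # Labels of the k closest training points
    nearest <- train_y[order(d)[seq_len(k)]]
    # Majority vote; a tie goes to the first factor level (no random tie-break)
    votes <- table(nearest)
    names(votes)[which.max(votes)]
  })
  factor(unname(preds), levels = levels(train_y))
}

# Hypothetical toy data: two well-separated groups in two dimensions
train_x <- matrix(c(1, 1,  1, 2,  2, 1,  8, 8,  8, 9,  9, 8), ncol = 2, byrow = TRUE)
train_y <- factor(c(0, 0, 0, 1, 1, 1))
test_x <- matrix(c(1.5, 1.5,  8.5, 8.5), ncol = 2, byrow = TRUE)

print(knn_predict(train_x, train_y, test_x, k = 3))
```

The first test point sits among the class-0 points and the second among the class-1 points. The sketch therefore predicts `0` and then `1`.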

Step 1 - Install necessary libraries

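install.packages('caTools')   # provides sample.split() for the train/test split
library(caTools)
library(class)                # provides knn(); a recommended package bundled with R, so no install is needed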

Step 2 - Read the data

Here, we use a diabetes dataset. Its attributes are used to predict whether a patient has diabetes, which is recorded in the Outcome column. A value of 1 means the patient has diabetes, and 0 means the patient does not.

data <- read.csv("/content/diabetes (1).csv")
head(data)   # first six rows
dim(data)    # number of rows and columns
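Accuracy is the metric used below, so it helps to check the class balance of Outcome first. A model that always predicts the most common class already scores that class's share of the data. This is a minimal sketch. It uses a hypothetical helper, `baseline_accuracy`, with made-up labels rather than the recipe's CSV.

```r
# Hypothetical helper: accuracy of always predicting the most frequent class
baseline_accuracy <- function(y) {
  max(prop.table(table(y)))
}

# Made-up labels for illustration only (7 zeros, 3 ones)
y <- factor(c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0), levels = c(0, 1))
print(baseline_accuracy(y))

# On the recipe's data this would be: baseline_accuracy(data$Outcome)
```

For these labels the baseline is 0.7. A KNN model is only useful if it clearly beats the corresponding figure for the real data.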

Step 3 - Split data into train and test

# Encode the target as a factor so knn() treats it as a class label
data$Outcome = factor(data$Outcome, levels = c(0, 1))
set.seed(1)
# sample.split() keeps the 0/1 ratio of Outcome similar in both subsets
split = sample.split(data$Outcome, SplitRatio = 0.70)
train_data = subset(data, split == TRUE)
dim(train_data)
test_data = subset(data, split == FALSE)
dim(test_data)
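KNN compares raw Euclidean distances, so features with large numeric ranges can dominate the neighbour search. The models in Step 4 use the unscaled columns. One common option, which is not part of the original recipe, is to standardise each feature. Each column uses the training set's mean and standard deviation, and the same values are applied to the test set. The helper `scale_train_test` and the toy columns are hypothetical.

```r
# Hypothetical helper: standardise features using training statistics only
scale_train_test <- function(train_x, test_x) {
  train_scaled <- scale(train_x)  # centre and scale each column
  centers <- attr(train_scaled, "scaled:center")
  scales <- attr(train_scaled, "scaled:scale")
  # Reuse the training means and sds so no test-set information leaks in
  test_scaled <- scale(test_x, center = centers, scale = scales)
  list(train = train_scaled, test = test_scaled)
}

# Toy columns (hypothetical names and values)
train_x <- data.frame(glucose = c(80, 120, 160), age = c(20, 40, 60))
test_x <- data.frame(glucose = 120, age = 30)
s <- scale_train_test(train_x, test_x)
print(s$test)
```

On the recipe's data this would be called as `s <- scale_train_test(train_data[, -9], test_data[, -9])`. The matrices `s$train` and `s$test` would then replace the feature arguments of `knn()`. Accuracies may differ from the unscaled results reported in Step 4.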

Step 4 - Create knn models with varying k values

# Column 9 is Outcome: drop it from the features and pass it as the class labels
class_knn1 = knn(train = train_data[, -9], test = test_data[, -9], cl = train_data[, 9], k = 1, prob = TRUE)
# Create the confusion matrix (rows: actual, columns: predicted)
cm = table(test_data$Outcome, class_knn1)
cm
# Calculate the out-of-sample error
misClassError <- mean(class_knn1 != test_data$Outcome)
print(paste('Accuracy =', 1 - misClassError))

With k = 1, the model achieved 68.26% accuracy on the test set.
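The confusion matrix `cm` holds more information than the single accuracy figure. Its diagonal gives the same accuracy as `1 - misClassError`. Its off-diagonal cells show which kind of error the model makes. This is a minimal sketch with a hypothetical helper, `cm_metrics`. It assumes the `table(actual, predicted)` layout used above and treats `"1"` (has diabetes) as the positive class. The labels are made up.

```r
# Hypothetical helper: metrics from a 2x2 table(actual, predicted)
cm_metrics <- function(cm, positive = "1") {
  negative <- setdiff(rownames(cm), positive)
  tp <- cm[positive, positive]  # diabetic, predicted diabetic
  tn <- cm[negative, negative]  # non-diabetic, predicted non-diabetic
  fn <- cm[positive, negative]  # diabetic, missed
  fp <- cm[negative, positive]  # non-diabetic, flagged as diabetic
  c(accuracy = (tp + tn) / sum(cm),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# Made-up labels for illustration
actual <- factor(c(0, 0, 0, 1, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 1, 0, 1, 0, 0, 1, 0), levels = c(0, 1))
m <- cm_metrics(table(actual, predicted))
print(m)
```

With the recipe's objects, the call would be `cm_metrics(cm)`. For a diagnosis task, sensitivity (the share of diabetic patients that are caught) is often more informative than accuracy alone.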

# Same model with k = 7 (column 9, Outcome, excluded from the features)
class_knn2 = knn(train = train_data[, -9], test = test_data[, -9], cl = train_data[, 9], k = 7, prob = TRUE)
# Create the confusion matrix
cm = table(test_data$Outcome, class_knn2)
cm
# Calculate the out-of-sample error
misClassError <- mean(class_knn2 != test_data$Outcome)
print(paste('Accuracy =', 1 - misClassError))

With k = 7, the model achieved 70.43% accuracy, higher than with k = 1.
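Each `knn()` call in this step passes `prob = TRUE`. This attaches the proportion of the k votes won by the predicted class as the `"prob"` attribute of the result. The toy example below shows how to read it. It uses hypothetical one-feature data, not the diabetes set.

```r
library(class)

# Hypothetical one-feature training data: class "0" near 0, class "1" near 10
train_x <- matrix(c(0, 1, 2, 10, 11), ncol = 1)
train_y <- factor(c(0, 0, 0, 1, 1))
test_x <- matrix(c(0.5, 7), ncol = 1)

pred <- knn(train = train_x, test = test_x, cl = train_y, k = 3, prob = TRUE)
print(as.character(pred))
# Share of the k votes that went to the predicted class, one value per test row
print(attr(pred, "prob"))
```

The first test point's three nearest neighbours are all class 0, so its vote share is 1. For the second test point (x = 7), the nearest neighbours are 10, 11 and 2. Two of the three belong to class 1, so the prediction is `"1"` with a vote share of 2/3.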

# Same model with k = 15 (column 9, Outcome, excluded from the features)
class_knn3 = knn(train = train_data[, -9], test = test_data[, -9], cl = train_data[, 9], k = 15, prob = TRUE)
# Create the confusion matrix
cm = table(test_data$Outcome, class_knn3)
cm
# Calculate the out-of-sample error
misClassError <- mean(class_knn3 != test_data$Outcome)
print(paste('Accuracy =', 1 - misClassError))

With k = 15, the model achieved 72.17% accuracy, higher than with k = 1 or k = 7.
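The same `knn()` call does not need to be copied for each k. The comparison can be written as a loop over candidate values. The sketch below uses the built-in `iris` data as a stand-in so that it runs on its own. With the recipe's data, the features would be `train_data[, -9]` and `test_data[, -9]`, and the labels would be `train_data[, 9]` and `test_data$Outcome`.

```r
library(class)

# Stand-in data (assumption): built-in iris instead of the diabetes CSV
set.seed(1)  # for the sample() split below and knn()'s random tie-breaking
idx <- sample(nrow(iris), 100)
train_x <- iris[idx, 1:4]
test_x <- iris[-idx, 1:4]
train_y <- iris$Species[idx]
test_y <- iris$Species[-idx]

k_values <- c(1, 7, 15, 20)  # the k values tried in this recipe
accuracy <- sapply(k_values, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  mean(pred == test_y)  # share of test rows classified correctly
})

print(data.frame(k = k_values, accuracy = accuracy))
```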

# Same model with k = 20 (column 9, Outcome, excluded from the features)
class_knn4 = knn(train = train_data[, -9], test = test_data[, -9], cl = train_data[, 9], k = 20, prob = TRUE)
# Create the confusion matrix
cm = table(test_data$Outcome, class_knn4)
cm
# Calculate the out-of-sample error
misClassError <- mean(class_knn4 != test_data$Outcome)
print(paste('Accuracy =', 1 - misClassError))

With k = 20, the model achieved 72.60% accuracy, slightly higher than with k = 1, 7, or 15. Increasing k beyond 20 did not make a significant difference, so we settle on k = 20 for this model.
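Choosing k by comparing accuracy on the test set makes the final test accuracy somewhat optimistic, because the test set has effectively been used for tuning. `class::knn.cv()` offers an alternative. It runs leave-one-out cross-validation on the training data alone. This is a minimal sketch, again with `iris` standing in for the diabetes training set. The grid of odd k values is an assumption; odd values avoid two-way voting ties in a binary problem such as Outcome.

```r
library(class)

# Stand-in data (assumption): iris instead of train_data[, -9] / train_data[, 9]
x <- iris[, 1:4]
y <- iris$Species

set.seed(1)  # knn.cv(), like knn(), breaks voting ties at random
k_values <- seq(1, 25, by = 2)
loocv_accuracy <- sapply(k_values, function(k) {
  # Leave-one-out CV: each row is classified from all the other rows
  pred <- knn.cv(train = x, cl = y, k = k)
  mean(pred == y)
})

best_k <- k_values[which.max(loocv_accuracy)]
print(best_k)
```

The k chosen this way can then be evaluated once on the untouched test set with `knn()`.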

Relevant Projects

Linear Regression Model Project in Python for Beginners Part 1

Machine Learning Linear Regression Project in Python to build a simple linear regression model and master the fundamentals of regression for beginners.


Langchain Project for Customer Support App in Python

In this LLM Project, you will learn how to enhance customer support interactions through Large Language Models (LLMs), enabling intelligent, context-aware responses. This Langchain project aims to seamlessly integrate LLM technology with databases, PDF knowledge bases, and audio processing agents to create a comprehensive customer support application.


Build a Multi Class Image Classification Model Python using CNN

This project explains how to build a Sequential Model that can perform multi-class image classification in Python using a CNN.


Locality Sensitive Hashing Python Code for Look-Alike Modelling

In this deep learning project, you will find similar images (lookalikes) using deep learning and locality sensitive hashing to find customers who are most likely to click on an ad.


Loan Eligibility Prediction in Python using H2O.ai

In this loan prediction project you will build predictive models in Python using H2O.ai to predict if an applicant is able to repay the loan or not.


Build Time Series Models for Gaussian Processes in Python

Time Series Project - A hands-on approach to Gaussian Processes for Time Series Modelling in Python


NLP Project to Build a Resume Parser in Python using Spacy

Use the popular Spacy NLP python library for OCR and text classification to build a Resume Parser in Python.


Hands-On Approach to Master PyTorch Tensors with Examples

In this deep learning project, you will learn how to perform various operations on the building block of PyTorch: tensors.


A/B Testing Approach for Comparing Performance of ML Models

The objective of this project is to compare the performance of BERT and DistilBERT models for building an efficient Question and Answering system. Using A/B testing approach, we explore the effectiveness and efficiency of both models and determine which one is better suited for Q&A tasks.


Build Customer Propensity to Purchase Model in Python

In this machine learning project, you will learn to build a machine learning model to estimate customer propensity to purchase.
