How to implement K-NN classification in R

In this recipe, we shall learn the steps to implement the K-Nearest Neighbors (K-NN) classification algorithm, a supervised machine learning technique, in R.

K-nearest neighbors (KNN) is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems. A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its k nearest neighbors, where "nearest" is determined by a distance function such as Euclidean distance. If k = 1, the case is simply assigned to the class of its single nearest neighbor.
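To make the majority-vote idea concrete, here is a minimal hand-rolled sketch for a single query point. The helper `knn_one` is hypothetical, written only for illustration; the recipe itself uses `knn()` from the class package.

```r
#classify one new point by majority vote among its k nearest training points
knn_one <- function(train_x, train_y, new_x, k = 3) {
  #Euclidean distance from the new point to every training row
  dists <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  #labels of the k closest training rows
  nearest <- train_y[order(dists)[1:k]]
  #majority vote: the most frequent label among the neighbors
  names(which.max(table(nearest)))
}

#example: classify a flower with typical setosa measurements
knn_one(as.matrix(iris[, 1:4]), iris$Species, c(5.0, 3.5, 1.4, 0.2), k = 5)
# → "setosa"
```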
Steps to implement K-NN classification in R -

Step 1: Import required libraries

library(class)    #provides knn()
library(gmodels)  #provides CrossTable()
library(dplyr)    #general data manipulation (not strictly required here)

Step 2: Load the data

We will make use of the iris data frame. iris is a built-in data frame that gives the measurements, in centimeters, of sepal length and width and petal length and width for 50 flowers from each of 3 species of iris: Iris setosa, versicolor, and virginica.

data <- iris

#displays first 6 rows of the dataset
head(data)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Step 3: Check the summary

summary(data)

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50 

Step 4: Normalize the data

#function for min-max normalization
normal <- function(x){
  return((x - min(x)) / (max(x) - min(x)))
}

#apply Min-Max normalization to first four columns in iris dataset
data_norm <- as.data.frame(lapply(data[,1:4], normal))
head(data_norm)

  Sepal.Length Sepal.Width Petal.Length Petal.Width
1   0.22222222   0.6250000   0.06779661  0.04166667
2   0.16666667   0.4166667   0.06779661  0.04166667
3   0.11111111   0.5000000   0.05084746  0.04166667
4   0.08333333   0.4583333   0.08474576  0.04166667
5   0.19444444   0.6666667   0.06779661  0.04166667
6   0.30555556   0.7916667   0.11864407  0.12500000
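As a quick sanity check (a sketch, not part of the original recipe), after min-max scaling every column should run exactly from 0 to 1:

```r
#recompute the normalization and confirm each column spans [0, 1]
normal <- function(x) (x - min(x)) / (max(x) - min(x))
data_norm <- as.data.frame(lapply(iris[, 1:4], normal))
sapply(data_norm, range)  #first row is the minima (all 0), second the maxima (all 1)
```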

Step 5: Split the data

#setting a seed makes the random split reproducible across runs
set.seed(123)
data_spl <- sample(1:nrow(data_norm), size = nrow(data_norm) * 0.8, replace = FALSE)

train<-data_norm[data_spl,] #80% of data

test<-data_norm[-data_spl,] #remaining 20% of the data

Step 6: Separate the train and test labels

#column 5 of the original data is the Species label
train_labels <- data[data_spl, 5]
test_labels <- data[-data_spl, 5]

Step 7: Train the model

k-NN is a lazy learner, so there is no separate training phase: knn() classifies the test set directly from the training data. Here we use k = 11, roughly the square root of the 120 training observations.

model <- knn(train = train, test = test, cl = train_labels, k = 11)
model

[1] setosa     setosa     setosa     setosa     setosa     setosa    
[7] setosa     setosa     setosa     setosa     versicolor versicolor
[13] versicolor versicolor versicolor versicolor versicolor versicolor
[19] versicolor versicolor versicolor versicolor virginica  versicolor
[25] versicolor virginica  virginica  virginica  virginica  virginica 
Levels: setosa versicolor virginica

Step 8: Compare the predicted and actual values

CrossTable(x=test_labels,y=model)

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  30 

 
             | model 
 test_labels |     setosa | versicolor |  virginica |  Row Total | 
-------------|------------|------------|------------|------------|
      setosa |         10 |          0 |          0 |         10 | 
             |     13.333 |      4.667 |      2.000 |            | 
             |      1.000 |      0.000 |      0.000 |      0.333 | 
             |      1.000 |      0.000 |      0.000 |            | 
             |      0.333 |      0.000 |      0.000 |            | 
-------------|------------|------------|------------|------------|
  versicolor |          0 |         14 |          1 |         15 | 
             |      5.000 |      7.000 |      1.333 |            | 
             |      0.000 |      0.933 |      0.067 |      0.500 | 
             |      0.000 |      1.000 |      0.167 |            | 
             |      0.000 |      0.467 |      0.033 |            | 
-------------|------------|------------|------------|------------|
   virginica |          0 |          0 |          5 |          5 | 
             |      1.667 |      2.333 |     16.000 |            | 
             |      0.000 |      0.000 |      1.000 |      0.167 | 
             |      0.000 |      0.000 |      0.833 |            | 
             |      0.000 |      0.000 |      0.167 |            | 
-------------|------------|------------|------------|------------|
Column Total |         10 |         14 |          6 |         30 | 
             |      0.333 |      0.467 |      0.200 |            | 
-------------|------------|------------|------------|------------|

The cross table shows that 29 of the 30 test observations were classified correctly; the single error is a versicolor flower predicted as virginica.
