How to implement K-NN classification in R

In this recipe, we shall learn the steps to implement the K-Nearest Neighbors (K-NN) classification algorithm, a supervised machine learning technique, in R.

K-nearest neighbors (KNN) is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems. A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its k nearest neighbors, where "nearest" is determined by a distance function such as Euclidean distance. If k = 1, the case is simply assigned to the class of its single nearest neighbor.
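To make the majority-vote idea concrete, here is a minimal hand-rolled sketch for a single query point. The helper `knn_one` is hypothetical, written only for illustration; the recipe itself uses `knn()` from the class package.

```r
#classify one new point by majority vote among its k nearest training points
knn_one <- function(train_x, train_y, new_x, k = 3) {
  #Euclidean distance from the new point to every training row
  dists <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  #labels of the k closest training rows
  nearest <- train_y[order(dists)[1:k]]
  #majority vote: the most frequent label among the neighbors
  names(which.max(table(nearest)))
}

#example: classify a flower with typical setosa measurements
knn_one(as.matrix(iris[, 1:4]), iris$Species, c(5.0, 3.5, 1.4, 0.2), k = 5)
# → "setosa"
```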
Steps to implement K-NN classification in R -

Step 1: Import required libraries

library(class)    #provides knn()
library(gmodels)  #provides CrossTable()
library(dplyr)    #general data manipulation (not strictly required here)

Step 2: Load the data

We will make use of the iris data frame. iris is a built-in data frame that gives the measurements, in centimeters, of sepal length and width and petal length and width for 50 flowers from each of 3 species of iris: Iris setosa, versicolor, and virginica.

data <- iris

#displays first 6 rows of the dataset
head(data)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Step 3: Check the summary

summary(data)

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50 

Step 4: Normalize the data

#function for min-max normalization
normal <- function(x){
  return((x - min(x)) / (max(x) - min(x)))
}

#apply Min-Max normalization to first four columns in iris dataset
data_norm <- as.data.frame(lapply(data[,1:4], normal))
head(data_norm)

  Sepal.Length Sepal.Width Petal.Length Petal.Width
1   0.22222222   0.6250000   0.06779661  0.04166667
2   0.16666667   0.4166667   0.06779661  0.04166667
3   0.11111111   0.5000000   0.05084746  0.04166667
4   0.08333333   0.4583333   0.08474576  0.04166667
5   0.19444444   0.6666667   0.06779661  0.04166667
6   0.30555556   0.7916667   0.11864407  0.12500000
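As a quick sanity check (a sketch, not part of the original recipe), after min-max scaling every column should run exactly from 0 to 1:

```r
#recompute the normalization and confirm each column spans [0, 1]
normal <- function(x) (x - min(x)) / (max(x) - min(x))
data_norm <- as.data.frame(lapply(iris[, 1:4], normal))
sapply(data_norm, range)  #first row is the minima (all 0), second the maxima (all 1)
```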

Step 5: Split the data

#setting a seed makes the random split reproducible across runs
set.seed(123)
data_spl <- sample(1:nrow(data_norm), size = nrow(data_norm) * 0.8, replace = FALSE)

train<-data_norm[data_spl,] #80% of data

test<-data_norm[-data_spl,] #remaining 20% of the data

Step 6: Separate the train and test labels

#column 5 of the original data is the Species label
train_labels <- data[data_spl, 5]
test_labels <- data[-data_spl, 5]

Step 7: Train the model

k-NN is a lazy learner, so there is no separate training phase: knn() classifies the test set directly from the training data. Here we use k = 11, roughly the square root of the 120 training observations.

model <- knn(train = train, test = test, cl = train_labels, k = 11)
model

[1] setosa     setosa     setosa     setosa     setosa     setosa    
[7] setosa     setosa     setosa     setosa     versicolor versicolor
[13] versicolor versicolor versicolor versicolor versicolor versicolor
[19] versicolor versicolor versicolor versicolor virginica  versicolor
[25] versicolor virginica  virginica  virginica  virginica  virginica 
Levels: setosa versicolor virginica

Step 8: Compare the predicted and actual values

CrossTable(x=test_labels,y=model)

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  30 

 
             | model 
 test_labels |     setosa | versicolor |  virginica |  Row Total | 
-------------|------------|------------|------------|------------|
      setosa |         10 |          0 |          0 |         10 | 
             |     13.333 |      4.667 |      2.000 |            | 
             |      1.000 |      0.000 |      0.000 |      0.333 | 
             |      1.000 |      0.000 |      0.000 |            | 
             |      0.333 |      0.000 |      0.000 |            | 
-------------|------------|------------|------------|------------|
  versicolor |          0 |         14 |          1 |         15 | 
             |      5.000 |      7.000 |      1.333 |            | 
             |      0.000 |      0.933 |      0.067 |      0.500 | 
             |      0.000 |      1.000 |      0.167 |            | 
             |      0.000 |      0.467 |      0.033 |            | 
-------------|------------|------------|------------|------------|
   virginica |          0 |          0 |          5 |          5 | 
             |      1.667 |      2.333 |     16.000 |            | 
             |      0.000 |      0.000 |      1.000 |      0.167 | 
             |      0.000 |      0.000 |      0.833 |            | 
             |      0.000 |      0.000 |      0.167 |            | 
-------------|------------|------------|------------|------------|
Column Total |         10 |         14 |          6 |         30 | 
             |      0.333 |      0.467 |      0.200 |            | 
-------------|------------|------------|------------|------------|

The cross table shows that 29 of the 30 test observations were classified correctly; the single error is a versicolor flower predicted as virginica.
