How to use nearest neighbours for Regression in R

This recipe helps you use nearest neighbours for Regression in R

Recipe Objective

How to use nearest neighbours for Regression in R?

KNN (k-nearest neighbours) is a supervised, non-linear learning algorithm. Unlike parametric supervised models such as linear regression and logistic regression, it is non-parametric: it makes no assumptions about the form of the relationship or the distribution of the data. KNN does not build clusters. To predict for a new data point, it finds the k training points closest to it, typically by Euclidean distance, and combines their labels: a majority vote for classification, or the average of their target values for regression. This recipe demonstrates nearest neighbours in R on the iris dataset. Because the target, Species, is categorical, the example uses the classification form of the algorithm; the same neighbour search applies to regression by averaging a numeric target.
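
For the regression case, the neighbour search described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `knn_regress` is a hypothetical helper written for this example (it is not part of the `class` package), and the toy data is made up.

```r
# Minimal KNN regression sketch in base R.
# knn_regress is a hypothetical helper, not a library function.
knn_regress <- function(X_train, y_train, X_new, k = 3) {
  X_train <- as.matrix(X_train)
  X_new <- as.matrix(X_new)
  apply(X_new, 1, function(q) {
    # Euclidean distance from the query point q to every training row
    d <- sqrt(rowSums(sweep(X_train, 2, q)^2))
    # indices of the k closest training points
    nearest <- order(d)[seq_len(k)]
    # regression prediction: average of the neighbours' target values
    mean(y_train[nearest])
  })
}

# Toy data, made up for illustration: one predictor and one numeric target
toy_x <- data.frame(x = c(1, 2, 3, 10))
toy_y <- c(1, 2, 3, 10)

# The three points nearest to x = 2.1 have targets 2, 3 and 1,
# so the prediction is their mean
pred <- knn_regress(toy_x, toy_y, data.frame(x = 2.1), k = 3)
print(pred)
```

Replacing `mean()` with a majority vote over the neighbours' labels would give the classification variant used in the rest of this recipe.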

Step 1 - Install necessary libraries

# caTools provides data-splitting helpers; class provides knn()
install.packages('caTools')
library(caTools)
library(class)

Step 2 - Read the data

# use the iris dataset
data <- iris
head(data)
dim(data)

Step 3 - Perform normalization on the dataset

# normalize the data to put them on a standard scale of 0 to 1
normalization <- function(x) { (x - min(x)) / (max(x) - min(x)) }
# apply normalization to all four predictor columns
data_norm <- as.data.frame(lapply(iris[, 1:4], normalization))
summary(data_norm)
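
When predictions are later needed for rows that were not part of the training data, one common option is to learn the minimum and maximum on the training rows and reuse them for the new rows. The sketch below illustrates that idea only; `fit_minmax` and `apply_minmax` are hypothetical helper names, and the 100/50 row split is an arbitrary choice for the example.

```r
# Sketch: learn min/max on training rows, then reuse them on new rows.
# fit_minmax and apply_minmax are hypothetical helper names.
fit_minmax <- function(df) {
  list(min = sapply(df, min), max = sapply(df, max))
}

apply_minmax <- function(df, params) {
  # subtract each column's training minimum ...
  scaled <- sweep(as.matrix(df), 2, params$min, "-")
  # ... then divide by each column's training range
  scaled <- sweep(scaled, 2, params$max - params$min, "/")
  as.data.frame(scaled)
}

train_rows <- iris[1:100, 1:4]    # arbitrary split, for illustration only
new_rows <- iris[101:150, 1:4]

params <- fit_minmax(train_rows)
train_scaled <- apply_minmax(train_rows, params)
new_scaled <- apply_minmax(new_rows, params)

summary(train_scaled)            # every training column spans 0 to 1
range(new_scaled$Petal.Length)   # new rows can fall outside 0 to 1
```

Values outside the 0 to 1 range in the new rows are expected here, since they are scaled relative to the training rows rather than to themselves.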

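Step 4 - Split data into train and test

# hold out 30% of the rows for testing, keeping the Species proportions
set.seed(123)
is_train <- sample.split(data$Species, SplitRatio = 0.7)
# training data
X_train <- data_norm[is_train, ]
y_train <- data$Species[is_train]
# testing data
X_test <- data_norm[!is_train, ]
y_test <- data$Species[!is_train]

Step 5 - Create knn models with varying k values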

# k = 1, i.e. each prediction uses only the single nearest neighbour
knn_model1 <- knn(X_train, X_test, cl = y_train, k = 1)
# create confusion matrix (rows: actual classes, columns: predicted classes)
conf_mat <- table(actual = y_test, predicted = knn_model1)
conf_mat
# accuracy: percentage of test points on the diagonal of the confusion matrix
accuracy <- function(x) { sum(diag(x)) / sum(x) * 100 }
accuracy(conf_mat)
# k = 5, i.e. each prediction is a majority vote among the five nearest neighbours
knn_model2 <- knn(X_train, X_test, cl = y_train, k = 5)
conf_mat <- table(actual = y_test, predicted = knn_model2)
conf_mat
accuracy(conf_mat)

We can run the above model with different values of k, i.e. the number of neighbours considered, to find which value gives the best accuracy on the test data.
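
Trying several values of k can be sketched as a simple loop. The block below is a self-contained illustration, not the recipe's own procedure: the seed, the 70/30 split made with base R's `sample()`, and the list of k values are all assumptions chosen for the example.

```r
library(class)  # provides knn()

set.seed(42)  # arbitrary seed so the split is repeatable
normalization <- function(x) (x - min(x)) / (max(x) - min(x))
data_norm <- as.data.frame(lapply(iris[, 1:4], normalization))

# assumption: a simple random 70/30 split using base R's sample()
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
X_train <- data_norm[train_idx, ]
X_test <- data_norm[-train_idx, ]
y_train <- iris$Species[train_idx]
y_test <- iris$Species[-train_idx]

# fit one model per k and record the test accuracy
k_values <- c(1, 3, 5, 7, 9, 11)
acc <- sapply(k_values, function(k) {
  pred <- knn(X_train, X_test, cl = y_train, k = k)
  mean(pred == y_test) * 100   # percentage of correct predictions
})

results <- data.frame(k = k_values, accuracy = acc)
print(results)
```

Odd values of k are used here to reduce the chance of tied votes; the accuracies depend on the particular split, so a different seed may favour a different k.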
