How to use NaiveBayes Classifier in R

This recipe helps you use NaiveBayes Classifier in R

Recipe Objective

Naive Bayes is a supervised machine learning model for classification. It is a probabilistic classifier built on Bayes' theorem. The classifier makes the "naive" assumption that the predictor variables are conditionally independent of each other given the class, so each predictor contributes to the predicted outcome separately.

Naive Bayes algorithm: Bayes' theorem gives the conditional probability of an event A given that another event B has occurred.

**P(A|B) = P(A∩B) / P(B) = P(B|A) P(A) / P(B)**

This recipe demonstrates how to use the Naive Bayes classifier in R.
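To make the independence assumption concrete, here is a minimal base-R sketch, not the e1071 implementation, that scores one flower from the built-in iris data. It multiplies the class prior by one density per feature and then normalises, which is Bayes' theorem with the likelihood factored feature by feature. The helper name `gaussian_nb_posterior` is hypothetical, and modelling every feature as Gaussian is an assumption of this sketch.

```r
# Hand-rolled Gaussian naive Bayes posterior for a single observation.
# gaussian_nb_posterior is a hypothetical helper, not part of any package.
gaussian_nb_posterior <- function(train, target, new_obs) {
  features <- setdiff(names(train), target)
  classes <- levels(train[[target]])
  scores <- sapply(classes, function(cl) {
    rows <- train[train[[target]] == cl, features, drop = FALSE]
    prior <- nrow(rows) / nrow(train)  # P(class)
    # P(feature | class): one Gaussian density per feature (assumption)
    likelihoods <- mapply(
      function(x, col) dnorm(x, mean = mean(col), sd = sd(col)),
      unlist(new_obs[1, features]), rows
    )
    prior * prod(likelihoods)  # numerator of Bayes' theorem
  })
  scores / sum(scores)  # divide by P(features) so the scores sum to 1
}

# Score the first row of iris against all three species
posterior <- gaussian_nb_posterior(iris, "Species", iris[1, ])
round(posterior, 4)
names(which.max(posterior))  # most probable class
```

Because the per-feature densities are simply multiplied, any correlation between features is ignored; that is the "naive" part of the method.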

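Step 1 - Install and load the necessary libraries

# install the packages (only needed once per R installation)
install.packages("e1071")   # provides naiveBayes()
install.packages("caTools") # provides sample.split()
install.packages("caret")   # provides confusionMatrix()
# load the packages into the current session
library(e1071)
library(caTools)
library(caret)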

Step 2 - Load the dataset and explore the data

data <- iris # use the iris dataset
head(data) # head() returns the top 6 rows of the dataframe
summary(data) # returns the statistical summary of the data columns
dim(data) # returns number of rows and columns in the dataset

Step 3 - Train and Test data

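# split the data into train-test with a ratio 80:20
set.seed(1) # make the random split reproducible
# pass the label column (not the whole data frame) so rows are split per class
split <- sample.split(data$Species, SplitRatio = 0.8)
train_data <- subset(data, split == TRUE)
test_data <- subset(data, split == FALSE)
# Feature Scaling (optional: the model below is fit on the unscaled data)
train_scale <- scale(train_data[, 1:4])
# reuse the training means and standard deviations for the test set
test_scale <- scale(test_data[, 1:4],
                    center = attr(train_scale, "scaled:center"),
                    scale = attr(train_scale, "scaled:scale"))
dim(train_data)
dim(test_data)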
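For readers who want to see what a class-balanced 80:20 split does, the following is a base-R sketch of the same idea without caTools. It is an illustrative alternative rather than the method used in this recipe, and the seed value is arbitrary.

```r
# Stratified 80:20 split using only base R (an alternative sketch to caTools).
set.seed(1)  # arbitrary seed so the split is reproducible
data <- iris

# For each class, sample 80% of its row indices for training.
train_idx <- unlist(lapply(
  split(seq_len(nrow(data)), data$Species),
  function(idx) sample(idx, size = round(0.8 * length(idx)))
))

train_data <- data[train_idx, ]
test_data <- data[-train_idx, ]

table(train_data$Species)  # class counts in the training set
table(test_data$Species)   # class counts in the test set
```

Sampling within each class keeps the class proportions the same in both subsets, which matters more on small or imbalanced datasets.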

Step 4 - Create a naiveBayes model

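# fit the model: Species is the target, all other columns are predictors
classifier_naive <- naiveBayes(Species ~ ., data = train_data)
classifier_naive # prints the class priors and the per-feature conditional tables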

The model estimates the conditional distribution of each feature separately for every class. For numeric predictors such as the iris measurements, the printed tables give the per-class mean and standard deviation of each feature, alongside the a-priori class probabilities.

summary(classifier_naive)
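The per-class summaries described above can be reproduced by hand. This base-R sketch computes the mean and standard deviation of one feature for each species. As an assumption for illustration, it uses the full iris data rather than the training split, so the numbers will differ slightly from the fitted model's tables.

```r
# Per-class mean and standard deviation of one numeric feature,
# computed on the full iris data for illustration.
per_class <- sapply(
  split(iris$Petal.Length, iris$Species),
  function(x) c(mean = mean(x), sd = sd(x))
)
t(per_class)  # one row per species, columns: mean and sd
```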

Step 5 - Make predictions on the test dataset

# Predicting on test data
y_pred <- predict(classifier_naive, newdata = test_data)
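By default `predict()` returns the most probable class label. As a short sketch, the `type = "raw"` option of e1071's predict method returns the posterior probability of every class instead, which is useful for seeing how confident the model is. The model here is refit on the full iris data purely for illustration, and the three row indices are arbitrary picks, one from each species.

```r
library(e1071)  # assumes e1071 is already installed

# Fit on the full iris data purely for illustration.
model <- naiveBayes(Species ~ ., data = iris)
new_rows <- iris[c(1, 51, 101), ]  # one flower from each species

# type = "raw" returns one posterior probability per class for each row
probs <- predict(model, newdata = new_rows, type = "raw")
round(probs, 3)

# the default type = "class" returns the most probable label
labels <- predict(model, newdata = new_rows)
labels
```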

Step 6 - Check the accuracy of our model

# Confusion Matrix (rows: predicted class, columns: actual class)
conf_mat <- table(Predicted = y_pred, Actual = test_data$Species)
print(conf_mat)
# Model Evaluation
confusionMatrix(conf_mat)

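In the run reported here, the confusion matrix showed: setosa, 10 correctly classified; versicolor, 10 correctly classified and 1 wrongly classified; virginica, 9 correctly classified. The model achieved an accuracy of about 96%. The exact counts depend on the random train-test split.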
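Accuracy is the share of observations on the diagonal of the confusion matrix. The sketch below shows the arithmetic on a toy matrix. The counts are made up for illustration and are not the results of this recipe.

```r
# Toy confusion matrix with made-up counts (rows: predicted, columns: actual).
classes <- c("setosa", "versicolor", "virginica")
toy_mat <- matrix(
  c(10, 0, 0,
     0, 9, 1,
     0, 1, 9),
  nrow = 3, byrow = TRUE,
  dimnames = list(Predicted = classes, Actual = classes)
)

# Overall accuracy: correct predictions divided by all predictions
accuracy <- sum(diag(toy_mat)) / sum(toy_mat)
accuracy

# Per-class recall: correct predictions divided by the actual class totals
recall <- diag(toy_mat) / colSums(toy_mat)
recall
```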
