How to use NaiveBayes Classifier in R

This recipe helps you use a Naive Bayes classifier in R
Last Updated: 25 Jul 2022



Recipe Objective

How to use NaiveBayes Classifier in R?

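Naive Bayes is a supervised machine learning algorithm for classification. It is a probabilistic classifier based on Bayes' theorem: for a new observation it estimates the probability of each class and predicts the most probable one. The Naive Bayes classifier assumes that the predictor variables are conditionally independent of each other given the class, i.e. once the class is known, the value of one predictor carries no information about the value of another. This assumption is rarely exactly true, which is why the method is called "naive".

Bayes' theorem gives the conditional probability of an event A given that another event B has occurred: **P(A|B) = P(B|A) P(A) / P(B)** It follows from the definition of conditional probability, P(A|B) = P(A∩B)/P(B), because P(A∩B) = P(B|A) P(A). Applied to classification with predictors x1, ..., xn, the independence assumption lets the classifier choose the class C that maximises P(C) × P(x1|C) × ... × P(xn|C). This recipe demonstrates how to use a Naive Bayes classifier in R.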
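To make the formula concrete, the short R sketch below evaluates Bayes' theorem with hypothetical probabilities (P(A) = 0.01, P(B|A) = 0.9, P(B|not A) = 0.05, chosen only for illustration). The helper name `bayes_posterior` is made up for this example, and P(B) is obtained from the law of total probability.

```r
# Hypothetical helper: posterior P(A|B) via Bayes' theorem
bayes_posterior <- function(p_b_given_a, p_a, p_b_given_not_a) {
  # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
  p_b <- p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
  # Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
  p_b_given_a * p_a / p_b
}

# Illustrative (made-up) probabilities
posterior <- bayes_posterior(p_b_given_a = 0.9, p_a = 0.01, p_b_given_not_a = 0.05)
print(round(posterior, 4))  # prints [1] 0.1538
```

Even though B is very likely when A holds, the posterior stays around 15% because A itself is rare; this is the role the class prior P(C) plays in the classifier.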


Step 1 - Install the necessary libraries

install.packages("e1071")
install.packages("caTools")
install.packages("caret")
library(e1071) # naiveBayes()
library(caTools) # sample.split()
library(caret) # confusionMatrix()

Step 2 - Load the data and explore it

data <- iris # use the built-in iris dataset
head(data) # head() returns the top 6 rows of the dataframe
summary(data) # returns the statistical summary of the data columns
dim(data) # returns the number of rows and columns in the dataset

Step 3 - Split the data into train and test sets

# split the data into train and test sets with an 80:20 ratio
set.seed(1) # fix the random seed so the split is reproducible
# pass the class labels (not the whole data frame) so the split is stratified by species
split <- sample.split(data$Species, SplitRatio = 0.8)
train_data <- subset(data, split)
test_data <- subset(data, !split)
# Feature scaling is not required: Naive Bayes models each feature separately within each class
dim(train_data)
dim(test_data)
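`sample.split()` from caTools preserves the relative ratios of the labels in the vector it is given, which is why the species labels are passed rather than the whole data frame. As a rough base-R sketch of the same idea (not caTools' actual implementation), the hypothetical helper `stratified_split()` below samples 80% of the row indices within each species separately, so every class keeps its share in both sets. The names `train_set` and `test_set` are illustrative and do not overwrite the recipe's own variables.

```r
set.seed(1)  # reproducible sampling

# Hypothetical helper: logical vector marking training rows,
# sampled separately inside each class (stratified split)
stratified_split <- function(labels, ratio = 0.8) {
  idx_by_class <- split(seq_along(labels), labels)  # row indices grouped by class
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_train <- round(length(idx) * ratio)
    idx[sample.int(length(idx), n_train)]           # draw n_train rows of this class
  }))
  seq_along(labels) %in% train_idx
}

in_train <- stratified_split(iris$Species, ratio = 0.8)
train_set <- iris[in_train, ]
test_set <- iris[!in_train, ]

print(table(train_set$Species))  # 40 rows per species
print(table(test_set$Species))   # 10 rows per species
```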

Step 4 - Create a naiveBayes model

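# fit a Naive Bayes classifier that predicts Species from all other columns
classifier_naive <- naiveBayes(Species ~ ., data = train_data)
classifier_naive # print the a-priori and conditional probabilities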

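When printed, the model shows the a-priori probability of each class (its share of the training rows) and, for each predictor, the conditional probability distribution of that predictor given the class. Because the iris features are numeric, naiveBayes() models each of them as a normal distribution within each class, so each table lists the class-wise mean (first column) and standard deviation (second column).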

summary(classifier_naive)
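The tables printed by `naiveBayes()` can be approximated by hand. The sketch below uses a hypothetical helper `fit_gaussian_nb()` and base R only: it estimates each class prior as the class frequency and, for each numeric feature, the per-class mean and standard deviation. Under e1071's Gaussian assumption for numeric predictors these should correspond to the "A-priori probabilities" and the two columns of the "Conditional probabilities" tables, up to rounding. For simplicity it is fitted on the full iris data here, not on `train_data`.

```r
# Hypothetical helper: Gaussian Naive Bayes "fit" = class priors + per-class mean/sd
fit_gaussian_nb <- function(x, y) {
  y <- factor(y)
  priors <- as.numeric(table(y)) / length(y)  # class frequencies
  names(priors) <- levels(y)
  list(
    priors = priors,
    # rows = classes, columns = features
    means = sapply(x, function(col) tapply(col, y, mean)),
    sds   = sapply(x, function(col) tapply(col, y, sd))
  )
}

model <- fit_gaussian_nb(iris[, 1:4], iris$Species)
print(model$priors)
print(round(model$means, 3))
print(round(model$sds, 3))
```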

Step 5 - Make predictions on the test dataset

# Predicting on test data
y_pred <- predict(classifier_naive, newdata = test_data)
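Under the naive independence assumption, prediction multiplies the class prior by one likelihood per feature and picks the class with the largest product. The sketch below does this for a single flower in base R, summing logarithms instead of multiplying (to avoid numerical underflow) and using normal densities built from each class's mean and standard deviation. The helper `nb_log_scores()` is hypothetical; it illustrates the idea and is not e1071's `predict()` code.

```r
# Hypothetical helper: log-posterior score (up to a constant) of each class
# for ONE new observation, assuming Gaussian features that are
# conditionally independent given the class
nb_log_scores <- function(train_x, train_y, new_row) {
  train_y <- factor(train_y)
  sapply(levels(train_y), function(cl) {
    rows <- train_x[train_y == cl, , drop = FALSE]
    log_prior <- log(nrow(rows) / nrow(train_x))
    # naive assumption: the joint likelihood is a product, so log-densities add up
    log_lik <- sum(mapply(function(col, value) dnorm(value, mean(col), sd(col), log = TRUE),
                          rows, new_row))
    log_prior + log_lik
  })
}

# Hold out the first flower and score it against the remaining 149
train_x <- iris[-1, 1:4]
train_y <- iris$Species[-1]
new_row <- iris[1, 1:4]

scores <- nb_log_scores(train_x, train_y, new_row)
# Normalise the scores into posterior probabilities that sum to 1
posterior <- exp(scores - max(scores)) / sum(exp(scores - max(scores)))
print(round(posterior, 4))
print(names(which.max(scores)))
```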

Step 6 - Check the accuracy of our model

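# Confusion matrix: rows are the actual species, columns the predicted species
conf_mat <- table(Actual = test_data$Species, Predicted = y_pred)
print(conf_mat)
# Model evaluation: caret expects the predictions first, then the reference labels
confusionMatrix(y_pred, test_data$Species)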

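Setosa: 10 correctly classified. Versicolor: 10 correctly classified and 1 misclassified. Virginica: 9 correctly classified. Overall, 29 of the 30 test flowers were classified correctly, so the model achieved an accuracy of about 96.7%. The exact counts depend on the random train-test split.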
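The accuracy and per-class figures can be recomputed directly from a confusion table. The base-R sketch below uses a hypothetical 3×3 table of 30 test flowers (illustrative counts, not output captured from the recipe) to show the arithmetic: accuracy is the diagonal total divided by the grand total, recall (caret's "Sensitivity") divides each diagonal entry by its row total (actual class), and precision divides it by its column total (predicted class).

```r
# Hypothetical confusion matrix for 30 test flowers (illustrative counts only)
labs <- c("setosa", "versicolor", "virginica")
conf_mat <- matrix(c(10,  0, 0,
                      0, 10, 0,
                      0,  1, 9),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Actual = labs, Predicted = labs))

accuracy  <- sum(diag(conf_mat)) / sum(conf_mat)  # correct / total = 29 / 30
recall    <- diag(conf_mat) / rowSums(conf_mat)   # per actual class
precision <- diag(conf_mat) / colSums(conf_mat)   # per predicted class

print(round(accuracy, 4))
print(round(recall, 3))
print(round(precision, 3))
```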
