How to plot AUC ROC curve in R

This recipe helps you plot AUC ROC curve in R

Recipe Objective

How to plot AUC ROC curve in R.

Logistic regression is a supervised learning model for classification. It is used when the dependent variable (y) is categorical, while the independent variables (x) can be continuous or categorical.

Once a model is built, the ROC (Receiver Operating Characteristic) curve can be used to evaluate how well it separates the classes. ROC visualizes the trade-off between two metrics:

Sensitivity / true positive rate: the proportion of actual positives that are correctly identified.
Specificity / true negative rate: the proportion of actual negatives that are correctly identified.

The area under the ROC curve is called the AUC (Area Under the Curve). AUC ranges between 0 and 1: a value of 0.5 corresponds to random guessing, and values closer to 1 indicate that the model separates the two classes better.

This recipe demonstrates how to plot the AUC ROC curve in R. The example uses a healthcare case study in which logistic regression is applied to a dataset.

Step 1 - Load the necessary libraries

install.packages("dplyr")    # Install dplyr
library("dplyr")             # Load dplyr
install.packages("caTools")  # For splitting the data into train and test sets
install.packages("pROC")     # For the ROC curve used to evaluate the model
library(caTools)
library(pROC)

Step 2 - Read a csv dataset

data <- read.csv("https://storage.googleapis.com/dimensionless/Analytics/quality.csv") # reads the dataset

Step 3 - Create train and test datasets

split <- sample.split(data$PoorCare, SplitRatio = 0.8) # pass the label column so there is one TRUE/FALSE flag per row
split

The sample.split() function marks entries as TRUE or FALSE according to the ratio 0.8, so about 80% of the dataset goes into the training dataset and about 20% into the test dataset.

train <- subset(data, split == TRUE)
test <- subset(data, split == FALSE)

The train dataset gets all the rows for which split is TRUE, and the test dataset gets all the rows for which it is FALSE.
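A quick way to confirm that a split behaves as intended is to compare row counts and class proportions in the two parts. The sketch below uses a made-up data frame (toy, with a hypothetical binary column outcome) rather than the recipe's dataset, and it assumes caTools is installed. The set.seed() call is an addition that only makes the random split repeatable.

```r
library(caTools)

set.seed(42)  # fix the random number generator so the split is repeatable

# Hypothetical toy data: 100 rows, 25 positives and 75 negatives
toy <- data.frame(x = rnorm(100), outcome = rep(c(1, 0), times = c(25, 75)))

# Pass the label column, so there is one TRUE/FALSE flag per row
flag <- sample.split(toy$outcome, SplitRatio = 0.8)

toy_train <- toy[flag, ]
toy_test  <- toy[!flag, ]

nrow(toy_train)           # rows in the training part
nrow(toy_test)            # rows in the test part
mean(toy_train$outcome)   # share of positives in the training part
mean(toy_test$outcome)    # share of positives in the test part
```

If the two shares of positives are close to each other, both parts reflect the class balance of the full data.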

Step 4 - Create a logistic regression model using the training dataset

# glm() fits a generalized linear model; family = "binomial" makes it a logistic regression
model <- glm(PoorCare ~ ., data = train, family = "binomial")
# The summary reports the coefficient estimates, standard errors and p-values for the independent variables
summary(model)
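Coefficients from a logistic regression are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read than the raw summary() table. The following is a minimal sketch on R's built-in mtcars data, not the recipe's dataset; the choice of am as the outcome and of wt and hp as predictors is purely illustrative.

```r
# Fit a small logistic regression on the built-in mtcars data
toy_model <- glm(am ~ wt + hp, data = mtcars, family = "binomial")

# Coefficients are log-odds; exp() converts them to odds ratios
odds_ratios <- exp(coef(toy_model))
odds_ratios

# Predicted probabilities for the first few rows of the fitted data
head(predict(toy_model, type = "response"))
```

An odds ratio above 1 means the predictor raises the odds of the positive class, and one below 1 means it lowers them.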

Step 5 - Make predictions with the model using the test dataset

After the model is created and fitted, it is used to make predictions on unseen data, i.e., the test dataset.

pred_test <- predict(model, test, type = "response") # predicted probabilities for the test rows
pred_test

Step 6 - Model Diagnostics

After the predictions on the test dataset are made, create a confusion matrix with a threshold value of 0.5.

table(Actualvalue = test$PoorCare, Predictedvalue = pred_test > 0.5) # assuming the threshold to be 0.5

In the run shown here, the confusion matrix has 19 true negatives and 4 true positives. It also has 3 false negatives, i.e., patients who are predicted to be getting good care but in fact are not. This number must be reduced as much as possible. Because the split is random, the counts will vary from run to run.

# Accuracy: the proportion of all predictions that are correct; it should be as high as possible
accuracy <- (19 + 4) / (19 + 4 + 3 + 3)
accuracy
# Sensitivity / true positive rate: the proportion of actual positives that are correctly identified
sensitivity <- 4 / (4 + 3)
sensitivity
# Specificity / true negative rate: the proportion of actual negatives that are correctly identified
specificity <- 19 / (19 + 3)
specificity
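Hard-coded counts have to be retyped every time the split changes. As a sketch, the same three formulas can be wrapped in a small helper; the function name classification_metrics and its argument names are made up for this illustration. The counts passed in are the ones used in the formulas above.

```r
# Compute accuracy, sensitivity and specificity from the four cell counts
classification_metrics <- function(tp, tn, fp, fn) {
  c(
    accuracy    = (tp + tn) / (tp + tn + fp + fn),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp)
  )
}

# Counts from the formulas above: TP = 4, TN = 19, FP = 3, FN = 3
metrics <- classification_metrics(tp = 4, tn = 19, fp = 3, fn = 3)
round(metrics, 3)  # accuracy 0.793, sensitivity 0.571, specificity 0.864
```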

Step 7 - Create AUC and ROC for test data (pROC library)

ROC curve: the ROC (Receiver Operating Characteristic) curve can help in deciding the best threshold value. An ROC curve is plotted with the FPR (false positive rate, which equals 1 - specificity) on the x-axis and the TPR (true positive rate, i.e., sensitivity) on the y-axis. A high threshold value gives high specificity and low sensitivity. A low threshold value gives low specificity and high sensitivity.
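The threshold trade-off can be seen by computing both rates at a few thresholds. This is a sketch in base R on made-up labels and probabilities (the vectors actual and prob and the helper rates_at are hypothetical, not taken from the recipe's dataset).

```r
# Hypothetical labels (1 = positive) and predicted probabilities
actual <- c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1)
prob   <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95)

# Sensitivity and specificity when "positive" means prob > threshold
rates_at <- function(threshold) {
  predicted <- as.integer(prob > threshold)
  c(
    threshold   = threshold,
    sensitivity = sum(predicted == 1 & actual == 1) / sum(actual == 1),
    specificity = sum(predicted == 0 & actual == 0) / sum(actual == 0)
  )
}

# One row per threshold: sensitivity falls and specificity rises as it increases
tradeoff <- t(sapply(c(0.2, 0.5, 0.8), rates_at))
tradeoff
```

Each row of tradeoff corresponds to one point on the ROC curve; plotting many thresholds traces out the full curve.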

test_prob <- predict(model, test, type = "response") # predicted probabilities for the test rows
test_roc <- roc(test$PoorCare ~ test_prob, plot = TRUE, print.auc = TRUE) # plot the ROC curve and print the AUC on it
as.numeric(test_roc$auc) # the AUC as a plain number
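AUC can also be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. As a cross-check on what a library reports, the sketch below computes that quantity in base R with the rank-sum (Mann-Whitney) formula on made-up labels and scores; the vectors actual and score are hypothetical.

```r
# Hypothetical labels (1 = positive) and predicted scores
actual <- c(0, 0, 1, 0, 1, 1)
score  <- c(0.10, 0.40, 0.35, 0.80, 0.70, 0.90)

n_pos <- sum(actual == 1)
n_neg <- sum(actual == 0)

# Rank-sum form of the AUC; rank() assigns average ranks to ties
auc_manual <- (sum(rank(score)[actual == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)
auc_manual  # 6 of the 9 positive/negative pairs are ordered correctly: 0.667
```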
