How to plot a ROC Curve in Python?

This recipe helps you plot a ROC Curve in Python

Recipe Objective

While working on a classification model, we need a metric that shows how well the model is performing. A metric that also gives a graphical representation of the performance is especially helpful.

The ROC (Receiver Operating Characteristic) curve plots the True Positive Rate, TPR = TP / (TP + FN), against the False Positive Rate, FPR = FP / (FP + TN), as the classifier's decision threshold varies, so it shows how well the model separates the classes. The area under the ROC curve (AUC) summarizes the curve as a single score: the greater the area, the better the performance, with 1.0 for a perfect classifier and 0.5 for random guessing.
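To make these definitions concrete, here is a minimal pure-Python sketch that builds the ROC points by hand: it treats every distinct score as a threshold, counts true and false positives at that threshold, and integrates the resulting curve with the trapezoidal rule. The helper names roc_points and auc_trapezoid are hypothetical (they are not part of scikit-learn), and the sketch assumes the labels are 0/1 with both classes present.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, one point per distinct threshold, starting at (0, 0).

    Assumes labels are 0/1 and both classes occur at least once.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]
    # Sweep thresholds from the highest score down; a sample is predicted
    # positive when its score is >= the threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the (fpr, tpr) polyline using the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
print(fpr, tpr, auc_trapezoid(fpr, tpr))
```

On this four-sample toy input the points are (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), giving an AUC of 0.75. scikit-learn's roc_curve and roc_auc_score compute the same quantities more efficiently.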
Note that the ROC curve is defined for a classification problem with two classes in the target. For data with more than two classes, we plot one ROC curve per class, treating that class as the positive class and all the other classes combined as the negative class (one-vs-rest).
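The one-vs-rest idea can be sketched with scikit-learn as follows. This is an illustration under our own assumptions, not part of the original recipe: it uses the full three-class wine target, binarizes the test labels with label_binarize so each class gets its own 0/1 column, and computes one ROC curve and AUC per class. The max_iter, random_state and stratify values are assumptions added for convergence and reproducibility. roc_auc_score with multi_class="ovr" (macro average by default) gives the mean of those per-class AUCs.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

dataset = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.3,
    random_state=0, stratify=dataset.target,  # assumption: fixed seed, class ratio kept
)

clf = LogisticRegression(max_iter=10000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape (n_samples, 3), one column per class

classes = clf.classes_
y_test_bin = label_binarize(y_test, classes=classes)  # one 0/1 column per class

curves, per_class_auc = {}, {}
for i, c in enumerate(classes):
    # Class c is the positive class; every other class counts as negative.
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], proba[:, i])
    curves[int(c)] = (fpr, tpr)
    per_class_auc[int(c)] = roc_auc_score(y_test_bin[:, i], proba[:, i])

macro_ovr = roc_auc_score(y_test, proba, multi_class="ovr")
print(per_class_auc, macro_ovr)
```

Each entry in curves can be plotted exactly like the binary curves later in this recipe.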

This recipe is a short example of how to use the ROC curve and AUC to evaluate a model. We apply them to two models so their performance can be compared.

Step 1 - Import the libraries

    from sklearn import datasets
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, roc_auc_score
    from sklearn.model_selection import train_test_split
    import matplotlib.pyplot as plt

Here we have imported various modules: datasets, from which we will load the dataset; DecisionTreeClassifier and LogisticRegression, which we will use as models; roc_curve and roc_auc_score, which compute the ROC curve points and the area under the curve; train_test_split, which splits the data into a train part and a test part; and plt, which we will use to plot the graphs.

Step 2 - Setup the Data

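Here we use datasets to load the built-in wine dataset and create the objects X and y to store the data and the target values respectively. The wine target has three classes, while roc_curve and roc_auc_score with a single column of scores expect a binary target, so we treat class 1 as the positive class and the other two classes together as the negative class (the one-vs-rest approach described above).

    dataset = datasets.load_wine()
    X = dataset.data
    # class 1 -> 1 (positive), classes 0 and 2 -> 0 (negative)
    y = (dataset.target == 1).astype(int)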
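Before building ROC curves it helps to check what the target looks like. This small sketch, assuming NumPy is available alongside scikit-learn, prints the shape of the wine data and counts the samples per class, both for the original three-class target and for the class-1-vs-rest target described above.

```python
import numpy as np
from sklearn import datasets

dataset = datasets.load_wine()
print(dataset.data.shape)                 # (178, 13): 178 samples, 13 features

class_counts = np.bincount(dataset.target)
print(class_counts)                       # [59 71 48]: three classes

y_binary = (dataset.target == 1).astype(int)
binary_counts = np.bincount(y_binary)
print(binary_counts)                      # [107 71]: rest vs. class 1
```

A moderately imbalanced binary target like this one is a reasonable case for ROC AUC, since the metric does not depend on a single decision threshold.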

Step 3 - Splitting the data

The function train_test_split splits the data into two parts: the train part, which is used to train the model, and the test part, which is used to check how the model performs on unseen data. Here we pass test_size=0.3, so 30% of the data goes into the test part and the remaining 70% into the train part.

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
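Without a random_state, train_test_split shuffles differently on every run, so the scores change between runs. The sketch below is an optional variation, not what the recipe itself does: it fixes random_state for reproducibility and passes stratify=y so the train and test parts keep roughly the same share of positives. The helper name split and the seed 42 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset = datasets.load_wine()
X = dataset.data
y = (dataset.target == 1).astype(int)  # same class-1-vs-rest target as above


def split(seed):
    # stratify=y keeps the positive/negative ratio similar in train and test
    return train_test_split(X, y, test_size=0.3, random_state=seed, stratify=y)


X_train, X_test, y_train, y_test = split(42)
X_train2, X_test2, y_train2, y_test2 = split(42)   # same seed, same split

print(len(X_train), len(X_test))        # 124 54
print(y.mean(), y_test.mean())          # positive share overall vs. in test
```

With 178 samples and test_size=0.3, scikit-learn rounds the test size up to 54 samples, leaving 124 for training.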

Step 4 - Training the models

Now we create an object for each classifier and train it on the train split of the dataset, i.e. X_train and y_train.

    clf_tree = DecisionTreeClassifier()
    clf_tree.fit(X_train, y_train)
    clf_reg = LogisticRegression()
    clf_reg.fit(X_train, y_train)

Step 5 - Using the models on the test dataset

After training the classifiers on the train split, we use them on the test dataset. predict_proba returns the predicted probability of each class; we keep column 1, the probability of the positive class, because roc_curve and roc_auc_score need a continuous score rather than a predicted class label.

    y_score1 = clf_tree.predict_proba(X_test)[:, 1]
    y_score2 = clf_reg.predict_proba(X_test)[:, 1]
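The [:, 1] index works because predict_proba returns one column per class, in the order given by the fitted model's classes_ attribute, and for a 0/1 target classes_ is [0, 1]. The sketch below, using the same class-1-vs-rest assumption plus a fixed random_state for reproducibility, shows this and looks the positive column up explicitly, which is a bit more robust than hard-coding 1 if the labels are ever strings or ordered differently.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

dataset = datasets.load_wine()
X = dataset.data
y = (dataset.target == 1).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
proba = clf.predict_proba(X_test)      # shape (n_test, 2); each row sums to 1
print(clf.classes_)                    # column order of proba

# Find the column of the positive label instead of assuming it is column 1
pos_col = list(clf.classes_).index(1)
y_score = proba[:, pos_col]
```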

Step 6 - Creating False and True Positive Rates and printing Scores

We need the False Positive Rates and True Positive Rates of both classifiers, because these are used to plot the ROC curves. The roc_curve function computes them from the true test labels and the predicted scores; it also returns the thresholds at which each pair of rates was computed. Here we do this for both classifiers.

    false_positive_rate1, true_positive_rate1, threshold1 = roc_curve(y_test, y_score1)
    false_positive_rate2, true_positive_rate2, threshold2 = roc_curve(y_test, y_score2)

To get the ROC AUC score, we simply pass the true test labels and the predicted scores to the roc_auc_score function. We print the results for better understanding.

    print('roc_auc_score for DecisionTree: ', roc_auc_score(y_test, y_score1))
    print('roc_auc_score for Logistic Regression: ', roc_auc_score(y_test, y_score2))
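The thresholds returned by roc_curve are not used further in this recipe, but they are what lets you turn a ROC curve into an actual decision rule. One common heuristic, shown here as an illustration rather than as part of the original method, is Youden's J statistic: pick the threshold that maximizes tpr - fpr. The toy labels and scores below are made up for the example. Note that the first threshold is an artificial value above every score (np.inf in recent scikit-learn releases), which corresponds to predicting everything as negative.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy data for illustration only
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_score)

j = tpr - fpr                    # Youden's J at each threshold
best = int(np.argmax(j))         # np.argmax returns the first index on ties
best_threshold = thresholds[best]

# Apply the chosen threshold to get hard class predictions
y_pred = (y_score >= best_threshold).astype(int)
print(best_threshold, j[best], y_pred)
```

In the recipe, the same idea would apply to threshold1 or threshold2 together with their matching false and true positive rates.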

Step 7 - Plotting ROC Curves

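We plot two ROC curves in two separate figures, one for the DecisionTreeClassifier and one for the LogisticRegression model. Each figure has the False Positive Rate on the X-axis and the True Positive Rate on the Y-axis; the dashed diagonal marks a random classifier, and the two light-grey lines mark the left (FPR = 0) and top (TPR = 1) edges of the plot.

    # ROC curve for the decision tree
    plt.figure(figsize=(10, 10))
    plt.title('Receiver Operating Characteristic - DecisionTree')
    plt.plot(false_positive_rate1, true_positive_rate1)
    plt.plot([0, 1], [0, 1], ls="--")    # diagonal: random guessing
    plt.plot([0, 0], [1, 0], c=".7")     # left edge, FPR = 0
    plt.plot([0, 1], [1, 1], c=".7")     # top edge, TPR = 1
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')

    # ROC curve for logistic regression
    plt.figure(figsize=(10, 10))
    plt.title('Receiver Operating Characteristic - Logistic regression')
    plt.plot(false_positive_rate2, true_positive_rate2)
    plt.plot([0, 1], [0, 1], ls="--")
    plt.plot([0, 0], [1, 0], c=".7")
    plt.plot([0, 1], [1, 1], c=".7")
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.show()

As an output we get the two ROC plots and the scores printed in Step 6 (the exact values vary between runs, mainly because train_test_split shuffles the data without a fixed random_state):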

roc_auc_score for DecisionTree:  0.9539141414141414
roc_auc_score for Logistic Regression:  0.9875140291806959
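Drawing each curve in its own figure makes the two models harder to compare at a glance. A variation, sketched below under the same class-1-vs-rest assumption, draws both curves on one shared axes and puts each AUC in the legend. The helper name plot_roc_curves is hypothetical, and random_state, stratify and max_iter are added so the sketch is reproducible and converges; they are not part of the original recipe.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def plot_roc_curves(models, X_test, y_test):
    """Draw one ROC curve per fitted binary classifier on a single shared axes."""
    fig, ax = plt.subplots(figsize=(6, 6))
    aucs = {}
    for name, model in models.items():
        scores = model.predict_proba(X_test)[:, 1]  # probability of class 1
        fpr, tpr, _ = roc_curve(y_test, scores)
        aucs[name] = roc_auc_score(y_test, scores)
        ax.plot(fpr, tpr, label=f"{name} (AUC = {aucs[name]:.3f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="0.7", label="Chance")
    ax.set_xlabel("False Positive Rate")
    ax.set_ylabel("True Positive Rate")
    ax.set_title("Receiver Operating Characteristic")
    ax.legend(loc="lower right")
    return fig, ax, aucs


dataset = datasets.load_wine()
X = dataset.data
y = (dataset.target == 1).astype(int)  # assumption: class 1 vs. rest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0).fit(X_train, y_train),
    "Logistic Regression": LogisticRegression(max_iter=10000).fit(X_train, y_train),
}
fig, ax, aucs = plot_roc_curves(models, X_test, y_test)
```

Call plt.show() afterwards to display the figure, or fig.savefig("roc.png") to write it to a file.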
