AUC ROC

In this tutorial, we will learn about the AUC-ROC curve.

Introduction

  • Before going through this post, it is highly recommended that you have a good understanding of what a confusion matrix is and its related terms. If you want to learn more about the confusion matrix, please go through my YouTube video, click here
  • Most of the machine learning problems you solve in online courses use introductory datasets that are balanced, so most classifiers easily reach high accuracy. This doesn’t reflect how data looks in the “real world”.
  • Most “real world” data is not balanced, so it is important to learn how to evaluate a model when your interest lies in identifying the minority class.
  • To check or visualize the performance of a binary classification model, we use the AUC (Area Under the Curve) ROC (Receiver Operating Characteristic) curve.

  • In some problems, predicting probabilities is better than predicting the class itself.
  • Predicting probabilities lets you tune the threshold value to balance the two types of errors the model makes:
    • False Positives: You predicted today would be a rainy day when there was no rain.
    • False Negatives: You predicted today would not be a rainy day when in fact it rained.
  • So by tuning the threshold value you can change the behaviour of your model for the problem you are solving.
  • Let’s consider the example of predicting wild animals on agricultural fields, where we care more about keeping false negatives low than false positives. If we have many false negatives, we fail to warn the farmers about the wild animals and their fields will be damaged.

Example

Let’s use a convolutional neural network to detect animals in the frame and predict whether each one is a wild animal or not. In the last layer, we use a sigmoid activation function to output a probability, which we then convert into a class (wild animal or not).

  • Notice the circled animals: the sheep is predicted as a wild animal, and the lion and zebra are predicted as domestic animals (there can be any reason for this output, such as bad light, fog, etc.).

  • If we apply the sigmoid activation on the last layer, the y-axis becomes the probability of the predicted animal type, computed from the logit values.

  • In the end, we need to convert these probabilities into classifications, so we set a threshold value. Let’s take 0.5 as the threshold: if the probability is >= the threshold, the animal is classified as wild; if the probability is < the threshold, it is classified as domestic (see the short sketch after this list).

  • Now let’s evaluate the performance of the model with a threshold value of 0.5

  • We can see that it classified all the animals correctly except three (the sheep, zebra and lion).
  • It classified the sheep as a wild animal, and the lion and zebra as domestic animals.
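
As a small illustration of this thresholding step, here is a minimal sketch (using NumPy and made-up probabilities, not the actual model outputs) of how sigmoid probabilities are turned into class labels with a threshold of 0.5:

```python
import numpy as np

# Hypothetical sigmoid outputs from the last layer (1 = wild animal, 0 = domestic animal)
probabilities = np.array([0.95, 0.92, 0.85, 0.45, 0.15, 0.60, 0.05, 0.04, 0.02])

threshold = 0.5
# probability >= threshold -> wild animal (1), probability < threshold -> domestic animal (0)
predictions = (probabilities >= threshold).astype(int)
print(predictions)  # [1 1 1 0 0 1 0 0 0]
```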

Let’s see the confusion matrix and identify the errors

  • Let’s first understand what a confusion matrix is and how it looks

True Positives (TP): It is a wild animal and our model predicted it as a wild animal.
False Positives (FP): It is a domestic animal and our model predicted it as a wild animal.
True Negatives (TN): It is a domestic animal and our model predicted it as a domestic animal.
False Negatives (FN): It is a wild animal and our model predicted it as a domestic animal.

This is the confusion matrix; now let’s fill in the four values we got from the model.

  • This is how it looks after filling in the values (a small code sketch follows).
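
As a quick sketch of how these four values can be computed in code (the labels below are hypothetical, chosen to be consistent with the example above since the actual model outputs are not shown here), scikit-learn’s `confusion_matrix` does the counting for us:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions (1 = wild animal, 0 = domestic animal)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0]

# With labels=[0, 1] the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")  # TP=3, FP=1, TN=3, FN=2
```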

  • We can now evaluate the performance of the model when the threshold value is 0.5 by calculating sensitivity and specificity
    • Sensitivity: It gives the percentage of actual wild animals that are correctly classified as wild: Sensitivity = TP / (TP + FN)

    • It says that 60% of wild animals were correctly classified
    • Specificity: It gives the percentage of actual domestic animals that are correctly classified as domestic: Specificity = TN / (TN + FP)

    • It says that 75% of domestic animals were correctly classified
  • As we discussed, our model makes two kinds of mistakes: False Positives and False Negatives.
    • If predicting positives is important for you, then choose a model that makes fewer False Negatives (i.e. a model with higher sensitivity, the True Positive Rate).
    • If predicting negatives is important for you, then choose a model that makes fewer False Positives (i.e. a model with higher specificity, the True Negative Rate).
  • We can maximize or minimize either metric (sensitivity or specificity) by choosing the right threshold value; a short sketch of the calculation follows below.
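
Continuing the sketch above (same hypothetical counts), sensitivity and specificity follow directly from the four values of the confusion matrix:

```python
# Hypothetical counts from the confusion matrix sketch above (threshold = 0.5)
tp, fp, tn, fn = 3, 1, 3, 2

sensitivity = tp / (tp + fn)  # True Positive Rate: wild animals correctly classified as wild
specificity = tn / (tn + fp)  # True Negative Rate: domestic animals correctly classified as domestic

print(f"Sensitivity = {sensitivity:.2f}")  # 0.60
print(f"Specificity = {specificity:.2f}")  # 0.75
```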

Let’s try different threshold values and see how our model behaves.
  • Let the threshold value be 0.05.

Confusion matrix

By decreasing the threshold value:
  False Positives will increase
  False Negatives will decrease

By lowering the threshold value we give more importance to predicting wild animals as wild. That is what matters here: we need to alert the farmers correctly, otherwise they get into trouble. It is acceptable even if False Positives increase up to some number.
  • Let’s increase the threshold value to 0.8.

Confusion matrix

By increasing the threshold value:
  False Positives will decrease
  False Negatives will increase

By increasing the threshold value we give more importance to predicting domestic animals as domestic, at the cost of more False Negatives.

By now you can see that varying the threshold value changes the model’s predictions; a small sketch below makes this concrete.
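
To make this tradeoff concrete, here is a small sketch (again with hypothetical probabilities, not the real model outputs) that prints the False Positive and False Negative counts for a few threshold values:

```python
import numpy as np

# Hypothetical sigmoid outputs and ground truth (1 = wild animal, 0 = domestic animal)
probs  = np.array([0.95, 0.92, 0.85, 0.45, 0.15, 0.60, 0.05, 0.04, 0.02])
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0])

for threshold in [0.05, 0.5, 0.8]:
    y_pred = (probs >= threshold).astype(int)
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # domestic predicted as wild
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # wild predicted as domestic
    print(f"threshold={threshold}: FP={fp}, FN={fn}")

# As the threshold rises, False Positives go down while False Negatives go up (or stay the same).
```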

So which threshold value works best? It is hard to tell, because we cannot go through every threshold value and check its respective confusion matrix by hand.

One way to solve this is by using the ROC (Receiver Operating Characteristic) curve.

ROC (Receiver Operating Characteristic) curve

  • It is used to measure the performance of your classification model at different threshold values.
  • The ROC curve tells us how well the model can distinguish between the two classes.
  • It also helps us to find the best threshold value.
  • The curve is plotted using two parameters:
    • True Positive Rate
    • False Positive Rate
  • True Positive Rate (TPR): It is a synonym for recall/sensitivity: TPR = TP / (TP + FN)

  • False Positive Rate (FPR): the proportion of actual domestic animals wrongly classified as wild: FPR = FP / (FP + TN)

  • False Positive Rate (FPR) can also be written as: FPR = 1 - Specificity

  • The ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives (see the sketch below).
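
A common way to get TPR and FPR at every threshold at once is scikit-learn’s `roc_curve`; here is a minimal sketch using the same hypothetical scores as above:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Hypothetical ground truth and predicted probabilities (1 = wild animal, 0 = domestic animal)
y_true  = [1, 1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.95, 0.92, 0.85, 0.45, 0.15, 0.60, 0.05, 0.04, 0.02]

fpr, tpr, thresholds = roc_curve(y_true, y_score)

plt.plot(fpr, tpr, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Sensitivity)")
plt.title("ROC curve")
plt.legend()
plt.show()
```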

Relation between Sensitivity, Specificity, FPR and Threshold.

  • Sensitivity and Specificity move in opposite directions: if we increase sensitivity, specificity decreases, and vice versa.
  • If we decrease the threshold value, we are concentrating on predicting wild animals, so it increases the sensitivity and decreases the specificity.
  • If we Increase the threshold value, we are concentrating on predicting domestic animals, so it increases the specificity and decreases the sensitivity.

  • If we increase TPR, FPR also increases and vice versa.

Let’s create the ROC curve from scratch

  • First, let us consider a threshold value of 0.

  • Let’s build the confusion matrix.

  • Calculate TPR and FPR.

  • When the threshold value is as low as 0, we correctly classified all wild animals as wild but wrongly classified all domestic animals as wild.
  • ROC graph

Look at the diagonal line from (0,0) to (1,1) in the graph. Any point on this line means the proportion of correctly classified samples (TPR) is equal to the proportion of incorrectly classified samples (FPR).
  • Let’s increase the threshold value a bit to 0.1.

  • Let’s build the confusion matrix.

  • Calculate TPR and FPR.

  • When the threshold value is 0.1, we correctly classified all wild animals as wild and wrongly classified 25% of domestic animals as wild.
  • ROC graph

  • The point is (0.25, 1), where the proportion of correctly classified samples (TPR) is greater than the proportion of incorrectly classified samples (FPR).

  • Let’s increase the threshold value a bit to 0.2

  • Let’s build the confusion matrix.

  • Calculate TPR and FPR.

  • When the threshold value is 0.2, we correctly classified 80% of wild animals as wild and wrongly classified 25% of domestic animals as wild.
  • ROC graph

  • The point is (0.25, 0.8), where the proportion of correctly classified samples (TPR) is greater than the proportion of incorrectly classified samples (FPR).

  • Let’s increase the threshold value a bit to 0.5

  • Let’s build the confusion matrix.

  • Calculate TPR and FPR.

  • When the threshold value is 0.5, we correctly classified 60% of wild animals as wild and wrongly classified 25% of domestic animals as wild.
  • ROC graph

  • The point is (0.25, 0.6), where the proportion of correctly classified samples (TPR) is greater than the proportion of incorrectly classified samples (FPR).
  • Let’s increase the threshold value a bit to 0.8

  • Let’s build the confusion matrix.

  • Calculate TPR and FPR.

  • When the threshold value is 0.8, we correctly classified 60% of wild animals as wild and wrongly classified 0% of domestic animals as wild.
  • ROC graph

  • The point is (0.0, 0.6), where the proportion of correctly classified samples (TPR) is greater than the proportion of incorrectly classified samples (FPR).

  • Let’s increase the threshold value a bit to 0.9

  • Let’s build the confusion matrix.

  • Calculate TPR and FPR.

  • When the threshold value is 0.9, we correctly classified 40% of wild animals as wild and wrongly classified 0% of domestic animals as wild.
  • ROC graph

  • The point is (0.0, 0.4), where the proportion of correctly classified samples (TPR) is greater than the proportion of incorrectly classified samples (FPR).

  • Let’s increase the threshold value a bit to 1.0

  • Let’s build the confusion matrix.

  • Calculate TPR and FPR.

  • When the threshold value is 1.0, we correctly classified 0% of wild animals as wild and wrongly classified 0% of domestic animals as wild.
  • ROC graph

  • The point is (0.0, 0.0), where we have zero true positives and zero false positives.

  • Let’s draw the final ROC curve

    • The ROC graph summarizes all of the confusion matrices that each threshold produced.
    • The best threshold value could be E (0.0, 0.6), i.e. 0.8 (where we don’t have any False Positives).
    • It mostly depends on how many False Positives you are willing to accept; based on that, you can choose the best threshold value.
    • In this problem, let’s say I can accept up to 25% False Positives while keeping 100% True Positives. Then the best threshold value will be B (0.25, 1), i.e. 0.1 (a from-scratch sketch of this whole sweep follows below).
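
Putting the whole walkthrough together, here is a from-scratch sketch. The scores are hypothetical, chosen only so that the (FPR, TPR) points come out the same as in the walkthrough above; the loop sweeps the thresholds, computes TPR and FPR for each, and then picks the highest-TPR point whose FPR stays within the 25% we said we could accept:

```python
import numpy as np

# Hypothetical ground truth and predicted probabilities (1 = wild animal, 0 = domestic animal)
y_true  = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0])
y_score = np.array([0.95, 0.92, 0.85, 0.45, 0.15, 0.60, 0.05, 0.04, 0.02])

points = []
for t in [0.0, 0.1, 0.2, 0.5, 0.8, 0.9, 1.0]:
    y_pred = (y_score >= t).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    points.append((t, fpr, tpr))
    print(f"threshold={t:.1f}: FPR={fpr:.2f}, TPR={tpr:.2f}")

# Best threshold: highest TPR among the points with FPR <= 0.25
best = max((p for p in points if p[1] <= 0.25), key=lambda p: p[2])
print("best threshold:", best[0])  # 0.1 in this sketch
```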

AUC (Area under the curve)

  • Let’s learn about AUC
  • It represents the degree or measure of separability between the two classes.
  • It measures the entire two-dimensional area underneath the ROC curve.
  • It provides an aggregate measure of performance across all possible classification thresholds.
  • It also helps you to compare multiple models.
  • The higher the AUC, the better the model.

Let’s see this with an example

  • Consider the ROC curve we got from the CNN model.

  • Suppose we also build a model with a simple neural network (NN) and get this ROC curve.

  • Let’s combine both curves into one plot and see which model is better.

  • From this, we can see that the CNN model works better than the NN model because the AUC value of the CNN is greater than the AUC value of the NN (a small sketch of the comparison follows below).
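
As a quick sketch of the comparison (the scores for both models are hypothetical; `roc_auc_score` is scikit-learn’s helper for computing the AUC directly from labels and scores):

```python
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth and predicted probabilities from the two models
y_true    = [1, 1, 1, 1, 1, 0, 0, 0, 0]
cnn_score = [0.95, 0.92, 0.85, 0.45, 0.15, 0.60, 0.05, 0.04, 0.02]
nn_score  = [0.80, 0.55, 0.70, 0.30, 0.20, 0.65, 0.45, 0.10, 0.35]

cnn_auc = roc_auc_score(y_true, cnn_score)
nn_auc  = roc_auc_score(y_true, nn_score)
print(f"CNN AUC = {cnn_auc:.2f}, NN AUC = {nn_auc:.2f}")
# The model with the higher AUC separates wild animals from domestic animals better.
```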

Reference

I learned a lot of the theoretical concepts behind AUC-ROC from this excellent video by Josh Starmer and explained them here with my own example. You can also watch the video to understand it more clearly.

If you like it, please give this repository a star.
