A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.

True Positive (TP):

Interpretation: You predicted positive and the prediction is correct (the actual value is positive).

True Negative (TN):

Interpretation: You predicted negative and the prediction is correct (the actual value is negative).

False Positive (FP), also known as a Type 1 Error:

Interpretation: You predicted positive and the prediction is wrong (the actual value is negative).

False Negative (FN), also known as a Type 2 Error:

Interpretation: You predicted negative and the prediction is wrong (the actual value is positive).

Just remember: Positive and Negative refer to the predicted class, while True and False refer to whether that prediction was correct.
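
To make the terminology concrete, here is a minimal Python sketch, assuming scikit-learn is available; the labels are made up purely for illustration.

```python
# A minimal sketch, assuming scikit-learn is installed.
# y_true and y_pred are made-up binary labels (1 = positive, 0 = negative).
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # actual values
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]  # predicted values

# For binary labels, ravel() unpacks the 2x2 matrix as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)  # 4 3 2 1
```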

First two basic measures from the confusion matrix

Error rate (ERR) and accuracy (ACC) are the most common and intuitive measures derived from the confusion matrix.

Error rate

Error rate (ERR) is calculated as the number of incorrect predictions divided by the total number of predictions made on the dataset. The best error rate is 0.0, whereas the worst is 1.0.

Error rate calculation.
Error rate is the total number of incorrect predictions (FN + FP) divided by the total size of the dataset (P + N).
  • \mathrm{ERR = \displaystyle \frac{FP + FN}{TP + TN + FN + FP} = \frac{FP + FN}{P + N}}
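
As a quick numeric check of the formula, here is a short Python sketch using hypothetical confusion-matrix counts:

```python
# Hypothetical counts, chosen only for illustration.
TP, TN, FP, FN = 100, 50, 10, 5

err = (FP + FN) / (TP + TN + FP + FN)  # error rate
print(err)  # 15 / 165 ≈ 0.09
```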

Accuracy

Accuracy (ACC) is calculated as the number of correct predictions divided by the total number of predictions made on the dataset. The best accuracy is 1.0, whereas the worst is 0.0. It can also be calculated as 1 – ERR.

Accuracy calculation.
Accuracy is the total number of correct predictions (TP + TN) divided by the total size of the dataset (P + N).
  • \mathrm{ACC = \displaystyle \frac{TP +TN}{TP + TN + FN + FP} = \frac{TP + TN}{P + N}}
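
The same hypothetical counts also illustrate accuracy and the ACC = 1 – ERR identity:

```python
# Hypothetical counts, chosen only for illustration.
TP, TN, FP, FN = 100, 50, 10, 5

acc = (TP + TN) / (TP + TN + FP + FN)  # accuracy
err = (FP + FN) / (TP + TN + FP + FN)  # error rate
print(acc, 1 - err)  # both ≈ 0.91
```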

Other basic measures from the confusion matrix

The costs of misclassifying positives and negatives are usually different. For instance, one may want to avoid false negatives more than false positives, or vice versa. In such cases, other basic measures, such as sensitivity and specificity, are more informative than accuracy and error rate.

Sensitivity (Recall or True positive rate)

Sensitivity (SN) is calculated as the number of correct positive predictions divided by the total number of positives. It is also called recall (REC) or true positive rate (TPR). The best sensitivity is 1.0, whereas the worst is 0.0.

Sensitivity calculation.
Sensitivity is calculated as the number of correct positive predictions (TP) divided by the total number of positives (P).
  • \mathrm{SN = \displaystyle \frac{TP}{TP + FN} = \frac{TP}{P}}
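
A one-line check with the same hypothetical counts; with scikit-learn installed, sklearn.metrics.recall_score computes the same quantity directly from labels.

```python
# Hypothetical counts, chosen only for illustration.
TP, FN = 100, 5

sn = TP / (TP + FN)  # sensitivity / recall / true positive rate
print(sn)  # 100 / 105 ≈ 0.95
```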

Specificity (True negative rate)

Specificity (SP) is calculated as the number of correct negative predictions divided by the total number of negatives. It is also called true negative rate (TNR). The best specificity is 1.0, whereas the worst is 0.0.

Specificity calculation.
Specificity is calculated as the number of correct negative predictions (TN) divided by the total number of negatives (N).
  • \mathrm{SP = \displaystyle \frac{TN}{TN + FP} = \frac{TN}{N}}
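
And the corresponding check for specificity. Specificity is simply the recall of the negative class, so in scikit-learn it can be obtained with recall_score and pos_label=0; the counts below are hypothetical.

```python
# Hypothetical counts, chosen only for illustration.
TN, FP = 50, 10

sp = TN / (TN + FP)  # specificity / true negative rate
print(sp)  # 50 / 60 ≈ 0.83
```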

Precision (Positive predictive value)

Precision (PREC) is calculated as the number of correct positive predictions divided by the total number of positive predictions. It is also called positive predictive value (PPV). The best precision is 1.0, whereas the worst is 0.0.

Precision calculation.
Precision is calculated as the number of correct positive predictions (TP) divided by the total number of positive predictions (TP + FP).
  • \mathrm{PREC = \displaystyle \frac{TP}{TP + FP}}
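
Again with the same hypothetical counts; sklearn.metrics.precision_score gives the same value directly from labels.

```python
# Hypothetical counts, chosen only for illustration.
TP, FP = 100, 10

prec = TP / (TP + FP)  # precision / positive predictive value
print(prec)  # 100 / 110 ≈ 0.91
```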

False positive rate

False positive rate (FPR) is calculated as the number of incorrect positive predictions divided by the total number of negatives. The best false positive rate is 0.0, whereas the worst is 1.0. It can also be calculated as 1 – specificity.

False positive rate calculation.
False positive rate is calculated as the number of incorrect positive predictions (FP) divided by the total number of negatives (N).
  • \mathrm{FPR = \displaystyle \frac{FP}{TN + FP} = 1 - SP}
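
The false positive rate for the same hypothetical counts, together with the 1 – SP identity:

```python
# Hypothetical counts, chosen only for illustration.
TN, FP = 50, 10

fpr = FP / (TN + FP)  # false positive rate
sp  = TN / (TN + FP)  # specificity
print(fpr, 1 - sp)  # both ≈ 0.17
```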

Correlation coefficient and F-score

The Matthews correlation coefficient and the F-score can be useful, but they are less frequently used than the other basic measures.

Matthews correlation coefficient

Matthews correlation coefficient (MCC) is a correlation coefficient calculated using all four values in the confusion matrix.

  • \mathrm{MCC = \displaystyle \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}}
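
The formula translates directly into Python; with scikit-learn installed, sklearn.metrics.matthews_corrcoef computes MCC from labels. The counts below are hypothetical.

```python
from math import sqrt

# Hypothetical counts, chosen only for illustration.
TP, TN, FP, FN = 100, 50, 10, 5

mcc = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
print(mcc)  # ≈ 0.80
```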

F-score

F-score is a weighted harmonic mean of precision and recall; the parameter β controls how much more weight recall receives relative to precision (F1 is the unweighted harmonic mean).

  • \mathrm{F_{\beta} = \displaystyle \frac{(1 + \beta^2) (PREC \cdot REC)}{(\beta^2 \cdot PREC + REC)}}

β is commonly 0.5, 1, or 2.

  • \mathrm{F_{0.5} = \displaystyle \frac{1.25 \cdot PREC \cdot REC}{0.25 \cdot PREC + REC}}
  • \mathrm{F_{1} = \displaystyle \frac{2 \cdot PREC \cdot REC}{PREC + REC}}
  • \mathrm{F_{2} = \displaystyle \frac{5 \cdot PREC \cdot REC}{4 \cdot PREC + REC}}
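
The general F_β formula also translates directly into code; with scikit-learn installed, sklearn.metrics.fbeta_score and f1_score return the same values from labels. The counts below are hypothetical.

```python
# Hypothetical counts, chosen only for illustration.
TP, TN, FP, FN = 100, 50, 10, 5

prec = TP / (TP + FP)  # precision
rec  = TP / (TP + FN)  # recall

def f_beta(prec, rec, beta):
    # Weighted harmonic mean of precision and recall.
    return (1 + beta**2) * prec * rec / (beta**2 * prec + rec)

print(f_beta(prec, rec, 0.5), f_beta(prec, rec, 1), f_beta(prec, rec, 2))
```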

A couple other terms are also worth mentioning:

  • Null Error Rate: This is how often you would be wrong if you always predicted the majority class. (In our example, the null error rate would be 60/165=0.36 because if you always predicted yes, you would only be wrong for the 60 "no" cases.) This can be a useful baseline metric to compare your classifier against. However, the best classifier for a particular application will sometimes have a higher error rate than the null error rate, as demonstrated by the Accuracy Paradox.
  • Cohen's Kappa: This is essentially a measure of how well the classifier performed as compared to how well it would have performed simply by chance. In other words, a model will have a high Kappa score if there is a big difference between the accuracy and the null error rate. (More details about Cohen's Kappa.)
  • F Score: This is a weighted harmonic mean of precision and the true positive rate (recall), as described in the F-score section above. (More details about the F Score.)
  • ROC Curve: This is a commonly used graph that summarizes the performance of a classifier over all possible thresholds. It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) as you vary the threshold for assigning observations to a given class. (More details about ROC Curves.)
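
The ROC curve in the last bullet can be computed with scikit-learn's roc_curve, which needs the true labels and a continuous score (for example, a predicted probability for the positive class) rather than hard class predictions. A minimal sketch with made-up scores:

```python
# A minimal sketch, assuming scikit-learn is installed.
# y_true are known labels; y_score are made-up predicted probabilities
# for the positive class.
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr, tpr)))             # points on the ROC curve (FPR, TPR)
print(roc_auc_score(y_true, y_score))  # area under the ROC curve (AUC)
```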