What Is a Confusion Matrix?
A confusion matrix is a table that summarizes the performance of a classification algorithm. It displays the counts of true positive, true negative, false positive, and false negative predictions, allowing you to see not just how many mistakes a model makes, but what types of mistakes it makes.
This tool is essential in machine learning, medical diagnostics, spam detection, and any binary classification task. It provides a much richer picture than accuracy alone, especially when dealing with imbalanced datasets where one class significantly outnumbers the other.
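As a minimal sketch of the idea, the four counts can be tallied directly from paired labels and predictions. The function name and the toy data below are hypothetical, assuming `1` marks the positive class and `0` the negative class:

```python
# Tally a 2x2 confusion matrix from true labels and predictions.
# Assumes 1 = positive class, 0 = negative class (toy data below).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```

In practice a library routine (for example, scikit-learn's `confusion_matrix`) does the same tallying; the point here is only that each prediction falls into exactly one of the four cells.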
Key Metrics Formulas
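Using the standard definitions in terms of the TP, TN, FP, and FN counts (laid out in the matrix below), the key metrics can be computed as follows. The count values here are made up for illustration:

```python
# Standard metric formulas from the four confusion-matrix counts.
# The counts below are illustrative, not from a real model.
tp, tn, fp, fn = 80, 90, 10, 20

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # overall fraction correct
precision   = tp / (tp + fp)                   # of predicted positives, fraction correct
recall      = tp / (tp + fn)                   # of actual positives, fraction caught
specificity = tn / (tn + fp)                   # of actual negatives, fraction caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.889 0.8 0.842
```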
The Matrix Layout
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
Interpreting Results
- High Precision, Low Recall: The model is conservative; when it predicts positive, it's usually right, but it misses many actual positives.
- Low Precision, High Recall: The model catches most positives but also flags many negatives incorrectly.
- F1-Score: The harmonic mean of precision and recall; useful when you need a single balanced metric.
- Specificity: The ability of the model to correctly identify negatives.
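The choice of harmonic mean in F1 matters: unlike the arithmetic mean, it drops sharply when precision and recall are imbalanced. A small sketch with toy numbers:

```python
# The harmonic mean (F1) penalizes imbalance between precision and
# recall far more than the arithmetic mean would (toy values).
def f1(p, r):
    return 2 * p * r / (p + r)

print(round(f1(0.9, 0.9), 3))   # 0.9  (balanced: equals the arithmetic mean)
print(round(f1(0.99, 0.2), 3))  # 0.333 (imbalanced: arithmetic mean would be 0.595)
```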
Frequently Asked Questions
When is accuracy misleading?
Accuracy can be misleading with imbalanced data. If 95% of cases are negative, a model that always predicts negative achieves 95% accuracy but has 0% recall for positives. In such cases, precision, recall, and F1-score are more informative metrics.
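The 95%-negative scenario above is easy to reproduce with synthetic data: a degenerate model that always predicts negative scores 95% accuracy while catching no positives at all.

```python
# Synthetic illustration of the imbalanced-data trap described above:
# 5% positives, and a "model" that always predicts negative.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy, recall)  # 0.95 0.0
```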
What is a good F1-score?
F1-scores range from 0 to 1. As a rough guide, a score above 0.9 is excellent, 0.7-0.9 is good, 0.5-0.7 is moderate, and below 0.5 suggests the model needs improvement. The acceptable threshold depends on the application and the cost of errors.
How do I improve my confusion matrix results?
Consider techniques like resampling imbalanced data, adjusting classification thresholds, feature engineering, trying different algorithms, or using ensemble methods. The optimal approach depends on which type of error (FP or FN) is more costly.
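Of these techniques, threshold adjustment is the simplest to sketch. Assuming a model that outputs probability scores (the scores and labels below are hypothetical), raising the threshold makes the model more conservative, trading false positives for false negatives:

```python
# Shifting the classification threshold trades FP for FN.
# Scores and labels below are hypothetical predicted probabilities.
scores = [0.2, 0.4, 0.55, 0.7, 0.9]

def predict(threshold):
    return [1 if s >= threshold else 0 for s in scores]

print(predict(0.5))  # [0, 0, 1, 1, 1]  -- default threshold
print(predict(0.8))  # [0, 0, 0, 0, 1]  -- stricter: fewer FP, more FN
```

Sweeping the threshold and recomputing precision and recall at each value is the basis of precision-recall curve analysis.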