What Is Accuracy?
Accuracy is the proportion of correct predictions out of all predictions. It measures overall correctness of a classification model. While intuitive, accuracy can be misleading with imbalanced datasets where one class heavily outnumbers the other.
For example, a dataset with 95% negative cases yields 95% accuracy from a model that always predicts negative, despite being useless for detecting positive cases. This is why precision, recall, and F1 score provide essential complementary perspectives.
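The imbalanced-dataset pitfall above can be sketched in a few lines of plain Python (the 95/5 class split and the always-negative model are the hypothetical values from the example):

```python
# Hypothetical imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A useless model that always predicts the negative class.
y_pred = [0] * 100

# Accuracy: proportion of correct predictions.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Recall: proportion of actual positives that were found.
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(accuracy)  # 0.95 despite detecting no positive cases
print(recall)    # 0.0
```

The 95% accuracy looks impressive in isolation; the 0% recall reveals the model never detects a positive case.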
Formulas
Confusion Matrix
| | Predicted + | Predicted - |
|---|---|---|
| Actual + | TP | FN |
| Actual - | FP | TN |
- Precision = TP / (TP + FP): Of positive predictions, how many were correct. Minimizes false alarms.
- Recall = TP / (TP + FN): Of actual positives, how many were found. Minimizes missed cases.
- F1 Score = 2 × Precision × Recall / (Precision + Recall): Harmonic mean balancing precision and recall.
- Specificity = TN / (TN + FP): Of actual negatives, how many were correctly identified.
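The four metrics follow directly from the confusion-matrix counts. A minimal sketch, using made-up example counts:

```python
# Hypothetical confusion-matrix counts for illustration.
TP, FN = 40, 10   # actual positives: found vs missed
FP, TN = 5, 45    # actual negatives: false alarms vs correct rejections

precision = TP / (TP + FP)    # correct share of positive predictions
recall = TP / (TP + FN)       # share of actual positives found
f1 = 2 * precision * recall / (precision + recall)
specificity = TN / (TN + FP)  # share of actual negatives identified

print(precision, recall, f1, specificity)
```

With these counts, precision ≈ 0.889, recall = 0.8, F1 ≈ 0.842, and specificity = 0.9.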
Frequently Asked Questions
When is accuracy misleading?
With imbalanced classes. If 99% of emails are not spam, predicting "not spam" always gets 99% accuracy but catches zero spam. Use F1 or balanced accuracy instead.
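Balanced accuracy, the mean of per-class recall, exposes the always-"not spam" predictor. A sketch with hypothetical counts matching the 99% example:

```python
# Hypothetical email set: 990 ham (0), 10 spam (1).
y_true = [0] * 990 + [1] * 10
# Trivial model: always predicts "not spam".
y_pred = [0] * 1000

# Per-class recall: share of each class correctly identified.
recall_spam = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / 10
recall_ham = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred)) / 990
balanced_accuracy = (recall_spam + recall_ham) / 2

print(balanced_accuracy)  # 0.5, versus 0.99 plain accuracy
```

A balanced accuracy of 0.5 is chance-level, which is exactly what a constant predictor deserves.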
What is a good F1 score?
Above 0.9 is excellent, 0.8-0.9 is good, 0.7-0.8 is fair. The threshold depends on the application and tolerance for errors.
Precision vs Recall: which matters more?
It depends on error costs. In medical screening, high recall is critical (missing disease is worse). In spam filtering, high precision matters (blocking legitimate email is worse).