Classification Metrics
Evaluating classification models requires different metrics than regression does. Accuracy alone can be misleading, especially with imbalanced classes.
Confusion Matrix
A confusion matrix is a table comparing actual and predicted labels, broken down into true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true, y_pred)
Key Metrics
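To make the four cells concrete, here is a minimal sketch with made-up binary labels (the `y_true`/`y_pred` values are illustrative, not from the text). Note scikit-learn's layout: rows are actual classes, columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels for illustration (1 = positive class)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
# sklearn's layout for binary labels [0, 1]:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

Here three positives are caught (TP=3), one is missed (FN=1), and one negative is wrongly flagged (FP=1).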
- Accuracy: (TP+TN)/(TP+TN+FP+FN) – overall correctness.
- Precision: TP/(TP+FP) – of predicted positives, how many were correct.
- Recall (Sensitivity): TP/(TP+FN) – of actual positives, how many were caught.
- F1 Score: harmonic mean of precision and recall – balanced metric.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
ROC Curve and AUC
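The four formulas above can be checked by hand on a small example. This sketch uses hypothetical labels chosen so each metric comes out different (TP=2, FN=2, FP=1, TN=5):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # TP=2, FN=2, FP=1, TN=5

acc = accuracy_score(y_true, y_pred)    # (TP+TN)/total = 7/10
prec = precision_score(y_true, y_pred)  # TP/(TP+FP)   = 2/3
rec = recall_score(y_true, y_pred)      # TP/(TP+FN)   = 2/4
f1 = f1_score(y_true, y_pred)           # 2*prec*rec/(prec+rec) = 4/7
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

Accuracy is highest here because the five easy true negatives inflate it; precision and recall focus only on how the positive class is handled.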
The ROC curve plots the true positive rate against the false positive rate at various classification thresholds. AUC (Area Under the Curve) summarizes performance across all thresholds; higher is better (0.5 is no better than random guessing, 1.0 is perfect).
from sklearn.metrics import roc_auc_score
auc = roc_auc_score(y_true, y_proba)
Choosing Metrics
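Note that `roc_auc_score` takes predicted probabilities (or scores), not hard labels. A minimal sketch with hypothetical probabilities, also showing `roc_curve` to recover the underlying TPR/FPR points:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
y_proba = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(y_true, y_proba)          # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_proba)  # points on the curve
print(f"AUC = {auc:.2f}")
print("FPR:", fpr)
print("TPR:", tpr)
```

Here AUC is 0.75: of the four (negative, positive) pairs, the positive example gets the higher score in three.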
- Balanced classes → accuracy is fine.
- Imbalanced classes (e.g., fraud detection) → precision, recall, F1, AUC.
- When false positives are costly → high precision.
- When false negatives are costly → high recall.
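The fraud-detection point can be demonstrated directly. In this hypothetical setup, a "model" that always predicts the majority class scores 99% accuracy while catching zero fraud:

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced data: 1 fraud case among 100 transactions
y_true = [1] + [0] * 99
y_pred = [0] * 100  # always predict the majority (non-fraud) class

acc = accuracy_score(y_true, y_pred)  # 0.99 -- looks impressive
rec = recall_score(y_true, y_pred)    # 0.0  -- catches no fraud at all
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

This is why recall (and precision, F1, or AUC) matters more than accuracy when classes are imbalanced.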
Two Minute Drill
- Accuracy can mislead for imbalanced data.
- Precision: correct among predicted positives.
- Recall: correct among actual positives.
- F1 balances precision and recall.
- AUC summarizes ROC performance.
Need more clarification?
Drop us an email at career@quipoinfotech.com
