
When Your Model Says “Yes” but Should’ve Said “No”

“Classification metrics: Because sometimes your AI confidently says ‘This is a cat!’ when it’s clearly a banana.” 🍌

Welcome to the binary battlefield of predictions — where your model chooses between 0 and 1, and every mistake costs money or customers. Let’s learn to measure how well your classifier performs (and laugh at its blunders while we’re at it).


🎬 Business Hook: “The Email Filter Fiasco” 📧

Your company’s spam filter model flags an email as spam.

  • It’s from your boss. 😱

  • Meanwhile, an actual scam email waltzes right into your inbox with a “Congrats, you’ve won $10M!”

That’s classification error in real life — and it’s exactly why we have metrics like precision, recall, F1-score, and more.


⚖️ Confusion Matrix: The Boardroom of Lies

|                     | Predicted: YES (1)      | Predicted: NO (0)       |
|---------------------|-------------------------|-------------------------|
| **Actual: YES (1)** | ✅ True Positive (TP)   | ❌ False Negative (FN)  |
| **Actual: NO (0)**  | ❌ False Positive (FP)  | ✅ True Negative (TN)   |

Let’s decode this chaos with a relatable example:

| Case | Meaning                                  | Analogy                                   |
|------|------------------------------------------|-------------------------------------------|
| TP   | Model correctly says “Fraud detected!”   | You catch the thief 🕵️‍♀️                  |
| TN   | Model correctly says “All good”          | Honest customers pass smoothly 💳         |
| FP   | Model wrongly says “Fraud!”              | You just embarrassed a loyal customer 😬  |
| FN   | Model misses fraud                       | The thief just walked away laughing 💰    |

🎯 Key Metrics

| Metric               | Formula                                            | Meaning in Business                                      |
|----------------------|----------------------------------------------------|----------------------------------------------------------|
| Accuracy             | (TP + TN) / Total                                  | “How often are we right overall?”                        |
| Precision            | TP / (TP + FP)                                     | “When we predict positive, how often are we correct?”    |
| Recall (Sensitivity) | TP / (TP + FN)                                     | “How many actual positives did we catch?”                |
| F1-Score             | 2 × (Precision × Recall) / (Precision + Recall)    | “Balance between precision and recall”                   |
| AUC (ROC Curve)      | Area under the ROC curve                           | “How well does the model separate the two classes?”      |

⚙️ Quick Example

```python
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score

# Ground-truth labels and the model's hard 0/1 predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Confusion Matrix:")
print(confusion_matrix(y_true, y_pred))

print("\nClassification Report:")
print(classification_report(y_true, y_pred))

# Note: in practice, pass predicted probabilities (e.g. from predict_proba) to
# roc_auc_score; hard 0/1 labels give only a single point on the ROC curve.
print("\nAUC Score:", roc_auc_score(y_true, y_pred))
```

Output (abridged):

```
Confusion Matrix:
[[3 1]
 [1 3]]

Precision: 0.75
Recall: 0.75
F1-Score: 0.75
AUC Score: 0.75
```
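To connect the formulas in the metrics table to these numbers, here is a minimal sketch that recomputes everything by hand from the four confusion-matrix cells (TP = 3, FP = 1, FN = 1, TN = 3 in the matrix above):

```python
# Cells of the sklearn confusion matrix above, laid out as [[TN, FP], [FN, TP]]
TN, FP, FN, TP = 3, 1, 1, 3

accuracy  = (TP + TN) / (TP + TN + FP + FN)                 # 6 / 8 = 0.75
precision = TP / (TP + FP)                                  # 3 / 4 = 0.75
recall    = TP / (TP + FN)                                  # 3 / 4 = 0.75
f1        = 2 * precision * recall / (precision + recall)   # 0.75

print(accuracy, precision, recall, f1)
```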

💬 “Your model’s like an employee who’s 75% right. You’d keep them… but with close supervision.” 👀


🎭 Metric Personalities — Who’s Who

| Metric    | Personality            | Works Best When                              |
|-----------|------------------------|----------------------------------------------|
| Accuracy  | The lazy optimist 😴   | Classes are balanced                         |
| Precision | The perfectionist 🧐   | False alarms are costly (fraud, spam)        |
| Recall    | The safety net 🛟      | Missing positives hurts (disease, churn)     |
| F1        | The diplomat ⚖️        | You need a balance between both              |
| AUC       | The strategist 🎯      | You want overall model ranking power         |

🧩 Visualizing Confusion — Literally

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Confusion matrix from the example above: [[TN, FP], [FN, TP]]
cm = np.array([[3, 1],
               [1, 3]])

disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Not Fraud", "Fraud"])
disp.plot(cmap="Blues")
plt.title("Confusion Matrix - Fraud Detection Example")
plt.show()
```

💬 “If your matrix looks more confused than you are — your model probably is too.”


🧠 Business Case: Customer Churn Prediction

| Scenario                   | Preferred Metric | Why                                                         |
|----------------------------|------------------|-------------------------------------------------------------|
| Predicting customer churn  | Recall           | Missing a churned customer means lost revenue               |
| Fraud detection            | Precision        | False alarms annoy good customers                           |
| Email spam filter          | F1-score         | Balance between blocking spam and not blocking real emails  |
| Loan approval              | AUC              | Helps compare models for overall discrimination ability     |

💬 “In business, your choice of metric is your KPI whisperer.”


🧪 Practice Lab – “Model Justice League” 🦸‍♀️

Dataset: customer_churn.csv

  1. Train a simple logistic regression model for churn prediction.

  2. Compute confusion matrix, precision, recall, F1, and AUC.

  3. Visualize using ConfusionMatrixDisplay.

  4. Write a short “business memo” explaining your model’s behavior:

    • Who did it save?

    • Who did it fail?

    • Should we retrain or redeploy?

🎯 Bonus: Plot Precision-Recall and ROC curves using sklearn.metrics.RocCurveDisplay and PrecisionRecallDisplay (the older plot_roc_curve helper has been removed from recent scikit-learn versions). A starter sketch follows below.
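A minimal starter sketch for the lab, assuming customer_churn.csv has numeric feature columns plus a binary “Churn” target column (adjust the column names to your actual file):

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score, ConfusionMatrixDisplay

# Assumption: a binary "Churn" column is the target; all other columns are numeric features
df = pd.read_csv("customer_churn.csv")
X = df.drop(columns=["Churn"])
y = df["Churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)                 # hard labels for the report
y_prob = model.predict_proba(X_test)[:, 1]     # probabilities for the AUC

print(classification_report(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_prob))

ConfusionMatrixDisplay.from_predictions(y_test, y_pred, cmap="Blues")
plt.title("Churn Confusion Matrix")
plt.show()
```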


📊 Precision-Recall vs ROC Curve

| Curve            | What It Shows                                    | When to Use                             |
|------------------|--------------------------------------------------|-----------------------------------------|
| Precision-Recall | Focuses on positive class performance            | When positives are rare (e.g., fraud)   |
| ROC Curve        | Trade-off between true and false positive rates  | Good for comparing classifiers          |

💬 “The ROC curve tells you how smooth your model’s driving is; the PR curve shows if it avoids potholes.” 🚗
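To see both curves side by side, here is a short sketch using scikit-learn’s display helpers (it reuses y_test and y_prob from the lab sketch above):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay, PrecisionRecallDisplay

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Both displays expect predicted probabilities (or decision scores), not hard labels
RocCurveDisplay.from_predictions(y_test, y_prob, ax=ax1)
ax1.set_title("ROC Curve")

PrecisionRecallDisplay.from_predictions(y_test, y_prob, ax=ax2)
ax2.set_title("Precision-Recall Curve")

plt.tight_layout()
plt.show()
```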


💼 Real Example: Fraud Detection

| Metric    | Model A | Model B |
|-----------|---------|---------|
| Accuracy  | 95%     | 92%     |
| Precision | 60%     | 85%     |
| Recall    | 90%     | 70%     |
| F1        | 72%     | 77%     |

Which is better?

  • Model A catches more fraud but wrongly accuses innocents.

  • Model B avoids false alarms but misses a few bad guys.
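As a quick sanity check, the F1 row above follows directly from the precision and recall rows; a tiny sketch:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f"Model A F1: {f1(0.60, 0.90):.2f}")  # ≈ 0.72
print(f"Model B F1: {f1(0.85, 0.70):.2f}")  # ≈ 0.77
```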

💬 “In fraud detection, you’d rather annoy a few good customers than lose $10 million.” 💸


🧭 Recap

| Metric    | Measures                                   | Ideal For           |
|-----------|--------------------------------------------|---------------------|
| Accuracy  | Overall correctness                        | Balanced datasets   |
| Precision | True positives among predicted positives   | Fraud, spam         |
| Recall    | True positives among actual positives      | Medical, churn      |
| F1        | Balance of precision & recall              | General use         |
| AUC       | Model discrimination ability               | Model comparison    |

🔜 Next Up

👉 Head to Business Visualisation where we’ll turn these numbers into executive dashboards and beautiful visual insights that even your CFO will understand.

“Because nothing says ‘we’re data-driven’ like a chart that makes your boss nod thoughtfully.” 📊


```python
# Your code here
```