# Classification Metrics
When Your Model Says “Yes” but Should’ve Said “No”
“Classification metrics: Because sometimes your AI confidently says ‘This is a cat!’ when it’s clearly a banana.” 🍌
Welcome to the binary battlefield of predictions — where your model chooses between 0 and 1, and every mistake costs money or customers. Let’s learn to measure how well your classifier performs (and laugh at its blunders while we’re at it).
## 🎬 Business Hook: “The Email Filter Fiasco” 📧
Your company’s spam filter model flags an email as spam.
It’s from your boss. 😱
Meanwhile, an actual scam email waltzes right into your inbox with a “Congrats, you’ve won $10M!”
That’s classification error in real life — and it’s exactly why we have metrics like precision, recall, F1-score, and more.
## ⚖️ Confusion Matrix: The Boardroom of Lies

|                 | Predicted: YES (1)     | Predicted: NO (0)      |
|-----------------|------------------------|------------------------|
| Actual: YES (1) | ✅ True Positive (TP)  | ❌ False Negative (FN) |
| Actual: NO (0)  | ❌ False Positive (FP) | ✅ True Negative (TN)  |
Let’s decode this chaos with a relatable example:
| Case | Meaning                                 | Analogy                                  |
|------|-----------------------------------------|------------------------------------------|
| TP   | Model correctly says “Fraud detected!”  | You catch the thief 🕵️♀️                  |
| TN   | Model correctly says “All good”         | Honest customers pass smoothly 💳         |
| FP   | Model wrongly says “Fraud!”             | You just embarrassed a loyal customer 😬  |
| FN   | Model misses fraud                      | The thief just walked away laughing 💰    |
## 🎯 Key Metrics

| Metric | Formula | Meaning in Business |
|---|---|---|
| Accuracy | (TP + TN) / Total | “How often are we right overall?” |
| Precision | TP / (TP + FP) | “When we predict positive, how often are we correct?” |
| Recall (Sensitivity) | TP / (TP + FN) | “How many actual positives did we catch?” |
| F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | “Balance between precision and recall” |
| AUC (ROC Curve) | Area under the curve | “How well does the model separate the two classes?” |
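To make the formulas concrete, here is a tiny hand calculation. The counts (TP = 3, FP = 1, FN = 1, TN = 3) are taken from the abridged output in the quick example below; the snippet itself is just an illustrative sketch.

```python
# Plugging confusion-matrix counts into the formulas above.
# Counts match the quick example below: TN=3, FP=1, FN=1, TP=3.
TP, FP, FN, TN = 3, 1, 1, 3

accuracy  = (TP + TN) / (TP + TN + FP + FN)                 # 6 / 8 = 0.75
precision = TP / (TP + FP)                                  # 3 / 4 = 0.75
recall    = TP / (TP + FN)                                  # 3 / 4 = 0.75
f1        = 2 * precision * recall / (precision + recall)   # 0.75

print(f"Accuracy: {accuracy}, Precision: {precision}, Recall: {recall}, F1: {f1}")
```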
## ⚙️ Quick Example
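The code behind this example isn’t shown here, so below is a minimal scikit-learn sketch. The toy labels `y_true` and `y_pred` are hand-picked assumptions chosen to reproduce the abridged output that follows.

```python
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy labels (assumed for illustration): 8 samples, one mistake in each class
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1]

print("Confusion Matrix:")
print(confusion_matrix(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1-Score:", f1_score(y_true, y_pred))
print("AUC Score:", roc_auc_score(y_true, y_pred))  # hard 0/1 predictions used as scores here
```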
Output (abridged):

```text
Confusion Matrix:
[[3 1]
 [1 3]]
Precision: 0.75
Recall: 0.75
F1-Score: 0.75
AUC Score: 0.75
```
💬 “Your model’s like an employee who’s 75% right. You’d keep them… but with close supervision.” 👀
## 🎭 Metric Personalities — Who’s Who

| Metric | Personality | Works Best When |
|---|---|---|
| Accuracy | The lazy optimist 😴 | When classes are balanced |
| Precision | The perfectionist 🧐 | When false alarms are costly (fraud, spam) |
| Recall | The safety net 🛟 | When missing positives hurts (disease, churn) |
| F1 | The diplomat ⚖️ | When you need balance between both |
| AUC | The strategist 🎯 | When you want overall model ranking power |
## 🧩 Visualizing Confusion — Literally
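Here is a minimal sketch of how you might plot the matrix with scikit-learn’s `ConfusionMatrixDisplay`. The labels reuse the toy predictions from the quick example above, which is an assumption since the original plotting code isn’t shown.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Same toy labels as the quick example above (assumed values)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1]

# Build and draw the confusion-matrix plot directly from predictions
disp = ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred, display_labels=["No (0)", "Yes (1)"], cmap="Blues"
)
disp.ax_.set_title("Confusion Matrix")
plt.show()
```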
💬 “If your matrix looks more confused than you are — your model probably is too.”
## 🧠 Business Case: Customer Churn Prediction

| Scenario | Preferred Metric | Why |
|---|---|---|
| Predicting customer churn | Recall | Missing a churned customer means lost revenue |
| Fraud detection | Precision | False alarms annoy good customers |
| Email spam filter | F1-score | Balance between blocking spam and not blocking real emails |
| Loan approval | AUC | Helps compare models for overall discrimination ability |
💬 “In business, your choice of metric is your KPI whisperer.”
## 🧪 Practice Lab – “Model Justice League” 🦸♀️
Dataset: `customer_churn.csv`

1. Train a simple logistic regression model for churn prediction.
2. Compute the confusion matrix, precision, recall, F1, and AUC.
3. Visualize the confusion matrix using `ConfusionMatrixDisplay`.
4. Write a short “business memo” explaining your model’s behavior:
   - Who did it save?
   - Who did it fail?
   - Should we retrain or redeploy?

🎯 Bonus: Plot Precision-Recall and ROC curves using `PrecisionRecallDisplay` and `RocCurveDisplay` from `sklearn.metrics` (the older `plot_roc_curve` helper has been removed from recent scikit-learn releases). A starter sketch follows below.
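A starter sketch for the lab, not a full solution: the file name `customer_churn.csv` comes from the prompt, but the target column name (`churn`) and the assumption that all feature columns are numeric are mine.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score, RocCurveDisplay, PrecisionRecallDisplay)

df = pd.read_csv("customer_churn.csv")
X = df.drop(columns=["churn"])   # assumed: "churn" is the 0/1 target, features are numeric
y = df["churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))   # precision, recall, F1 per class
print("AUC:", roc_auc_score(y_test, y_prob))

# Bonus: ROC and Precision-Recall curves
RocCurveDisplay.from_estimator(model, X_test, y_test)
PrecisionRecallDisplay.from_estimator(model, X_test, y_test)
plt.show()
```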
## 📊 Precision-Recall vs ROC Curve

| Curve | What It Shows | When to Use |
|---|---|---|
| Precision-Recall | Focuses on positive class performance | When positives are rare (e.g., fraud) |
| ROC Curve | Trade-off between true and false positive rates | Good for comparing classifiers |
💬 “The ROC curve tells you how smooth your model’s driving is; the PR curve shows if it avoids potholes.” 🚗
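To see the difference in practice, here is a small sketch comparing ROC AUC with average precision (the area-style summary of the PR curve) on a rare-positive problem. The data is randomly generated and the score distributions are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Heavily imbalanced toy problem: 10 positives among 1,000 samples (assumed setup)
y_true = np.concatenate([np.zeros(990), np.ones(10)])
# Positives score somewhat higher on average, but the classes overlap
scores = np.concatenate([rng.normal(0.0, 1.0, 990), rng.normal(1.5, 1.0, 10)])

print("ROC AUC:          ", round(roc_auc_score(y_true, scores), 3))
print("Average precision:", round(average_precision_score(y_true, scores), 3))
# With rare positives, the PR-based number usually comes out far lower than ROC AUC,
# which is why the PR curve is the more honest view for fraud-like problems.
```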
## 💼 Real Example: Fraud Detection

| Metric | Model A | Model B |
|---|---|---|
| Accuracy | 95% | 92% |
| Precision | 60% | 85% |
| Recall | 90% | 70% |
| F1 | 72% | 77% |
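The F1 row follows directly from the formula in the metrics table; here is a quick sanity check using a tiny helper written for this post (not a library function).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f"Model A F1: {f1(0.60, 0.90):.0%}")  # -> 72%
print(f"Model B F1: {f1(0.85, 0.70):.0%}")  # -> 77%
```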
Which is better?

- Model A catches more fraud but wrongly accuses innocents.
- Model B avoids false alarms but misses a few bad guys.
💬 “In fraud detection, you’d rather annoy a few good customers than lose $10 million.” 💸
## 🧭 Recap

| Metric | Measures | Ideal For |
|---|---|---|
| Accuracy | Overall correctness | Balanced datasets |
| Precision | True positives among predicted positives | Fraud, spam |
| Recall | True positives among actual positives | Medical, churn |
| F1 | Balance of precision & recall | General use |
| AUC | Model discrimination ability | Model comparison |
## 🔜 Next Up
👉 Head to Business Visualisation where we’ll turn these numbers into executive dashboards and beautiful visual insights that even your CFO will understand.
“Because nothing says ‘we’re data-driven’ like a chart that makes your boss nod thoughtfully.” 📊