
Business Visualisation
Turning Model Results into Decisions
A model that no stakeholder understands does not get funded, deployed, or acted on. This notebook covers the visualisations that translate ML outputs into business language: confusion-matrix heatmaps, ROC and precision-recall curves, lift decile charts, threshold-profit dashboards, feature importance plots, and model comparison panels — built for both data science peers and non-technical audiences.
Why Visualisation Is the Last Mile

There are three distinct audiences for model visualisations, each needing a different chart style:
| Audience | What they ask | Right chart |
|---|---|---|
| Data scientist / ML engineer | How well does the model discriminate? | ROC curve, PR curve, calibration plot |
| Business analyst / ops team | Where should we focus resources? | Lift decile chart, gain curve, threshold-profit plot |
| Executive / stakeholder | Did this model make money? | ROI bar, model comparison panel, KPI delta card |
Setup — Shared Data and Model
All visualisations in this notebook use the same synthetic churn dataset and the same three fitted models (logistic regression, random forest, gradient boosting), so the charts are directly comparable.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import (
    confusion_matrix, classification_report,
    roc_curve, auc, precision_recall_curve, average_precision_score
)
np.random.seed(42)
X, y = make_classification(
    n_samples=3000, n_features=12, n_informative=7,
    weights=[0.82, 0.18], flip_y=0.04, random_state=42
)
# Illustrative names for the 12 synthetic columns
feature_names = [
    'recency', 'frequency', 'monetary', 'tenure',
    'support_calls', 'last_login_days', 'plan_tier',
    'payment_delay', 'products_used', 'engagement',
    'nps_score', 'region_code'
]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
# Fit three models for comparison
lr = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, random_state=0))
rf = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
gb = GradientBoostingClassifier(n_estimators=100, random_state=0)
for m in [lr, rf, gb]:
    m.fit(X_tr, y_tr)
# Predicted churn probability (positive class) on the test set
proba_lr = lr.predict_proba(X_te)[:, 1]
proba_rf = rf.predict_proba(X_te)[:, 1]
proba_gb = gb.predict_proba(X_te)[:, 1]
print(f"Test set: {len(y_te)} samples, {y_te.sum()} positive ({y_te.mean()*100:.1f}% churn rate)")
print("Models trained: Logistic Regression, Random Forest, Gradient Boosting")
1. Annotated Confusion-Matrix Heatmap
The raw confusion matrix is opaque to non-technical readers. An annotated heatmap with percentages, colour intensity proportional to count, and clear outcome labels bridges that gap.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
def plot_confusion_matrix(y_true, y_pred, model_name='Model',
                          class_names=('Stay', 'Churn'), ax=None):
    # Fix the class order explicitly (0 = stay, 1 = churn)
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    # Row-normalise: each row shows how one true class was split across predictions
    cm_norm = cm.astype(float) / cm.sum(axis=1, keepdims=True)
    if ax is None:
        fig, ax = plt.subplots(figsize=(5, 4))
    im = ax.imshow(cm_norm, cmap='Blues', vmin=0, vmax=1)
    plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
    # Business wording for each cell, indexed as labels[true][predicted]
    labels = [['True Neg\n(loyal customer\nleft alone)',
               'False Pos\n(unnecessary\noutreach)'],
              ['False Neg\n(churner\nmissed)',
               'True Pos\n(churner\ncontacted)']]
    for i in range(2):
        for j in range(2):
            # White text on dark cells, black text on light cells
            color = 'white' if cm_norm[i, j] > 0.5 else 'black'
            ax.text(j, i, f"{cm[i,j]}\n({cm_norm[i,j]*100:.1f}%)\n{labels[i][j]}",
                    ha='center', va='center', fontsize=8, color=color)
    ax.set_xticks([0, 1]); ax.set_xticklabels(class_names)
    ax.set_yticks([0, 1]); ax.set_yticklabels(class_names, rotation=90, va='center')
    ax.set_xlabel('Predicted label')
    ax.set_ylabel('True label')
    ax.set_title(f'Confusion Matrix — {model_name}')
    return ax
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, proba, name in zip(axes,
                           [proba_lr, proba_rf, proba_gb],
                           ['Logistic Reg.', 'Random Forest', 'Grad. Boosting']):
    plot_confusion_matrix(y_te, (proba >= 0.5).astype(int), model_name=name, ax=ax)
plt.suptitle('Confusion matrices at default threshold (0.5)', y=1.02, fontsize=12)
plt.tight_layout()
plt.show()
2. ROC Curve and Precision–Recall Curve
ROC curve: plots the true positive rate against the false positive rate across all thresholds. It is good for comparing discriminative power: AUC = 0.5 is random ranking and AUC = 1.0 is perfect.
Precision–recall curve: plots precision against recall. It is more informative than ROC on heavily imbalanced datasets because neither precision nor recall uses the true-negative count, so a large majority class cannot flatter the curve.
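To see why the two curves diverge under imbalance, the sketch below scores a fixed set of positives against a growing pool of negatives drawn from one distribution. It uses synthetic Gaussian scores rather than the notebook's churn models, and the names `pos_scores`, `neg_pool`, and `summarise` are made up for this illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
# Assumption: scores are Gaussian, with positives shifted up by one standard deviation
pos_scores = rng.normal(loc=1.0, scale=1.0, size=500)
neg_pool = rng.normal(loc=0.0, scale=1.0, size=50_000)

def summarise(n_neg):
    """Return (ROC AUC, average precision) for 500 positives vs the first n_neg negatives."""
    scores = np.concatenate([pos_scores, neg_pool[:n_neg]])
    labels = np.concatenate([np.ones(len(pos_scores)), np.zeros(n_neg)])
    return roc_auc_score(labels, scores), average_precision_score(labels, scores)

# Same ranking quality, increasingly rare positive class
results = {n_neg: summarise(n_neg) for n_neg in (500, 5_000, 50_000)}
for n_neg, (roc_auc, ap) in results.items():
    prevalence = len(pos_scores) / (len(pos_scores) + n_neg)
    print(f"prevalence={prevalence:6.1%}  ROC AUC={roc_auc:.3f}  AP={ap:.3f}")
```

ROC AUC should stay roughly constant across the three rows because the ranking quality has not changed, while average precision falls as the positive class becomes rarer.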
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, precision_recall_curve, average_precision_score
models_info = [
    ('Logistic Reg.', proba_lr, 'steelblue'),
    ('Random Forest', proba_rf, 'tomato'),
    ('Grad. Boosting', proba_gb, 'seagreen'),
]
fig, axes = plt.subplots(1, 2, figsize=(13, 5))
# --- ROC ---
for name, proba, color in models_info:
    fpr, tpr, thrs = roc_curve(y_te, proba)
    roc_auc = auc(fpr, tpr)
    axes[0].plot(fpr, tpr, color=color, linewidth=2,
                 label=f'{name} (AUC={roc_auc:.3f})')
    # Mark the curve point whose threshold is closest to 0.5
    idx = np.argmin(np.abs(thrs - 0.5))
    axes[0].scatter(fpr[idx], tpr[idx], color=color, s=60, zorder=5)
axes[0].plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random (AUC=0.5)')
axes[0].set_xlabel('False Positive Rate')
axes[0].set_ylabel('True Positive Rate (Recall)')
axes[0].set_title('ROC Curves (dots = threshold 0.5)')
axes[0].legend(loc='lower right', fontsize=8)
axes[0].grid(True, alpha=0.3)
# --- Precision-Recall ---
# A random ranker has precision equal to the positive rate at every recall level
baseline_pr = y_te.mean()
for name, proba, color in models_info:
    prec, rec, _ = precision_recall_curve(y_te, proba)
    ap = average_precision_score(y_te, proba)
    axes[1].plot(rec, prec, color=color, linewidth=2,
                 label=f'{name} (AP={ap:.3f})')
axes[1].axhline(baseline_pr, color='k', linestyle='--', linewidth=1,
                label=f'Random baseline ({baseline_pr:.2f})')
axes[1].set_xlabel('Recall')
axes[1].set_ylabel('Precision')
axes[1].set_title('Precision–Recall Curves')
axes[1].legend(loc='upper right', fontsize=8)
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
3. Lift Decile Bar Chart
The lift decile chart divides the population into ten equal buckets ranked by predicted probability. Each bar shows how many times more positive cases fall in that bucket compared to random. Marketing and ops teams use this to decide which deciles are worth targeting.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
def lift_decile_chart(y_true, y_score, model_name='Model', ax=None, color='steelblue'):
    # Accept lists or pandas Series as well as NumPy arrays
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    order = np.argsort(y_score)[::-1]  # highest scores first
    y_sorted = y_true[order]
    base_rate = y_true.mean()
    total_pos = y_true.sum()
    lifts, gains = [], []
    # array_split keeps every sample even when n is not a multiple of 10
    for bucket in np.array_split(y_sorted, 10):
        rate = bucket.mean()
        lifts.append(rate / base_rate if base_rate > 0 else 0)
        gains.append(bucket.sum() / total_pos * 100 if total_pos > 0 else 0)
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 4))
    xs = np.arange(1, 11)
    # Grey out deciles that perform no better than random targeting
    bars = ax.bar(xs, lifts, color=[color if l > 1 else '#cccccc' for l in lifts],
                  edgecolor='white', linewidth=0.8)
    ax.axhline(1.0, color='k', linestyle='--', linewidth=1, label='Baseline (lift=1)')
    for bar, lift, gain in zip(bars, lifts, gains):
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.03,
                f'{lift:.1f}x\n({gain:.0f}%)',
                ha='center', va='bottom', fontsize=7)
    ax.set_xticks(xs)
    ax.set_xticklabels([f'D{d}' for d in range(1, 11)])
    ax.set_xlabel('Decile (D1=highest score)')
    ax.set_ylabel('Lift')
    ax.set_title(f'Lift Decile Chart — {model_name}\n(bar labels: lift × and % of all churners in decile)')
    ax.legend()
    return ax
fig, axes = plt.subplots(1, 3, figsize=(16, 4))
for ax, proba, name, col in zip(
        axes,
        [proba_lr, proba_rf, proba_gb],
        ['Logistic Reg.', 'Random Forest', 'Grad. Boosting'],
        ['steelblue', 'tomato', 'seagreen']):
    lift_decile_chart(y_te, proba, model_name=name, ax=ax, color=col)
plt.tight_layout()
plt.show()
4. Threshold–Profit Dashboard
For the decision-maker who needs to pick a deployment threshold, a four-panel dashboard — profit curve, precision/recall vs threshold, confusion matrix at the chosen threshold, and the selected KPIs — provides everything in one view.
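The profit curve in this dashboard rests on one expected-value calculation: value gained from true positives minus the cost of false positives and false negatives. The sketch below isolates that arithmetic. The default parameters match the business parameters used in this section, while the helper name `expected_value` and the confusion-matrix counts are hypothetical and not taken from the fitted model.

```python
def expected_value(tp, fp, fn, v_tp=80.0, c_fp=10.0, c_fn=120.0):
    """Net campaign value: gains from true positives minus the cost of both error types.

    True negatives contribute nothing: a loyal customer left alone costs and earns zero.
    """
    return tp * v_tp - fp * c_fp - fn * c_fn

# Hypothetical counts for 160 actual churners at two thresholds
low_threshold = expected_value(tp=140, fp=200, fn=20)   # contacts many customers
high_threshold = expected_value(tp=90, fp=30, fn=70)    # contacts few customers

print(f"Low threshold:  ${low_threshold:,.0f}")   # 140*80 - 200*10 - 20*120
print(f"High threshold: ${high_threshold:,.0f}")  # 90*80 - 30*10 - 70*120
```

With these parameters a missed churner costs twelve times as much as an unnecessary contact, so in this illustration the lower threshold comes out ahead despite its many false positives.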
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
# Business parameters ($): value per true positive, cost per false positive,
# cost per false negative
V_TP, C_FP, C_FN = 80.0, 10.0, 120.0
thresholds = np.linspace(0.01, 0.99, 300)
ev_curve, prec_curve, rec_curve = [], [], []
for t in thresholds:
    y_pred_t = (proba_gb >= t).astype(int)
    tn_t, fp_t, fn_t, tp_t = confusion_matrix(y_te, y_pred_t, labels=[0,1]).ravel()
    ev_curve.append(tp_t * V_TP - fp_t * C_FP - fn_t * C_FN)
    prec_t = tp_t / (tp_t + fp_t) if (tp_t + fp_t) > 0 else 0
    rec_t = tp_t / (tp_t + fn_t) if (tp_t + fn_t) > 0 else 0
    prec_curve.append(prec_t)
    rec_curve.append(rec_t)
ev_curve = np.array(ev_curve)
best_idx = np.argmax(ev_curve)
best_t = thresholds[best_idx]
fig, axes = plt.subplots(2, 2, figsize=(13, 9))
fig.suptitle(f'Threshold Dashboard — Gradient Boosting (best t={best_t:.2f})', fontsize=13)
# Panel A: Profit curve
ax = axes[0, 0]
ax.plot(thresholds, ev_curve, color='seagreen', linewidth=2)
ax.axvline(best_t, color='green', linestyle='--', label=f'Best t={best_t:.2f}')
ax.axvline(0.5, color='red', linestyle=':', label='Default t=0.5')
ax.axhline(0, color='k', linewidth=0.7)
ax.fill_between(thresholds, 0, ev_curve, where=(ev_curve > 0), alpha=0.12, color='seagreen')
ax.set_xlabel('Threshold')
ax.set_ylabel('Expected Value ($)')
ax.set_title('A — Profit Curve')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)
# Panel B: Precision / Recall vs threshold
ax = axes[0, 1]
ax.plot(thresholds, prec_curve, 'b-', linewidth=1.8, label='Precision')
ax.plot(thresholds, rec_curve, 'r-', linewidth=1.8, label='Recall')
ax.axvline(best_t, color='green', linestyle='--', label=f'Best t={best_t:.2f}')
ax.set_xlabel('Threshold')
ax.set_ylabel('Score')
ax.set_title('B — Precision & Recall')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)
# Panel C: Confusion matrix at best threshold
ax = axes[1, 0]
y_best = (proba_gb >= best_t).astype(int)
cm = confusion_matrix(y_te, y_best, labels=[0, 1])
cm_norm = cm / cm.sum(axis=1, keepdims=True)
im = ax.imshow(cm_norm, cmap='Blues', vmin=0, vmax=1)
plt.colorbar(im, ax=ax, fraction=0.046)
for i in range(2):
    for j in range(2):
        c = 'white' if cm_norm[i, j] > 0.55 else 'black'
        ax.text(j, i, f'{cm[i,j]}\n({cm_norm[i,j]*100:.0f}%)',
                ha='center', va='center', fontsize=10, color=c)
ax.set_xticks([0,1]); ax.set_xticklabels(['Pred Stay','Pred Churn'])
ax.set_yticks([0,1]); ax.set_yticklabels(['True Stay','True Churn'], rotation=90, va='center')
ax.set_title(f'C — Confusion Matrix at t={best_t:.2f}')
# Panel D: KPI summary bar
ax = axes[1, 1]
tn_b, fp_b, fn_b, tp_b = cm.ravel()
ev_best = tp_b * V_TP - fp_b * C_FP - fn_b * C_FN
ev_default = ev_curve[np.argmin(np.abs(thresholds - 0.5))]
kpi_names = ['EV @ optimal\nthreshold', 'EV @ default\n0.5', 'TP caught', 'FN missed']
kpi_values = [ev_best, ev_default, tp_b, fn_b]
colors_kpi = ['seagreen', 'tomato', 'steelblue', 'orange']
bars = ax.barh(kpi_names, kpi_values, color=colors_kpi)
pad = max(abs(v) for v in kpi_values) * 0.01
# Pair each value with its own name; looking the name up by value breaks when two KPIs are equal
for bar, kpi_name, val in zip(bars, kpi_names, kpi_values):
    label = f'${val:,.0f}' if kpi_name.startswith('EV') else f'{int(val)}'
    ax.text(bar.get_width() + pad,
            bar.get_y() + bar.get_height() / 2,
            label, va='center', fontsize=9)
ax.set_title('D — KPI Summary')
ax.axvline(0, color='k', linewidth=0.7)
ax.grid(True, alpha=0.2, axis='x')
plt.tight_layout()
plt.show()
5. Feature Importance Chart
Feature importance charts answer “which inputs does the model rely on most?” — a question every business stakeholder asks. For tree-based models, permutation importance is more reliable than the built-in impurity-based importance, especially when features have different cardinalities.
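The difference between the two importance measures can be illustrated on a small synthetic example, separate from the churn data. The sketch below appends a pure-noise column to three informative features and compares the forest's built-in `feature_importances_` with permutation importance on held-out data. All names ending in `_demo` or starting with `Xd_`/`yd_` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: 3 informative columns plus one pure-noise column appended at the end
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
rng = np.random.default_rng(0)
noise = rng.normal(size=(len(X_demo), 1))  # continuous, so it offers many split points
X_demo = np.hstack([X_demo, noise])
demo_names = ['signal_0', 'signal_1', 'signal_2', 'noise']

Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)
rf_demo = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xd_tr, yd_tr)

# Impurity importance is derived from the training data and always sums to 1
impurity = rf_demo.feature_importances_
# Permutation importance measures the drop in held-out AUC when a column is shuffled
perm = permutation_importance(
    rf_demo, Xd_te, yd_te, n_repeats=10, random_state=0, scoring='roc_auc'
).importances_mean

for name, imp, pi in zip(demo_names, impurity, perm):
    print(f"{name:10s} impurity={imp:.3f}  permutation={pi:+.3f}")
```

The noise column typically receives a visibly non-zero impurity score because deep trees split on it by chance, whereas its permutation importance on the test set should sit near zero.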
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.inspection import permutation_importance
# Permutation importance on test set
result = permutation_importance(
    rf, X_te, y_te, n_repeats=10, random_state=42,
    scoring='roc_auc', n_jobs=-1
)
imp_mean = result.importances_mean
imp_std = result.importances_std
order = np.argsort(imp_mean)  # ascending for horizontal bar
fig, ax = plt.subplots(figsize=(8, 6))
# Grey out features whose shuffling does not hurt the score
colors = ['tomato' if imp_mean[i] > 0 else '#cccccc' for i in order]
ax.barh(
    [feature_names[i] for i in order],
    imp_mean[order],
    xerr=imp_std[order],
    color=colors, edgecolor='white', linewidth=0.5,
    capsize=4
)
ax.axvline(0, color='k', linewidth=0.8)
ax.set_xlabel('Permutation importance\n(mean decrease in AUC when feature is shuffled)')
ax.set_title('Feature Importance — Random Forest\n(permutation importance, test set, 10 repeats)')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()
print("Top 3 features by permutation importance:")
for rank, i in enumerate(order[::-1][:3], 1):
    print(f" {rank}. {feature_names[i]:20s} {imp_mean[i]:.4f} ± {imp_std[i]:.4f}")
6. Multi-Model Comparison Panel
When presenting model selection results to a team, a single comparison panel with consistent metrics across all candidates is clearer than separate figures.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             f1_score, confusion_matrix)
# V_TP, C_FP, C_FN come from the threshold dashboard cell above
model_labels = ['Logistic\nReg.', 'Random\nForest', 'Grad.\nBoosting']
probas = [proba_lr, proba_rf, proba_gb]
colors = ['steelblue', 'tomato', 'seagreen']
metrics = {}
for name, proba in zip(model_labels, probas):
    y_pred = (proba >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred, labels=[0,1]).ravel()
    ev = tp * V_TP - fp * C_FP - fn * C_FN
    metrics[name] = {
        'AUC-ROC': roc_auc_score(y_te, proba),
        'Avg Prec.': average_precision_score(y_te, proba),
        'F1 (churn)': f1_score(y_te, y_pred),
        'Biz EV ($k)': ev / 1000,
    }
metric_names = list(next(iter(metrics.values())).keys())
x = np.arange(len(metric_names))
width = 0.25
fig, ax = plt.subplots(figsize=(11, 5))
for i, (name, vals) in enumerate(metrics.items()):
    bar_vals = [vals[m] for m in metric_names]
    offset = (i - 1) * width
    bars = ax.bar(x + offset, bar_vals, width, label=name,
                  color=colors[i], edgecolor='white')
    for bar, m, v in zip(bars, metric_names, bar_vals):
        # Format by metric, not by magnitude: EV is in $k, the rest are scores in [0, 1]
        text = f'${v:.1f}k' if m.startswith('Biz EV') else f'{v:.2f}'
        # Place the label just beyond the bar end, including for negative EV bars
        ax.text(bar.get_x() + bar.get_width()/2,
                v + (0.005 if v >= 0 else -0.005),
                text, ha='center', va='bottom' if v >= 0 else 'top', fontsize=7)
ax.set_xticks(x)
ax.set_xticklabels(metric_names)
ax.set_ylabel('Score / Value')
ax.set_title('Model Comparison Panel — Churn Classification')
ax.legend()
ax.grid(True, alpha=0.2, axis='y')
ax.axhline(0, color='k', linewidth=0.7)
plt.tight_layout()
plt.show()
print("\nModel comparison summary:")
for name, vals in metrics.items():
    print(f" {name.replace(chr(10),' '):20s} " +
          " ".join(f"{k}={v:.3f}" for k,v in vals.items()))
Visual Design Principles for Business Charts
| Colour | Use for | Avoid |
|---|---|---|
| Green | Positive outcomes, profit, improvement | Errors, losses |
| Red / tomato | Errors, cost, warnings | Success metrics |
| Blue | Neutral comparisons, reference lines | Strong warnings |
| Grey | Baselines, inactive elements | Primary data |
Use a consistent palette across all panels in a report. Changing colours between charts forces readers to re-learn the mapping.
Always label the key point, not just the axis. “Best threshold = 0.18” is clearer than a dashed line the reader must trace to the x-axis.
Add a business translation to metric labels: “AUC-ROC (how well the model ranks customers)” is more useful than “AUC-ROC” alone.
Show uncertainty where it exists — error bars on importance scores, shaded confidence bands on learning curves.
Limit decimal places: report R² = 0.49, not 0.489312.
| Chart element | Data science review | Executive presentation |
|---|---|---|
| Title | “ROC curve, AUC=0.81” | “Model ranks churners 4× better than random” |
| Y-axis | “True positive rate” | “% churners caught” |
| Key callout | Best threshold marker | “At this setting: $4,830 net value per campaign” |
| Baseline | Random model line | “What we got before using ML” |
Prepare two versions of critical charts: a detailed version for technical review and a simplified version for decision-makers.
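One way to keep the two versions in sync is to draw the same curve twice and swap only the wording. The sketch below does this for a ROC-style chart, using the title and y-axis phrasing from the table above. The `AUDIENCE_LABELS` dictionary, the `label_for_audience` helper, the curve points, and the AUC and lift figures are all hypothetical illustrations rather than results from the notebook's models.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Wording taken from the table above; the dictionary layout itself is an assumption
AUDIENCE_LABELS = {
    'technical': {'title': 'ROC curve, AUC={auc:.2f}',
                  'ylabel': 'True positive rate'},
    'executive': {'title': 'Model ranks churners {lift:.0f}× better than random',
                  'ylabel': '% churners caught'},
}

def label_for_audience(ax, audience, auc=None, lift=None):
    """Apply the title and y-axis wording for one audience to an existing Axes."""
    labels = AUDIENCE_LABELS[audience]
    ax.set_title(labels['title'].format(auc=auc, lift=lift))
    ax.set_ylabel(labels['ylabel'])
    if audience == 'executive':
        # Show 0.8 as 80% so the axis matches the "% churners caught" wording
        ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
    return ax

# Hypothetical curve points, for illustration only
fpr_demo = [0.0, 0.1, 0.3, 0.6, 1.0]
tpr_demo = [0.0, 0.5, 0.8, 0.95, 1.0]

fig, (ax_tech, ax_exec) = plt.subplots(1, 2, figsize=(11, 4))
for ax in (ax_tech, ax_exec):
    ax.plot(fpr_demo, tpr_demo, color='seagreen', linewidth=2)
    ax.plot([0, 1], [0, 1], 'k--', linewidth=1)  # random baseline on both versions
label_for_audience(ax_tech, 'technical', auc=0.81)
label_for_audience(ax_exec, 'executive', lift=4)
plt.tight_layout()
```

Keeping the plotting code shared and isolating the wording in one dictionary makes it harder for the technical and executive versions to drift apart when the underlying numbers change.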
Try It in the Browser
Compute a lift decile table in pure Python — see which deciles are worth targeting.
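A minimal pure-Python sketch of such a table follows, using only the standard library. The data are randomly generated with an 18% positive rate to resemble the churn setup; the function name `lift_decile_table` and the score distributions are assumptions made for this example.

```python
import random

def lift_decile_table(y_true, y_score, n_buckets=10):
    """Return one dict per bucket: decile number, size, positives, response rate, lift."""
    # Rank labels by descending score
    ranked = [label for _, label in
              sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    base_rate = sum(ranked) / n
    rows = []
    for b in range(n_buckets):
        # Integer boundaries spread any remainder across buckets
        start, stop = b * n // n_buckets, (b + 1) * n // n_buckets
        bucket = ranked[start:stop]
        rate = sum(bucket) / len(bucket)
        rows.append({'decile': b + 1, 'n': len(bucket), 'positives': sum(bucket),
                     'rate': rate,
                     'lift': rate / base_rate if base_rate > 0 else 0.0})
    return rows

# Hypothetical data: churners (label 1) tend to receive higher scores
random.seed(0)
labels = [1 if random.random() < 0.18 else 0 for _ in range(1000)]
scores = [random.gauss(0.6 if y else 0.4, 0.15) for y in labels]

table = lift_decile_table(labels, scores)
print("decile    n  positives   rate   lift")
for row in table:
    print(f"D{row['decile']:<5d} {row['n']:4d} {row['positives']:10d} "
          f"{row['rate']:6.1%} {row['lift']:6.2f}")
```

Deciles with lift above 1 contain more churners than a random sample of the same size, so they are the natural candidates for targeting.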
Guided Practice
A marketing manager asks: "If I can only contact 20% of customers, which ones should I pick?" Which chart answers this most directly?
Why is the precision–recall curve more informative than the ROC curve on a heavily imbalanced dataset?
You build a model comparison bar chart where Model A has AUC=0.83 and Model B has AUC=0.81, but Model B has higher business EV. Which model should you recommend?
Permutation importance is preferred over built-in tree impurity importance because:
Exercise 1 — Annotated ROC curve
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
# proba_gb and y_te are available from the setup cell
# TODO:
# 1. Compute ROC curve for the Gradient Boosting model
# fpr, tpr, thrs = roc_curve(y_te, proba_gb)
# roc_auc = auc(fpr, tpr)
#
# 2. Plot the ROC curve
# 3. Mark three threshold points on the curve: t=0.1, t=0.3, t=0.5
# Use ax.scatter and ax.annotate to label each point with its threshold value
# 4. Add a text box showing AUC in the lower-right corner
# 5. Add the random baseline line
print("Uncomment and complete the annotated ROC curve.")
Exercise 2 — Grouped model comparison with business EV
Using the three models (lr, rf, gb) and their test probabilities:
Compute AUC-ROC, Average Precision, F1-Score, and Business EV for each model.
Rank models by each metric — does the ranking change?
Build a grouped bar chart (four metric groups, three bars per group) with value labels.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score, confusion_matrix
# V_TP, C_FP, C_FN are defined in the threshold-profit dashboard cell (Section 4)
# TODO:
# model_labels = ['Logistic Reg.', 'Random Forest', 'Grad. Boosting']
# probas = [proba_lr, proba_rf, proba_gb]
#
# for name, proba in zip(model_labels, probas):
#     y_pred = (proba >= 0.5).astype(int)
#     tn, fp, fn, tp = confusion_matrix(y_te, y_pred, labels=[0,1]).ravel()
#     ev = tp * V_TP - fp * C_FP - fn * C_FN
#     print(f"{name}: AUC={roc_auc_score(y_te, proba):.3f} AP={average_precision_score(y_te, proba):.3f}"
#           f" F1={f1_score(y_te, y_pred):.3f} EV=${ev:,.0f}")
print("Uncomment and complete the model comparison.")
Exercise 3 — Executive one-pager
Produce a single 2×2 figure that a non-technical manager can read in 60 seconds:
Panel A: Lift decile chart for the best model (highest EV from Exercise 2).
Panel B: Profit curve (EV vs threshold) with the optimal point labelled in dollars.
Panel C: Model comparison bar chart (just Business EV, three models).
Panel D: Plain-text KPI card showing TP, FN, EV, and lift at top 20%; use ax.text() only, no bars.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# TODO: build the 2×2 executive one-pager
# fig, axes = plt.subplots(2, 2, figsize=(13, 9))
# fig.suptitle('Churn Model — Executive Summary', fontsize=14)
#
# Panel A: lift_decile_chart(y_te, proba_gb, ax=axes[0,0])
# Panel B: profit curve from threshold sweep
# Panel C: EV bar chart for three models
# Panel D: text-only KPI card
print("Uncomment and build the executive one-pager.")
Common Pitfalls
Summary
Key takeaways
| Chart | Primary audience | Business question answered |
|---|---|---|
| Annotated confusion matrix | Any | Which types of errors are being made? |
| ROC + PR curves | Data scientists | How well does the model discriminate? |
| Lift decile chart | Marketing / ops | Which customers are worth targeting? |
| Threshold–profit dashboard | Decision-makers | What threshold maximises net value? |
| Feature importance | Data scientists + domain experts | Which inputs drive the model? |
| Multi-model comparison panel | Team leads / analysts | Which model should we deploy? |
Design rules: consistent colour palette, always show a baseline, annotate key values in plain language, include business translation in titles when presenting to non-technical audiences.
Next Up — Metrics Lab (Metric Dashboard)

You can now build all the key business charts. Next: assemble them into a live metric dashboard.
The next notebook — Metric Dashboard — brings together the confusion matrix, ROC/PR curves, lift chart, threshold-profit curve, and feature importance into a single interactive report. You will load a real dataset, train a model, and produce a self-contained HTML dashboard that a business team can explore without writing any code.
Dependencies: all chart-building functions from this notebook, plus the business EV metrics and custom scorer from business_metrics.ipynb.