Business Visualisation

Turning Model Results into Decisions

A model that no stakeholder understands does not get funded, deployed, or acted on. This notebook covers the visualisations that translate ML outputs into business language: confusion-matrix heatmaps, ROC and precision-recall curves, lift decile charts, threshold-profit dashboards, feature importance plots, and model comparison panels — built for both data science peers and non-technical audiences.

Why Visualisation Is the Last Mile

The last-mile problem: a churn model with lift of 4.6× at the top decile is a strong result — but only if the marketing team understands what "lift 4.6×" means. A single well-designed lift decile bar chart communicates the same result in thirty seconds to someone who has never heard the word "lift".

There are three distinct audiences for model visualisations, each needing a different chart style:

| Audience | What they ask | Right chart |
| --- | --- | --- |
| Data scientist / ML engineer | How well does the model discriminate? | ROC curve, PR curve, calibration plot |
| Business analyst / ops team | Where should we focus resources? | Lift decile chart, gain curve, threshold-profit plot |
| Executive / stakeholder | Did this model make money? | ROI bar, model comparison panel, KPI delta card |

Setup — Shared Data and Model

All visualisations in this notebook use the same synthetic churn dataset and the same three fitted classifiers, so the charts are directly comparable.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import (
    confusion_matrix, classification_report,
    roc_curve, auc, precision_recall_curve, average_precision_score
)

np.random.seed(42)
X, y = make_classification(
    n_samples=3000, n_features=12, n_informative=7,
    weights=[0.82, 0.18], flip_y=0.04, random_state=42
)
feature_names = [
    'recency', 'frequency', 'monetary', 'tenure',
    'support_calls', 'last_login_days', 'plan_tier',
    'payment_delay', 'products_used', 'engagement',
    'nps_score', 'region_code'
]

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit three models for comparison
lr  = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, random_state=0))
rf  = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
gb  = GradientBoostingClassifier(n_estimators=100, random_state=0)

for m in [lr, rf, gb]:
    m.fit(X_tr, y_tr)

proba_lr = lr.predict_proba(X_te)[:, 1]
proba_rf = rf.predict_proba(X_te)[:, 1]
proba_gb = gb.predict_proba(X_te)[:, 1]

print(f"Test set: {len(y_te)} samples, {y_te.sum()} positive ({y_te.mean()*100:.1f}% churn rate)")
print("Models trained: Logistic Regression, Random Forest, Gradient Boosting")

1. Annotated Confusion-Matrix Heatmap

The raw confusion matrix is opaque to non-technical readers. An annotated heatmap with percentages, colour intensity proportional to count, and clear outcome labels bridges that gap.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(y_true, y_pred, model_name='Model',
                           class_names=('Stay', 'Churn'), ax=None):
    cm = confusion_matrix(y_true, y_pred)
    cm_norm = cm.astype(float) / cm.sum(axis=1, keepdims=True)

    if ax is None:
        fig, ax = plt.subplots(figsize=(5, 4))

    im = ax.imshow(cm_norm, cmap='Blues', vmin=0, vmax=1)
    plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)

    labels = [['True Neg\n(loyal customer\nleft alone)',
                'False Pos\n(unnecessary\noutreach)'],
               ['False Neg\n(churner\nmissed)',
                'True Pos\n(churner\ncontacted)']]

    for i in range(2):
        for j in range(2):
            color = 'white' if cm_norm[i, j] > 0.5 else 'black'
            ax.text(j, i, f"{cm[i,j]}\n({cm_norm[i,j]*100:.1f}%)\n{labels[i][j]}",
                    ha='center', va='center', fontsize=8, color=color)

    ax.set_xticks([0, 1]); ax.set_xticklabels(class_names)
    ax.set_yticks([0, 1]); ax.set_yticklabels(class_names, rotation=90, va='center')
    ax.set_xlabel('Predicted label')
    ax.set_ylabel('True label')
    ax.set_title(f'Confusion Matrix — {model_name}')
    return ax

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, proba, name in zip(axes,
                            [proba_lr, proba_rf, proba_gb],
                            ['Logistic Reg.', 'Random Forest', 'Grad. Boosting']):
    plot_confusion_matrix(y_te, (proba >= 0.5).astype(int), model_name=name, ax=ax)

plt.suptitle('Confusion matrices at default threshold (0.5)', y=1.02, fontsize=12)
plt.tight_layout()
plt.show()

2. ROC Curve and Precision–Recall Curve

ROC curve — plots true positive rate vs false positive rate across all thresholds. Good for comparing discriminative power. AUC = 0.5 is random; AUC = 1.0 is perfect.

Precision–recall curve — plots precision vs recall. More informative than ROC on heavily imbalanced datasets, because it focuses on the positive class performance.

$$\text{AUC-ROC} = P(\text{score}_{\text{pos}} > \text{score}_{\text{neg}}) \qquad \text{AP} = \sum_k (R_k - R_{k-1}) \cdot P_k$$
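
Both definitions can be checked numerically. The sketch below uses a small hand-made set of labels and scores (illustrative values, not the notebook's churn data) to confirm that AUC-ROC equals the fraction of positive/negative pairs ranked correctly, and that average precision is the recall-weighted sum of precisions. Ties are counted as half a win, which is an assumption that matches the usual convention.

```python
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             precision_recall_curve)

# Toy labels and scores, chosen by hand for illustration only
y_true  = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.35, 0.40, 0.45, 0.60, 0.70, 0.65, 0.90])

# AUC-ROC as a pairwise ranking probability:
# compare every positive score with every negative score
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
diff = pos[:, None] - neg[None, :]
auc_pairwise = (diff > 0).mean() + 0.5 * (diff == 0).mean()

# AP as sum_k (R_k - R_{k-1}) * P_k; recall is returned in decreasing
# order, so the differences are negative and the sum is negated
prec, rec, _ = precision_recall_curve(y_true, y_score)
ap_manual = -np.sum(np.diff(rec) * prec[:-1])

print(f"AUC pairwise = {auc_pairwise:.4f}, sklearn = {roc_auc_score(y_true, y_score):.4f}")
print(f"AP manual    = {ap_manual:.4f}, sklearn = {average_precision_score(y_true, y_score):.4f}")
```
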
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, precision_recall_curve, average_precision_score

models_info = [
    ('Logistic Reg.',   proba_lr, 'steelblue'),
    ('Random Forest',   proba_rf, 'tomato'),
    ('Grad. Boosting',  proba_gb, 'seagreen'),
]

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

# --- ROC ---
for name, proba, color in models_info:
    fpr, tpr, thrs = roc_curve(y_te, proba)
    roc_auc = auc(fpr, tpr)
    axes[0].plot(fpr, tpr, color=color, linewidth=2,
                 label=f'{name} (AUC={roc_auc:.3f})')
    # mark threshold=0.5
    idx = np.argmin(np.abs(thrs - 0.5))
    axes[0].scatter(fpr[idx], tpr[idx], color=color, s=60, zorder=5)

axes[0].plot([0,1],[0,1],'k--',linewidth=1,label='Random (AUC=0.5)')
axes[0].set_xlabel('False Positive Rate')
axes[0].set_ylabel('True Positive Rate (Recall)')
axes[0].set_title('ROC Curves (dots = threshold 0.5)')
axes[0].legend(loc='lower right', fontsize=8)
axes[0].grid(True, alpha=0.3)

# --- Precision-Recall ---
baseline_pr = y_te.mean()
for name, proba, color in models_info:
    prec, rec, _ = precision_recall_curve(y_te, proba)
    ap = average_precision_score(y_te, proba)
    axes[1].plot(rec, prec, color=color, linewidth=2,
                 label=f'{name} (AP={ap:.3f})')

axes[1].axhline(baseline_pr, color='k', linestyle='--', linewidth=1,
                label=f'Random baseline ({baseline_pr:.2f})')
axes[1].set_xlabel('Recall')
axes[1].set_ylabel('Precision')
axes[1].set_title('Precision–Recall Curves')
axes[1].legend(loc='upper right', fontsize=8)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

3. Lift Decile Bar Chart

The lift decile chart ranks the population by predicted probability and divides it into ten equal buckets. Each bar shows lift: the churn rate in that bucket divided by the overall churn rate, so a lift of 2 means the bucket contains twice as many churners as a random sample of the same size. Marketing and ops teams use this to decide which deciles are worth targeting.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

def lift_decile_chart(y_true, y_score, model_name='Model', ax=None, color='steelblue'):
    n = len(y_true)
    order = np.argsort(y_score)[::-1]
    y_sorted = y_true[order]
    base_rate = y_true.mean()

    decile_size = n // 10
    lifts, gains = [], []
    for d in range(10):
        bucket = y_sorted[d * decile_size : (d+1) * decile_size]
        rate   = bucket.mean()
        lifts.append(rate / base_rate if base_rate > 0 else 0)
        gains.append(bucket.sum() / y_true.sum() * 100)

    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 4))

    xs = np.arange(1, 11)
    bars = ax.bar(xs, lifts, color=[color if l > 1 else '#cccccc' for l in lifts],
                  edgecolor='white', linewidth=0.8)
    ax.axhline(1.0, color='k', linestyle='--', linewidth=1, label='Baseline (lift=1)')

    for bar, lift, gain in zip(bars, lifts, gains):
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.03,
                f'{lift:.1f}x\n({gain:.0f}%)',
                ha='center', va='bottom', fontsize=7)

    ax.set_xticks(xs)
    ax.set_xticklabels([f'D{d}' for d in range(1, 11)])
    ax.set_xlabel('Decile (D1=highest score)')
    ax.set_ylabel('Lift')
    ax.set_title(f'Lift Decile Chart — {model_name}\n(bar labels: lift × and % of all churners in decile)')
    ax.legend()
    return ax

fig, axes = plt.subplots(1, 3, figsize=(16, 4))
for ax, proba, name, col in zip(
        axes,
        [proba_lr, proba_rf, proba_gb],
        ['Logistic Reg.', 'Random Forest', 'Grad. Boosting'],
        ['steelblue', 'tomato', 'seagreen']):
    lift_decile_chart(y_te, proba, model_name=name, ax=ax, color=col)

plt.tight_layout()
plt.show()

4. Threshold–Profit Dashboard

For the decision-maker who needs to pick a deployment threshold, a four-panel dashboard — profit curve, precision/recall vs threshold, confusion matrix at the chosen threshold, and the selected KPIs — provides everything in one view.
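
The profit curve in this dashboard rests on a simple expected-value formula: each true positive earns a value, while each false positive and false negative incurs a cost. A minimal sketch follows, using the same business parameters as the notebook (80, 10, 120) and hypothetical confusion-matrix counts chosen purely for illustration. `expected_value` is an illustrative helper name, not something defined elsewhere in the notebook.

```python
def expected_value(tp, fp, fn, v_tp=80.0, c_fp=10.0, c_fn=120.0):
    """Net value of acting on a set of predictions (true negatives contribute 0)."""
    return tp * v_tp - fp * c_fp - fn * c_fn

# Hypothetical counts at a strict and a lenient threshold (not model output)
strict  = expected_value(tp=60,  fp=10,  fn=100)
lenient = expected_value(tp=130, fp=200, fn=30)

print(f"Strict threshold:  EV = ${strict:,.0f}")
print(f"Lenient threshold: EV = ${lenient:,.0f}")
```

Because a missed churner costs more than an unnecessary contact under these parameters, lowering the threshold can raise net value even though precision drops.
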

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

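# Business parameters used in the expected-value formula:
# V_TP = value of a true positive, C_FP = cost of a false positive,
# C_FN = cost of a false negative (same currency units as the EV axis)
V_TP, C_FP, C_FN = 80.0, 10.0, 120.0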

thresholds = np.linspace(0.01, 0.99, 300)
ev_curve, prec_curve, rec_curve = [], [], []

for t in thresholds:
    y_pred_t = (proba_gb >= t).astype(int)
    tn_t, fp_t, fn_t, tp_t = confusion_matrix(y_te, y_pred_t, labels=[0,1]).ravel()
    ev_curve.append(tp_t * V_TP - fp_t * C_FP - fn_t * C_FN)
    prec_t = tp_t / (tp_t + fp_t) if (tp_t + fp_t) > 0 else 0
    rec_t  = tp_t / (tp_t + fn_t) if (tp_t + fn_t) > 0 else 0
    prec_curve.append(prec_t)
    rec_curve.append(rec_t)

ev_curve   = np.array(ev_curve)
best_idx   = np.argmax(ev_curve)
best_t     = thresholds[best_idx]

fig, axes = plt.subplots(2, 2, figsize=(13, 9))
fig.suptitle(f'Threshold Dashboard — Gradient Boosting (best t={best_t:.2f})', fontsize=13)

# Panel A: Profit curve
ax = axes[0, 0]
ax.plot(thresholds, ev_curve, color='seagreen', linewidth=2)
ax.axvline(best_t, color='green', linestyle='--', label=f'Best t={best_t:.2f}')
ax.axvline(0.5,    color='red',   linestyle=':',  label='Default t=0.5')
ax.axhline(0, color='k', linewidth=0.7)
ax.fill_between(thresholds, 0, ev_curve, where=(ev_curve > 0), alpha=0.12, color='seagreen')
ax.set_xlabel('Threshold')
ax.set_ylabel('Expected Value ($)')
ax.set_title('A — Profit Curve')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)

# Panel B: Precision / Recall vs threshold
ax = axes[0, 1]
ax.plot(thresholds, prec_curve, 'b-', linewidth=1.8, label='Precision')
ax.plot(thresholds, rec_curve,  'r-', linewidth=1.8, label='Recall')
ax.axvline(best_t, color='green', linestyle='--', label=f'Best t={best_t:.2f}')
ax.set_xlabel('Threshold')
ax.set_ylabel('Score')
ax.set_title('B — Precision & Recall')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)

# Panel C: Confusion matrix at best threshold
ax = axes[1, 0]
y_best = (proba_gb >= best_t).astype(int)
cm = confusion_matrix(y_te, y_best)
cm_norm = cm / cm.sum(axis=1, keepdims=True)
im = ax.imshow(cm_norm, cmap='Blues', vmin=0, vmax=1)
plt.colorbar(im, ax=ax, fraction=0.046)
for i in range(2):
    for j in range(2):
        c = 'white' if cm_norm[i, j] > 0.55 else 'black'
        ax.text(j, i, f'{cm[i,j]}\n({cm_norm[i,j]*100:.0f}%)',
                ha='center', va='center', fontsize=10, color=c)
ax.set_xticks([0,1]); ax.set_xticklabels(['Pred Stay','Pred Churn'])
ax.set_yticks([0,1]); ax.set_yticklabels(['True Stay','True Churn'], rotation=90, va='center')
ax.set_title(f'C — Confusion Matrix at t={best_t:.2f}')

# Panel D: KPI summary bar
ax = axes[1, 1]
tn_b, fp_b, fn_b, tp_b = cm.ravel()
ev_best    = tp_b * V_TP - fp_b * C_FP - fn_b * C_FN
ev_default = ev_curve[np.argmin(np.abs(thresholds - 0.5))]

kpi_names  = ['EV @ optimal\nthreshold', 'EV @ default\n0.5', 'TP caught', 'FN missed']
kpi_values = [ev_best, ev_default, tp_b, fn_b]
colors_kpi = ['seagreen', 'tomato', 'steelblue', 'orange']

bars = ax.barh(kpi_names, kpi_values, color=colors_kpi)
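# Pair each bar with its own name; looking the name up via
# kpi_values.index(val) would mislabel bars when two values are equal
for bar, name, val in zip(bars, kpi_names, kpi_values):
    ax.text(bar.get_width() + max(abs(v) for v in kpi_values) * 0.01,
            bar.get_y() + bar.get_height() / 2,
            f'${val:,.0f}' if 'EV' in name else f'{int(val)}',
            va='center', fontsize=9)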
ax.set_title('D — KPI Summary')
ax.axvline(0, color='k', linewidth=0.7)
ax.grid(True, alpha=0.2, axis='x')

plt.tight_layout()
plt.show()

5. Feature Importance Chart

Feature importance charts answer “which inputs does the model rely on most?” — a question every business stakeholder asks. For tree-based models, permutation importance is more reliable than the built-in impurity-based importance, especially when features have different cardinalities.
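
To see the two measures side by side, the sketch below fits a small forest on a separate toy dataset (not the notebook's churn split) and prints the impurity-based `feature_importances_` next to permutation importance computed on held-out data. The dataset size and model parameters are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Small synthetic problem, for illustration only
X_demo, y_demo = make_classification(
    n_samples=600, n_features=6, n_informative=3, random_state=0
)
Xa, Xb, ya, yb = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xa, ya)

# Impurity-based importance: computed from the training-time tree splits
impurity = forest.feature_importances_

# Permutation importance: drop in AUC on held-out data when a column is shuffled
perm = permutation_importance(
    forest, Xb, yb, n_repeats=5, random_state=0, scoring='roc_auc'
)

for i in range(X_demo.shape[1]):
    print(f"feature_{i}: impurity={impurity[i]:.3f}  "
          f"permutation={perm.importances_mean[i]:+.3f} ± {perm.importances_std[i]:.3f}")
```

Impurity importances are non-negative and sum to 1 by construction, whereas permutation importances can be zero or slightly negative for features the model does not need.
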

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.inspection import permutation_importance

# Permutation importance on test set
result = permutation_importance(
    rf, X_te, y_te, n_repeats=10, random_state=42,
    scoring='roc_auc', n_jobs=-1
)

imp_mean = result.importances_mean
imp_std  = result.importances_std
order    = np.argsort(imp_mean)  # ascending for horizontal bar

fig, ax = plt.subplots(figsize=(8, 6))
colors = ['tomato' if imp_mean[i] > 0 else '#cccccc' for i in order]
ax.barh(
    [feature_names[i] for i in order],
    imp_mean[order],
    xerr=imp_std[order],
    color=colors, edgecolor='white', linewidth=0.5,
    capsize=4
)
ax.axvline(0, color='k', linewidth=0.8)
ax.set_xlabel('Permutation importance\n(mean decrease in AUC when feature is shuffled)')
ax.set_title('Feature Importance — Random Forest\n(permutation importance, test set, 10 repeats)')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

print("Top 3 features by permutation importance:")
for rank, i in enumerate(order[::-1][:3], 1):
    print(f"  {rank}. {feature_names[i]:20s}  {imp_mean[i]:.4f} ± {imp_std[i]:.4f}")

6. Multi-Model Comparison Panel

When presenting model selection results to a team, a single comparison panel with consistent metrics across all candidates is clearer than separate figures.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import (roc_auc_score, average_precision_score,
                              f1_score, confusion_matrix)

model_labels = ['Logistic\nReg.', 'Random\nForest', 'Grad.\nBoosting']
probas       = [proba_lr, proba_rf, proba_gb]
colors       = ['steelblue', 'tomato', 'seagreen']

metrics = {}
for name, proba in zip(model_labels, probas):
    y_pred = (proba >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred, labels=[0,1]).ravel()
    ev = tp * V_TP - fp * C_FP - fn * C_FN
    metrics[name] = {
        'AUC-ROC': roc_auc_score(y_te, proba),
        'Avg Prec.': average_precision_score(y_te, proba),
        'F1 (churn)': f1_score(y_te, y_pred),
        'Biz EV ($k)': ev / 1000,
    }

metric_names = list(next(iter(metrics.values())).keys())
x = np.arange(len(metric_names))
width = 0.25

fig, ax = plt.subplots(figsize=(11, 5))
for i, (name, vals) in enumerate(metrics.items()):
    bar_vals = [vals[m] for m in metric_names]
    offset = (i - 1) * width
    bars = ax.bar(x + offset, bar_vals, width, label=name,
                  color=colors[i], edgecolor='white')
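    # Choose the label format from the metric name, not the magnitude,
    # so that a business EV below $10k is still shown as dollars
    for bar, m, v in zip(bars, metric_names, bar_vals):
        ax.text(bar.get_x() + bar.get_width()/2,
                bar.get_height() + 0.005,
                f'${v:.1f}k' if '$k' in m else f'{v:.2f}',
                ha='center', va='bottom', fontsize=7)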

ax.set_xticks(x)
ax.set_xticklabels(metric_names)
ax.set_ylabel('Score / Value')
ax.set_title('Model Comparison Panel — Churn Classification')
ax.legend()
ax.grid(True, alpha=0.2, axis='y')
ax.axhline(0, color='k', linewidth=0.7)
plt.tight_layout()
plt.show()

print("\nModel comparison summary:")
for name, vals in metrics.items():
    print(f"  {name.replace(chr(10),' '):20s}  " +
          "  ".join(f"{k}={v:.3f}" for k,v in vals.items()))

Visual Design Principles for Business Charts

Colour conventions
| Colour | Use for | Avoid |
| --- | --- | --- |
| Green | Positive outcomes, profit, improvement | Errors, losses |
| Red / tomato | Errors, cost, warnings | Success metrics |
| Blue | Neutral comparisons, reference lines | Strong warnings |
| Grey | Baselines, inactive elements | Primary data |

Use a consistent palette across all panels in a report. Changing colours between charts forces readers to re-learn the mapping.
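
One way to enforce this in code is to define the palette once and look colours up by meaning rather than typing colour names in each cell. A minimal sketch follows; `PALETTE` and `bar_colours` are hypothetical names introduced here, and the colour choices simply mirror the ones already used in this notebook.

```python
# Shared palette: keys describe meaning, values are matplotlib colour names
PALETTE = {
    'positive': 'seagreen',   # profit, improvement
    'negative': 'tomato',     # errors, cost, warnings
    'neutral':  'steelblue',  # neutral comparisons
    'baseline': '#cccccc',    # baselines, inactive elements
}

def bar_colours(values, baseline=1.0):
    """Highlight bars above the baseline and grey out the rest."""
    return [PALETTE['neutral'] if v > baseline else PALETTE['baseline']
            for v in values]

print(bar_colours([2.4, 1.3, 0.8, 0.2]))
```

The resulting list can be passed directly as the `color` argument of `ax.bar`, so every chart in a report draws from the same mapping.
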

Try It in the Browser

Compute a lift decile table in pure Python — see which deciles are worth targeting.
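
A minimal pure-Python sketch of such a table is shown below, using only the standard library. The scores and labels are synthetic (generated so that higher scores churn more often), and `lift_decile_table` is an illustrative helper, not a function defined elsewhere in this notebook.

```python
import random

def lift_decile_table(y_true, y_score, n_buckets=10):
    """Rank by score (highest first) and summarise each bucket."""
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    base_rate = total_pos / n
    rows, cum_pos = [], 0
    for d in range(n_buckets):
        # Integer boundaries spread any remainder across buckets
        start, end = d * n // n_buckets, (d + 1) * n // n_buckets
        bucket = [y for _, y in pairs[start:end]]
        pos = sum(bucket)
        cum_pos += pos
        rate = pos / len(bucket) if bucket else 0.0
        rows.append({
            'decile': d + 1,
            'customers': len(bucket),
            'churners': pos,
            'lift': rate / base_rate if base_rate > 0 else 0.0,
            'cum_gain_pct': 100.0 * cum_pos / total_pos if total_pos else 0.0,
        })
    return rows

# Synthetic data: churn probability grows with the score (assumption for the demo)
random.seed(0)
scores = [random.random() for _ in range(1000)]
labels = [1 if random.random() < s ** 2 else 0 for s in scores]

table = lift_decile_table(labels, scores)
print("decile  customers  churners  lift  cum. gain")
for r in table:
    print(f"D{r['decile']:<6d} {r['customers']:<10d} {r['churners']:<9d} "
          f"{r['lift']:<5.2f} {r['cum_gain_pct']:.0f}%")
```

Reading the cumulative gain column at D2 gives the share of all churners captured by contacting the top 20% of customers.
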

Guided Practice

A marketing manager asks: "If I can only contact 20% of customers, which ones should I pick?" Which chart answers this most directly?

  • ROC curve
    The ROC curve shows overall discriminative power across all thresholds but does not directly answer a budget-constrained targeting question.

  • Lift decile chart
    Correct. The lift decile chart directly shows how many more churners you capture in the top 20% (deciles D1–D2) versus random selection, which is the exact question being asked.

  • Confusion matrix
    The confusion matrix shows outcome counts at a fixed threshold, not across a targeting budget spectrum.

  • Feature importance chart
    Feature importance explains model inputs, not where to focus targeting efforts.

Why is the precision–recall curve more informative than the ROC curve on a heavily imbalanced dataset?

  • Because PR curves are always higher than ROC curves
    That is not generally true. The comparison is about which curve is more informative, not which is numerically higher.

  • Because the PR curve focuses on the positive class, so a large true-negative pool cannot inflate the apparent performance
    Correct. On imbalanced data, a model can achieve high AUC-ROC by correctly handling the majority class, even if it does poorly on the minority class. The PR curve is not affected by TN counts.

  • Because precision and recall are always more important than TPR and FPR
    The choice depends on the business context. On balanced data, ROC is equally informative.

  • Because the PR curve uses fewer hyperparameters
    Hyperparameters are unrelated to which evaluation curve to plot.

You build a model comparison bar chart where Model A has AUC=0.83 and Model B has AUC=0.81, but Model B has higher business EV. Which model should you recommend?

  • Model A: it has the higher AUC
    AUC measures discriminative rank ordering but does not account for the asymmetric cost structure. The business EV metric directly captures what matters.

  • Model B: it generates more business value, which is the stated objective
    Correct. When the business objective is net value (profit, cost saved), the model that maximises that objective should be deployed, even if another model wins on AUC.

  • Neither: always use F1-Score to compare models
    F1-Score treats FP and FN costs symmetrically. This scenario has asymmetric costs, so F1 is not the right deciding metric.

  • Run more experiments until the AUC and EV agree
    AUC and EV measure different things and may legitimately disagree. Additional experiments will not resolve this unless they test the business metric directly.

Permutation importance is preferred over built-in tree impurity importance because:

  • It is always faster to compute
    Permutation importance requires multiple model evaluations; it is typically slower than reading the built-in impurity scores.

  • It measures importance on the actual evaluation metric on held-out data, avoiding bias toward high-cardinality features
    Correct. Impurity-based importance is biased toward features with many unique values and is computed on training data. Permutation importance can be computed on the test set with the actual scoring metric.

  • It works only on logistic regression
    Permutation importance is model-agnostic: it works on any fitted estimator.

  • It uses gradients to explain predictions
    Permutation importance shuffles feature values and measures the drop in metric; no gradients are involved. Gradient-based methods include integrated gradients and saliency maps.

Exercises

Exercise 1 — Annotate the ROC curve with threshold markers

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# proba_gb and y_te are available from the setup cell

# TODO:
# 1. Compute ROC curve for the Gradient Boosting model
# fpr, tpr, thrs = roc_curve(y_te, proba_gb)
# roc_auc = auc(fpr, tpr)
#
# 2. Plot the ROC curve
# 3. Mark three threshold points on the curve: t=0.1, t=0.3, t=0.5
#    Use ax.scatter and ax.annotate to label each point with its threshold value
# 4. Add a text box showing AUC in the lower-right corner
# 5. Add the random baseline line

print("Uncomment and complete the annotated ROC curve.")

Exercise 2 — Grouped model comparison with business EV

Using the three models (lr, rf, gb) and their test probabilities:

  1. Compute AUC-ROC, Average Precision, F1-Score, and Business EV for each model.

  2. Rank models by each metric — does the ranking change?

  3. Build a grouped bar chart (four metric groups, three bars per group) with value labels.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score, confusion_matrix

# V_TP, C_FP, C_FN are defined in the threshold dashboard cell (Section 4)

# TODO:
# model_labels = ['Logistic Reg.', 'Random Forest', 'Grad. Boosting']
# probas = [proba_lr, proba_rf, proba_gb]
#
# for name, proba in zip(model_labels, probas):
#     y_pred = (proba >= 0.5).astype(int)
#     tn, fp, fn, tp = confusion_matrix(y_te, y_pred, labels=[0,1]).ravel()
#     ev = tp * V_TP - fp * C_FP - fn * C_FN
#     print(f"{name}: AUC={roc_auc_score(y_te, proba):.3f}  AP={average_precision_score(y_te, proba):.3f}"
#           f"  F1={f1_score(y_te, y_pred):.3f}  EV=${ev:,.0f}")

print("Uncomment and complete the model comparison.")

Exercise 3 — Executive one-pager

Produce a single 2×2 figure that a non-technical manager can read in 60 seconds:

  • Panel A: Lift decile chart for the best model (highest EV from Exercise 2).

  • Panel B: Profit curve (EV vs threshold) with the optimal point labelled in dollars.

  • Panel C: Model comparison bar chart (just Business EV, three models).

  • Panel D: Plain-text KPI card: TP, FN, EV, and lift at top 20% — use ax.text() only, no bars.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# TODO: build the 2×2 executive one-pager
# fig, axes = plt.subplots(2, 2, figsize=(13, 9))
# fig.suptitle('Churn Model — Executive Summary', fontsize=14)
#
# Panel A: lift_decile_chart(y_te, proba_gb, ax=axes[0,0])
# Panel B: profit curve from threshold sweep
# Panel C: EV bar chart for three models
# Panel D: text-only KPI card

print("Uncomment and build the executive one-pager.")

Common Pitfalls

Summary

Key takeaways
| Chart | Primary audience | Business question answered |
| --- | --- | --- |
| Annotated confusion matrix | Any | Which types of errors are being made? |
| ROC + PR curves | Data scientists | How well does the model discriminate? |
| Lift decile chart | Marketing / ops | Which customers are worth targeting? |
| Threshold–profit dashboard | Decision-makers | What threshold maximises net value? |
| Feature importance | Data scientists + domain experts | Which inputs drive the model? |
| Multi-model comparison panel | Team leads / analysts | Which model should we deploy? |

Design rules: consistent colour palette, always show a baseline, annotate key values in plain language, include business translation in titles when presenting to non-technical audiences.

Next Up — Metrics Lab (Metric Dashboard)

You can now build all the key business charts. Next: assemble them into a live metric dashboard.

The next notebook — Metric Dashboard — brings together the confusion matrix, ROC/PR curves, lift chart, threshold-profit curve, and feature importance into a single interactive report. You will load a real dataset, train a model, and produce a self-contained HTML dashboard that a business team can explore without writing any code.

Dependencies: all chart-building functions from this notebook, plus the business EV metrics and custom scorer from business_metrics.ipynb.