Business-Aware Metrics

Optimising What the Business Actually Cares About

Cross-validation R² and accuracy tell you how the model performs on held-out data. They do not tell you how much money it saves, which misclassifications are expensive, or whether deploying it is worth the cost. This notebook bridges the gap: from statistical metrics to business KPIs, custom scorers, cost-sensitive thresholds, lift curves, and expected-value optimisation.

Why Standard ML Metrics Are Not Enough

The accuracy illusion: a churn model trained on a dataset where 95% of customers stay achieves 95% accuracy by predicting "no churn" for every customer. The model is statistically impressive and commercially useless — it flags nobody for intervention. The business loses every at-risk customer it could have saved.
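The accuracy illusion is easy to reproduce. The sketch below is only an illustration: the label vector is synthetic (1,000 hypothetical customers with a 5% churn rate, not the notebook's data), and the "model" simply predicts "no churn" for everyone.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 1000 customers, 5% churn (1 = churn, 0 = stay)
y_true_demo = np.array([1] * 50 + [0] * 950)

# A trivial "model" that predicts "no churn" for every customer
y_pred_demo = np.zeros_like(y_true_demo)

acc_demo = accuracy_score(y_true_demo, y_pred_demo)   # share of correct predictions
rec_demo = recall_score(y_true_demo, y_pred_demo)     # share of churners flagged

print(f"Accuracy:           {acc_demo:.2f}")
print(f"Recall on churners: {rec_demo:.2f}")
```

Accuracy comes out at 0.95 while recall on the churn class is 0: every at-risk customer is missed.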

Three common disconnects between ML metrics and business outcomes:

| ML metric says | Business reality |
| --- | --- |
| 98% accuracy on fraud detection | All transactions approved; 2% are fraud, each costing $500 |
| Low RMSE on demand forecast | Errors skewed toward under-stocking high-margin SKUs; stockouts hurt more than waste |
| High AUC on loan default model | Threshold set at 0.5 rejects too many profitable customers; moving to 0.3 increases revenue |

The Cost Matrix — Putting Dollar Values on Errors

For a binary classifier, there are four outcomes. In a business context each has a different value:

$$\text{Expected Value} = \color{#2ca02c}{TP \times V_{TP}} + \color{#1f77b4}{TN \times V_{TN}} - \color{#ff7f0e}{FP \times C_{FP}} - \color{#d62728}{FN \times C_{FN}}$$

where $\color{#2ca02c}{V_{TP}}$ = value of a correct positive action, $\color{#1f77b4}{V_{TN}}$ = value of a correct negative (often zero), $\color{#ff7f0e}{C_{FP}}$ = cost of a false alarm, and $\color{#d62728}{C_{FN}}$ = cost of a missed positive.

| Outcome | Notation | Fraud detection example | Churn intervention example |
| --- | --- | --- | --- |
| True Positive | TP | Fraud caught; transaction blocked | At-risk customer contacted; stays |
| True Negative | TN | Legit transaction approved | Happy customer left alone |
| False Positive | FP | Legit transaction blocked; customer annoyed | Unnecessary discount sent |
| False Negative | FN | Fraud missed; money lost | Churner not contacted; revenue lost |

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

# Simulate a churn dataset: 15% churn rate
np.random.seed(42)
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.85, 0.15],
                            flip_y=0.05, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, random_state=0))
model.fit(X_tr, y_tr)

y_pred_default = model.predict(X_te)  # default threshold 0.5
y_proba = model.predict_proba(X_te)[:, 1]

cm = confusion_matrix(y_te, y_pred_default)
tn, fp, fn, tp = cm.ravel()

# Business parameters: churn scenario
v_tp =  80.0   # saved customer brings $80 net value
v_tn =   0.0   # leaving a happy customer alone costs nothing
c_fp =  10.0   # wasted discount on a non-churner costs $10
c_fn = 120.0   # missed churner costs $120 (lost revenue)

ev_default = tp * v_tp + tn * v_tn - fp * c_fp - fn * c_fn

print("=== Default threshold (0.5) ===")
print(f"  TP={tp}  FP={fp}  FN={fn}  TN={tn}")
print(f"  Accuracy: {(tp+tn)/(tp+tn+fp+fn):.3f}")
print(f"  Expected Value: ${ev_default:,.0f}")
print()
print(classification_report(y_te, y_pred_default, target_names=['stay', 'churn']))

Threshold Optimisation — Finding the Profit-Maximising Cutoff

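The default threshold of 0.5 minimises the expected misclassification rate when the predicted probabilities are well calibrated, which implicitly treats a false positive and a false negative as equally costly. That is rarely true for business objectives. By scanning thresholds and computing expected value at each point, we find the cutoff that maximises net profit. Because the cutoff below is chosen on the test set, the expected value reported at the best threshold is optimistic; in practice, select the threshold on a separate validation split.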
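If the predicted probabilities are well calibrated, the profit-maximising cutoff can also be estimated in closed form. Acting on a customer with churn probability p is worth p·V_TP − (1 − p)·C_FP, while not acting is worth (1 − p)·V_TN − p·C_FN; the break-even point is where the two are equal. The sketch below is a hedged illustration under that calibration assumption; `break_even_threshold` and `t_star` are hypothetical names, not part of the notebook.

```python
def break_even_threshold(v_tp, v_tn, c_fp, c_fn):
    """Probability above which acting has higher expected value than not acting.

    Assumes calibrated probabilities and per-customer values that do not
    depend on how many customers are contacted.
    """
    # Act when  p*v_tp - (1-p)*c_fp  >=  (1-p)*v_tn - p*c_fn
    # =>        p*(v_tp + c_fn)      >=  (1-p)*(c_fp + v_tn)
    return (c_fp + v_tn) / (c_fp + v_tn + v_tp + c_fn)

# Churn scenario values used in this notebook
t_star = break_even_threshold(v_tp=80.0, v_tn=0.0, c_fp=10.0, c_fn=120.0)
print(f"Break-even threshold: {t_star:.3f}")
```

With the churn costs used here the break-even is 10 / 210 ≈ 0.048, far below 0.5. An empirical threshold scan can land elsewhere, because probabilities estimated from a finite sample are only approximately calibrated.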

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

thresholds = np.linspace(0.01, 0.99, 200)
ev_curve   = []
prec_curve = []
rec_curve  = []

for t in thresholds:
    y_pred_t = (y_proba >= t).astype(int)
    tn_t, fp_t, fn_t, tp_t = confusion_matrix(y_te, y_pred_t, labels=[0,1]).ravel()
    ev_t = tp_t * v_tp + tn_t * v_tn - fp_t * c_fp - fn_t * c_fn
    ev_curve.append(ev_t)
    prec_t = tp_t / (tp_t + fp_t) if (tp_t + fp_t) > 0 else 0
    rec_t  = tp_t / (tp_t + fn_t) if (tp_t + fn_t) > 0 else 0
    prec_curve.append(prec_t)
    rec_curve.append(rec_t)

ev_curve   = np.array(ev_curve)
best_idx   = np.argmax(ev_curve)
best_t     = thresholds[best_idx]
best_ev    = ev_curve[best_idx]

print(f"Profit-maximising threshold: {best_t:.3f}")
print(f"Expected value at best threshold: ${best_ev:,.0f}")
print(f"Expected value at default (0.5):  ${ev_curve[np.argmin(np.abs(thresholds - 0.5))]:,.0f}")

fig, axes = plt.subplots(1, 2, figsize=(13, 4))

# Left: EV curve
axes[0].plot(thresholds, ev_curve, color='steelblue', linewidth=2)
axes[0].axvline(best_t, color='green', linestyle='--', label=f'Best t={best_t:.2f}')
axes[0].axvline(0.5,    color='red',   linestyle=':',  label='Default t=0.5')
axes[0].set_xlabel('Classification threshold')
axes[0].set_ylabel('Expected business value ($)')
axes[0].set_title('Expected Value vs Threshold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Right: Precision-Recall vs threshold
axes[1].plot(thresholds, prec_curve, 'b-', linewidth=1.5, label='Precision')
axes[1].plot(thresholds, rec_curve,  'r-', linewidth=1.5, label='Recall')
axes[1].axvline(best_t, color='green', linestyle='--', label='Best EV threshold')
axes[1].axvline(0.5,    color='gray',  linestyle=':',  label='Default 0.5')
axes[1].set_xlabel('Classification threshold')
axes[1].set_ylabel('Score')
axes[1].set_title('Precision & Recall vs Threshold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Cumulative Gain and Lift Curves

A cumulative gain curve answers: “If I contact the top $k\%$ of customers ranked by predicted probability, what fraction of all churners do I capture?” The lift curve expresses the same quantity relative to random targeting.

Lift = fraction captured by model / fraction expected from random targeting:

$$\text{Lift at } k\% = \frac{\text{Churners captured in top-}k\%\text{ of model scores}}{k\% \times \text{Total churners}}$$

A lift of 3 at the top 20% means the model is 3× more efficient than random — you reach the same number of true churners by contacting 20% of customers instead of 60%.
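The arithmetic behind that statement can be checked on a toy example. The data below are hypothetical (100 customers, 10 churners, 6 of whom sit in the 20 highest scores), and `lift_at` is an illustrative helper rather than a library function.

```python
import numpy as np

def lift_at(y_true, scores, frac):
    """Lift when contacting the top `frac` of customers ranked by score."""
    y_true = np.asarray(y_true)
    order = np.argsort(scores)[::-1]        # highest score first
    k = int(round(frac * len(y_true)))      # number of customers contacted
    captured = y_true[order][:k].sum()      # churners among those contacted
    gain = captured / y_true.sum()          # fraction of all churners captured
    return gain / frac                      # relative to random targeting

# Hypothetical toy data: scores strictly decreasing, churners at chosen ranks
toy_scores = np.linspace(1.0, 0.0, 100)
toy_labels = np.zeros(100, dtype=int)
toy_labels[[0, 2, 5, 9, 14, 19, 30, 50, 70, 90]] = 1

toy_lift = lift_at(toy_labels, toy_scores, 0.20)
print(f"Lift at top 20%: {toy_lift:.1f}x")
```

Six of the ten churners fall in the top 20%, so the gain is 60% and the lift is 0.60 / 0.20 = 3. Random targeting would need to reach about 60% of customers to capture the same share.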

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Sort by descending predicted probability
order      = np.argsort(y_proba)[::-1]
y_sorted   = y_te[order]
n_pos      = y_te.sum()          # total positives
n_total    = len(y_te)

pct_contacted = np.arange(1, n_total + 1) / n_total          # fraction of population contacted
cum_gain      = np.cumsum(y_sorted) / n_pos                   # fraction of all positives captured
random_line   = pct_contacted                                 # baseline: random ordering

# Lift = gain / random
lift = cum_gain / (pct_contacted + 1e-12)

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

# Cumulative Gain chart
axes[0].plot(pct_contacted * 100, cum_gain * 100,
             color='steelblue', linewidth=2, label='Model')
axes[0].plot([0, 100], [0, 100], 'k--', linewidth=1, label='Random baseline')
axes[0].fill_between(pct_contacted * 100, random_line * 100, cum_gain * 100,
                     alpha=0.15, color='steelblue', label='Lift area')
axes[0].set_xlabel('% population contacted (ranked by score)')
axes[0].set_ylabel('% churners captured')
axes[0].set_title('Cumulative Gain Curve')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Lift curve
axes[1].plot(pct_contacted * 100, lift, color='tomato', linewidth=2)
axes[1].axhline(1.0, color='k', linestyle='--', linewidth=1, label='No lift (random)')
axes[1].set_xlabel('% population contacted (ranked by score)')
axes[1].set_ylabel('Lift')
axes[1].set_title('Lift Curve')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Annotate lift at 20%
idx_20 = int(0.20 * n_total) - 1   # lift[i] is the lift after contacting i + 1 customers
lift_20 = lift[idx_20]
axes[1].annotate(f'Lift={lift_20:.1f}x\n@ top 20%',
                 xy=(20, lift_20), xytext=(35, lift_20 + 0.3),
                 arrowprops=dict(arrowstyle='->', color='black'),
                 fontsize=9)

plt.tight_layout()
plt.show()

# lift[i] is the lift after contacting i + 1 customers, hence the "- 1"
print(f"Lift at top 10%: {lift[int(0.10*n_total) - 1]:.2f}x")
print(f"Lift at top 20%: {lift[int(0.20*n_total) - 1]:.2f}x")
print(f"Lift at top 30%: {lift[int(0.30*n_total) - 1]:.2f}x")

Custom sklearn Scorers

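sklearn’s scoring= parameter accepts any callable with the signature scorer(estimator, X, y) → float. make_scorer builds such a callable from a metric function of the form metric(y_true, y_pred), so you can wrap any business metric (profit, weighted cost, uplift) into a scorer and pass it to GridSearchCV, RandomizedSearchCV, or cross_val_score. Note that a scorer built this way evaluates estimator.predict, which for a probabilistic classifier means the default 0.5 threshold.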
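A scorer can also be written directly against the scorer(estimator, X, y) signature, which makes it possible to score at a cutoff other than 0.5. The following is a minimal sketch, not the notebook's own implementation: `make_ev_scorer`, `X_demo`, and `y_demo` are hypothetical names, the data are synthetic, and the cost values are the churn figures used in this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score

def make_ev_scorer(threshold, v_tp=80.0, v_tn=0.0, c_fp=10.0, c_fn=120.0):
    """Return a scorer(estimator, X, y) giving expected value per sample."""
    def scorer(estimator, X, y):
        proba = estimator.predict_proba(X)[:, 1]
        y_pred = (proba >= threshold).astype(int)      # custom cutoff
        tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
        return (tp * v_tp + tn * v_tn - fp * c_fp - fn * c_fn) / len(y)
    return scorer

# Synthetic, imbalanced demo data (assumption: roughly 15% positives)
X_demo, y_demo = make_classification(n_samples=500, n_features=8,
                                     weights=[0.85, 0.15], random_state=1)
clf_demo = LogisticRegression(max_iter=1000)

scores_t050 = cross_val_score(clf_demo, X_demo, y_demo, cv=5,
                              scoring=make_ev_scorer(0.5))
scores_t020 = cross_val_score(clf_demo, X_demo, y_demo, cv=5,
                              scoring=make_ev_scorer(0.2))

print(f"Mean CV EV per sample at t=0.5: {scores_t050.mean():.2f}")
print(f"Mean CV EV per sample at t=0.2: {scores_t020.mean():.2f}")
```

Passing the threshold through a factory function keeps the scorer a plain callable, so the same pattern should work with GridSearchCV as well.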

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import make_scorer, confusion_matrix
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Business parameters (churn scenario)
V_TP, V_TN, C_FP, C_FN = 80.0, 0.0, 10.0, 120.0

def business_ev_score(y_true, y_pred):
    """Expected business value per sample (so CV folds are comparable)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0,1]).ravel()
    ev = tp * V_TP + tn * V_TN - fp * C_FP - fn * C_FN
    return ev / len(y_true)

business_scorer = make_scorer(business_ev_score)

pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
param_grid = {'logisticregression__C': np.logspace(-3, 2, 12)}

# Compare: tuning on AUC vs tuning on business EV
gs_auc = GridSearchCV(pipe, param_grid, cv=5, scoring='roc_auc', n_jobs=-1)
gs_auc.fit(X_tr, y_tr)

gs_ev = GridSearchCV(pipe, param_grid, cv=5, scoring=business_scorer, n_jobs=-1)
gs_ev.fit(X_tr, y_tr)

# Evaluate both on test set using the business metric
def ev_on_test(estimator):
    return business_ev_score(y_te, estimator.predict(X_te)) * len(y_te)

print(f"Best C (AUC-tuned): {gs_auc.best_params_['logisticregression__C']:.4g}")
print(f"Best C (EV-tuned):  {gs_ev.best_params_['logisticregression__C']:.4g}")
print()
print(f"Test EV (AUC-tuned model): ${ev_on_test(gs_auc):,.0f}")
print(f"Test EV (EV-tuned model):  ${ev_on_test(gs_ev):,.0f}")

# Plot C vs CV score for both
c_vals_auc = [p['logisticregression__C'] for p in gs_auc.cv_results_['params']]
c_vals_ev  = [p['logisticregression__C'] for p in gs_ev.cv_results_['params']]

fig, axes = plt.subplots(1, 2, figsize=(13, 4))
axes[0].semilogx(c_vals_auc, gs_auc.cv_results_['mean_test_score'], 'b-o', linewidth=2)
axes[0].axvline(gs_auc.best_params_['logisticregression__C'], color='green',
                linestyle='--', label=f"Best C={gs_auc.best_params_['logisticregression__C']:.3g}")
axes[0].set_xlabel('C (inverse regularisation)')
axes[0].set_ylabel('CV AUC')
axes[0].set_title('Tuning on AUC')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].semilogx(c_vals_ev, gs_ev.cv_results_['mean_test_score'], 'r-o', linewidth=2)
axes[1].axvline(gs_ev.best_params_['logisticregression__C'], color='green',
                linestyle='--', label=f"Best C={gs_ev.best_params_['logisticregression__C']:.3g}")
axes[1].set_xlabel('C (inverse regularisation)')
axes[1].set_ylabel('CV EV per sample ($)')
axes[1].set_title('Tuning on Business EV')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Regression + Business Value — When R² Is the Wrong Target

In regression settings, the business impact of a prediction error is often asymmetric: under-predicting demand for a high-margin product costs more than over-predicting it, because stockouts lose sales while overstock only incurs holding cost.

An asymmetric linear cost (a weighted absolute error, not a squared one) captures this:

$$\text{Profit loss} = \sum_{i} \begin{cases} \color{#d62728}{C_{\text{stockout}}} \cdot (y_i - \hat{y}_i) & \text{if } \hat{y}_i < y_i \text{ (under-forecast)}\\ \color{#ff7f0e}{C_{\text{overstock}}} \cdot (\hat{y}_i - y_i) & \text{if } \hat{y}_i > y_i \text{ (over-forecast)} \end{cases}$$
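A small worked example shows why a symmetric metric cannot see this difference. The demand values below are hypothetical, and `asymmetric_linear_cost` and `rmse_demo` are illustrative helpers; the unit costs ($5 per unit of stockout, $1 per unit of overstock) match the scenario described in this section.

```python
import numpy as np

def asymmetric_linear_cost(y_true, y_pred, c_over=1.0, c_stockout=5.0):
    """Sum of per-unit costs: under-forecasts cost more than over-forecasts."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return np.where(err > 0, c_over * err, c_stockout * (-err)).sum()

def rmse_demo(y_true, y_pred):
    """Root mean squared error, symmetric in the sign of the error."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

demand = np.array([100.0, 100.0, 100.0, 100.0])   # hypothetical true demand
under_forecast = demand - 10.0                     # always 10 units too low
over_forecast = demand + 10.0                      # always 10 units too high

print(f"RMSE (under): {rmse_demo(demand, under_forecast):.1f}")
print(f"RMSE (over):  {rmse_demo(demand, over_forecast):.1f}")
print(f"Cost (under): ${asymmetric_linear_cost(demand, under_forecast):,.0f}")
print(f"Cost (over):  ${asymmetric_linear_cost(demand, over_forecast):,.0f}")
```

Both forecasts have an RMSE of 10, yet the under-forecast costs $200 against $40 for the over-forecast, a 5× difference that RMSE does not register.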
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Business parameters for demand forecasting
COST_OVERSTOCK  = 1.0   # $1 per unit over-forecast
COST_STOCKOUT   = 5.0   # $5 per unit under-forecast (lost margin)

def asymmetric_cost(y_true, y_pred, c_over=COST_OVERSTOCK, c_stock=COST_STOCKOUT):
    """Total asymmetric cost: stockout errors cost more than overstock errors."""
    err = y_pred - y_true
    cost = np.where(err > 0, c_over * err, c_stock * (-err))
    return cost.sum()

# Use distinct names so the churn variables from earlier cells
# (X_tr, X_te, y_tr, y_te, y_proba) are not overwritten; later cells still use them.
X_d, y_d = load_diabetes(return_X_y=True)
X_dtr, X_dte, y_dtr, y_dte = train_test_split(X_d, y_d, test_size=0.25, random_state=0)

models = {
    'Ridge (α=0.1)':  make_pipeline(StandardScaler(), Ridge(alpha=0.1)),
    'Ridge (α=10)':   make_pipeline(StandardScaler(), Ridge(alpha=10)),
    'Lasso (α=1)':    make_pipeline(StandardScaler(), Lasso(alpha=1.0, max_iter=5000)),
    'Lasso (α=10)':   make_pipeline(StandardScaler(), Lasso(alpha=10.0, max_iter=5000)),
}

results = []
for name, m in models.items():
    m.fit(X_dtr, y_dtr)
    pred = m.predict(X_dte)
    r2   = r2_score(y_dte, pred)
    # np.sqrt(MSE) works across scikit-learn versions; the squared=False
    # argument is deprecated in recent releases.
    rmse = np.sqrt(mean_squared_error(y_dte, pred))
    cost = asymmetric_cost(y_dte, pred)
    results.append((name, r2, rmse, cost))
    print(f"{name:20s}  R²={r2:.4f}  RMSE={rmse:.2f}  Asymmetric cost=${cost:,.0f}")

# Bar chart: rank by R² vs rank by asymmetric cost
names  = [r[0] for r in results]
r2s    = [r[1] for r in results]
costs  = [r[3] for r in results]

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
colors = ['steelblue' if v == max(r2s) else '#aec6cf' for v in r2s]
axes[0].barh(names, r2s, color=colors)
axes[0].set_xlabel('Test R²')
axes[0].set_title('Ranked by R²')
axes[0].invert_xaxis()

colors2 = ['tomato' if v == min(costs) else '#f4a9a8' for v in costs]
axes[1].barh(names, costs, color=colors2)
axes[1].set_xlabel('Asymmetric cost ($, lower = better)')
axes[1].set_title('Ranked by Business Cost')

plt.suptitle('R² winner vs Business Cost winner — not always the same model', y=1.02)
plt.tight_layout()
plt.show()

ROI and Incremental Revenue Framing

Before presenting a model to stakeholders, translate performance into the language they care about:

| Metric | Formula | Example |
| --- | --- | --- |
| Incremental revenue | TP × revenue per conversion | 200 churners saved × $600 LTV = $120k |
| Cost avoided | (FN_baseline − FN_model) × cost per FN | (150 − 40) × $120 = $13.2k saved |
| Net ROI | (Revenue + Cost avoided − Model cost) / Model cost | (120k + 13.2k − 20k) / 20k = 566% |
| Precision in dollars | “Of every $100 we spend targeting, $X returns” | High-precision model → lower wasted spend |

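The example column can be reproduced with a few lines of arithmetic. The sketch below just restates the table's illustrative figures (200 churners saved, $600 LTV, false negatives falling from 150 to 40, $120 per false negative, $20k model cost); the variable names are hypothetical.

```python
# Illustrative figures taken from the table's example column
churners_saved   = 200
ltv_per_customer = 600.0
fn_baseline      = 150        # missed churners without the model
fn_model         = 40         # missed churners with the model
cost_per_fn      = 120.0
model_cost       = 20_000.0

incremental_revenue = churners_saved * ltv_per_customer
cost_avoided        = (fn_baseline - fn_model) * cost_per_fn
net_roi             = (incremental_revenue + cost_avoided - model_cost) / model_cost

print(f"Incremental revenue: ${incremental_revenue:,.0f}")
print(f"Cost avoided:        ${cost_avoided:,.0f}")
print(f"Net ROI:             {net_roi:.0%}")
```

The parentheses in the cost-avoided line matter: the difference in false negatives is taken first and then multiplied by the cost per false negative.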
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Simulate incremental revenue at different targeting budgets
# Budget = % of population contacted in descending score order
budget_pcts = np.linspace(0.01, 1.0, 200)
campaign_cost_per_contact = 10.0   # $10 per targeted customer
revenue_per_retained      = 600.0  # $600 LTV recovered per true churn prevented

n_total_test = len(y_te)
order_test   = np.argsort(y_proba)[::-1]
y_sorted_roi = y_te[order_test]

ev_roi = []
for pct in budget_pcts:
    k = max(1, int(pct * n_total_test))
    tp_k    = y_sorted_roi[:k].sum()          # churners captured
    revenue = tp_k * revenue_per_retained
    spend   = k * campaign_cost_per_contact
    ev_roi.append(revenue - spend)

ev_roi = np.array(ev_roi)
best_pct_idx = np.argmax(ev_roi)
best_pct     = budget_pcts[best_pct_idx]

# Baseline: random targeting (no model)
ev_random = []
base_rate = y_te.mean()
for pct in budget_pcts:
    k = max(1, int(pct * n_total_test))
    tp_k    = k * base_rate                 # expected TP from random
    revenue = tp_k * revenue_per_retained
    spend   = k * campaign_cost_per_contact
    ev_random.append(revenue - spend)

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(budget_pcts * 100, ev_roi,    'steelblue', linewidth=2.5, label='Model-targeted campaign')
ax.plot(budget_pcts * 100, ev_random, 'r--',       linewidth=1.5, label='Random targeting (baseline)')
ax.axvline(best_pct * 100, color='green', linestyle=':', linewidth=1.5,
           label=f'Optimal budget: top {best_pct*100:.0f}%')
ax.axhline(0, color='k', linewidth=0.8)
ax.fill_between(budget_pcts * 100, ev_random, ev_roi, alpha=0.12, color='steelblue',
                label='Incremental value of model')
ax.set_xlabel('% of customers contacted (ranked by churn score)')
ax.set_ylabel('Net campaign profit ($)')
ax.set_title('ROI Curve: Model vs Random Targeting')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

model_best = ev_roi[best_pct_idx]
random_at_same = ev_random[best_pct_idx]
print(f"Optimal budget: contact top {best_pct*100:.0f}% (n={int(best_pct*n_total_test)})")
print(f"Model net profit at optimal budget: ${model_best:,.0f}")
print(f"Random net profit at same budget:   ${random_at_same:,.0f}")
print(f"Incremental value of model:         ${model_best - random_at_same:,.0f}")

Try It in the Browser

Compute expected business value from a confusion matrix — change the cost parameters and see how the optimal decision shifts.
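A minimal offline sketch of the same idea follows. The two confusion matrices are hypothetical counts for a conservative and an aggressive threshold, and `expected_value` is an illustrative helper; only the first cost setting (V_TP = 80, C_FP = 10, C_FN = 120) comes from this notebook, the second is an assumed alternative.

```python
def expected_value(tp, tn, fp, fn, v_tp, v_tn, c_fp, c_fn):
    """Expected business value of a confusion matrix under a cost matrix."""
    return tp * v_tp + tn * v_tn - fp * c_fp - fn * c_fn

# Hypothetical confusion-matrix counts for two candidate thresholds
candidates = {
    "conservative": dict(tp=30, tn=500, fp=10,  fn=60),
    "aggressive":   dict(tp=70, tn=360, fp=150, fn=20),
}

# Two cost settings: the notebook's churn costs, and an assumed alternative
# in which false alarms are expensive and misses are cheap
cost_settings = {
    "churn costs":       dict(v_tp=80.0, v_tn=0.0, c_fp=10.0, c_fn=120.0),
    "costly false alarm": dict(v_tp=80.0, v_tn=0.0, c_fp=60.0, c_fn=20.0),
}

best = {}
for setting, costs in cost_settings.items():
    evs = {name: expected_value(**cm, **costs) for name, cm in candidates.items()}
    best[setting] = max(evs, key=evs.get)
    print(f"{setting}: {evs} -> best = {best[setting]}")
```

Under the churn costs the aggressive threshold wins ($1,700 against −$4,900); once false alarms become expensive the ranking flips and the conservative threshold is preferable ($600 against −$3,800).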

Guided Practice

A fraud model achieves 99% accuracy. The fraud rate is 1%. What does this tell you about the model's usefulness?

a) The model is excellent: 99% accuracy is very high.
   Feedback: A model that predicts "no fraud" for every transaction achieves 99% accuracy on a dataset with 1% fraud. It catches zero fraud.
b) Very little: a trivial model that never predicts fraud also achieves 99% accuracy.
   Feedback: Correct. On highly imbalanced data, accuracy is misleading. Use precision/recall, AUC, or expected cost instead.
c) The model is slightly below ideal: it should aim for 100%.
   Feedback: The gap to 100% is not the problem. Accuracy alone cannot separate a useful model from the trivial majority-class predictor, which already scores 99% here.
d) Accuracy is always the right metric for fraud detection.
   Feedback: Fraud detection needs to account for the asymmetric cost of missing fraud (FN) vs flagging a legitimate transaction (FP).

In the expected value formula EV = TP × V_TP − FP × C_FP − FN × C_FN (taking V_TN = 0), a churn model has C_FN = $120 and C_FP = $10. What does this imply about the optimal threshold?

a) Use the default threshold of 0.5.
   Feedback: The default threshold does not account for asymmetric costs. A lower threshold favours recall, catching more of the churners who are expensive to miss.
b) Lower the threshold below 0.5 to catch more churners, because missing them is 12× more costly than a false alarm.
   Feedback: Correct. When C_FN >> C_FP, you should accept more false positives to reduce false negatives. This means lowering the decision threshold.
c) Raise the threshold to improve precision.
   Feedback: Raising the threshold reduces false positives but increases false negatives, which is expensive when C_FN is high.
d) Maximise F1-Score, which balances precision and recall equally.
   Feedback: F1-Score treats FP and FN as equally costly. The business scenario here has strongly asymmetric costs, so F1 is the wrong objective.

A lift of 4 at the top 10% means:

a) The model is 4% more accurate than random.
   Feedback: Lift is a multiplier, not a percentage difference. It compares the fraction captured by the model to the fraction expected from random selection.
b) Contacting the model's top-scored 10% of customers captures 4× as many true positives as contacting a random 10%.
   Feedback: Correct. Lift = fraction of positives captured / fraction of population contacted. Lift = 4 at 10% means the model is 4× more efficient than random targeting at that budget level.
c) The model achieves 40% recall.
   Feedback: Lift 4 at 10% does mean the top-scored 10% contains 40% of all positives (gain = lift × fraction contacted = 4 × 10%), whatever the base rate. But that is recall at one specific cutoff, not the model's recall in general, and lift itself is a ratio, not an absolute recall value.
d) The AUC of the model is 0.4.
   Feedback: Lift curves and AUC are related but different. AUC = 0.4 would mean the model is worse than random.

You have a demand forecast model with lower RMSE than a competitor model. Stockouts cost 5× more than overstock, and the two models have not yet been compared on asymmetric cost. Which model should you deploy?

a) Your model: RMSE is the standard regression metric.
   Feedback: RMSE treats over- and under-forecast errors symmetrically. When stockout costs are 5× higher than overstock costs, RMSE is not the right objective.
b) Neither yet: compare them on asymmetric cost first, because your RMSE advantage may not translate to lower business cost.
   Feedback: Correct. A model with slightly higher RMSE that mostly over-forecasts (the cheaper error direction) can be more profitable than a lower-RMSE model that frequently under-forecasts.
c) Always deploy the model with lower RMSE regardless of cost structure.
   Feedback: RMSE is a symmetric metric. Business cost structures are often asymmetric. Aligning the evaluation metric to the cost structure matters.
d) Use the competitor model if it turns out to have the higher asymmetric cost.
   Feedback: Higher asymmetric cost means it performs worse on the business metric, not better.

Exercises

Exercise 1 — Threshold sweep for a fraud detector

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Fraud dataset: 2% fraud rate
X_f, y_f = make_classification(n_samples=5000, n_features=15, weights=[0.98, 0.02],
                                flip_y=0.03, random_state=7)
X_ftr, X_fte, y_ftr, y_fte = train_test_split(X_f, y_f, test_size=0.3,
                                               random_state=0, stratify=y_f)

fraud_model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, random_state=0))
fraud_model.fit(X_ftr, y_ftr)
proba_fraud = fraud_model.predict_proba(X_fte)[:, 1]

# TODO:
# Fraud scenario costs:
#   V_TP = $500   (fraud caught: transaction blocked, loss avoided)
#   C_FP = $20    (legit transaction blocked: angry customer, support cost)
#   C_FN = $500   (fraud missed: full transaction loss)
#   V_TN = $0
#
# 1. Scan thresholds from 0.01 to 0.99 and compute EV at each threshold
# 2. Plot EV curve and mark the optimal threshold
# 3. Compare: EV at optimal threshold vs EV at default 0.5
# 4. Report how many FNs and FPs the optimal threshold produces

V_TP_f, C_FP_f, C_FN_f = 500.0, 20.0, 500.0
thresholds_f = np.linspace(0.01, 0.99, 200)

# ev_fraud = []
# for t in thresholds_f:
#     y_pred_t = (proba_fraud >= t).astype(int)
#     tn_t, fp_t, fn_t, tp_t = confusion_matrix(y_fte, y_pred_t, labels=[0,1]).ravel()
#     ev_fraud.append(tp_t * V_TP_f - fp_t * C_FP_f - fn_t * C_FN_f)
# ...

print("Uncomment and complete the threshold sweep above.")

Exercise 2 — Custom scorer in GridSearchCV

Create a custom make_scorer for the churn scenario (V_TP=80, C_FP=10, C_FN=120). Use it to tune the C parameter of a LogisticRegression on the churn dataset from earlier in this notebook. Compare the best C found with business-EV scoring versus AUC scoring.

%matplotlib inline
import numpy as np
from sklearn.metrics import make_scorer, confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X_tr, y_tr, X_te, y_te already defined earlier in this notebook

# TODO:
# def churn_ev_score(y_true, y_pred):
#     tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0,1]).ravel()
#     return (tp * 80 - fp * 10 - fn * 120) / len(y_true)
#
# churn_scorer = make_scorer(churn_ev_score)
# pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
# param_grid = {'logisticregression__C': np.logspace(-3, 2, 12)}
#
# gs_ev  = GridSearchCV(pipe, param_grid, cv=5, scoring=churn_scorer, n_jobs=-1)
# gs_auc = GridSearchCV(pipe, param_grid, cv=5, scoring='roc_auc', n_jobs=-1)
# gs_ev.fit(X_tr, y_tr)
# gs_auc.fit(X_tr, y_tr)
#
# print(f"EV-tuned best C:  {gs_ev.best_params_}")
# print(f"AUC-tuned best C: {gs_auc.best_params_}")

print("Uncomment and complete the custom scorer GridSearchCV comparison.")

Exercise 3 — Lift curve for a marketing campaign

Using the churn model’s predicted probabilities (y_proba) and true labels (y_te) already computed earlier:

  1. Compute lift at 5%, 10%, 20%, 30%, and 50% of the population.

  2. If a marketing budget covers contacting 200 customers, and the total test set is 600, what is the optimal strategy?

  3. Calculate the expected number of churners saved at the optimal contact rate vs random.

import numpy as np

# y_proba and y_te already defined earlier in this notebook

# TODO:
# order = np.argsort(y_proba)[::-1]
# y_sorted = y_te[order]
# n_pos = y_te.sum(); n_total = len(y_te)
#
# for pct in [0.05, 0.10, 0.20, 0.30, 0.50]:
#     k = int(pct * n_total)
#     gain = y_sorted[:k].sum() / n_pos
#     lift = gain / pct
#     print(f"Top {pct*100:.0f}%: lift={lift:.2f}x  gain={gain*100:.1f}% of all churners")

print("Uncomment and compute lift at specified percentiles.")

Common Pitfalls

Summary

Key takeaways
| Tool | When to use | What it answers |
| --- | --- | --- |
| Cost matrix + EV | Any classification with asymmetric error costs | What is the total business impact of this model? |
| Threshold optimisation | When default 0.5 is not appropriate | What cutoff maximises net value? |
| Cumulative gain / lift curve | Marketing, targeting, prioritisation | How much more efficient is the model than random? |
| Custom sklearn scorer | Hyperparameter tuning | Tune toward business objective, not AUC |
| Asymmetric cost regression | Demand forecasting, pricing | Does lower RMSE translate to lower cost? |
| ROI curve | Stakeholder presentations | At what budget does the model generate the most value? |

Workflow: define business costs (V_TP, C_FP, C_FN) → scan thresholds for expected value → build lift curve for targeting efficiency → wrap in make_scorer for CV-tuning → present ROI curve to stakeholders.

Next Up — Business Visualisation

You can now measure model value in business terms. Next: communicate it clearly.

The next notebook — Business Visualisation — shows how to build the charts that stakeholders actually read: profit curves, lift decile charts, feature importance bar charts, confusion-matrix heat maps, and threshold-sensitivity dashboards. Effective visualisation is the last mile between a great model and a funded initiative.

Dependencies: predicted probabilities, confusion matrices, lift/gain curves, and the cost parameters introduced in this notebook.