“Not all customers are equally risky. Some churn quietly… others vanish the moment you send a survey.”

Welcome to the Cox Proportional Hazards model, a.k.a. the Sherlock Holmes of survival analysis: it investigates who’s more likely to churn and why, without needing to assume what the overall churn-over-time curve looks like.


🎯 The Core Idea

While the Kaplan–Meier curve tells you how long people survive, the Cox model tells you which features make them survive longer (or shorter).

In other words:

KM = “How loyal is everyone?”

Cox = “Who’s the least loyal and why?”


🧮 The Magic Formula (Simplified)

\[ h(t \mid x) = h_0(t) \times e^{\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n} \]

Where:

  • \(h(t \mid x)\): hazard (instantaneous risk of the event) at time \(t\) for a customer with features \(x\)

  • \(h_0(t)\): baseline hazard (the risk over time for a customer whose features are all zero)

  • \(\beta_i\): coefficients; each one is the change in log-hazard per one-unit increase in feature \(x_i\)

Basically:

The higher the hazard, the faster someone churns. A negative β means loyalty. A positive β means “brace for cancellation.”
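The formula is easy to poke at in plain Python. The sketch below is only an illustration: the baseline hazard `toy_baseline`, the coefficients, and the customer values are all made up, and a real Cox fit never asks you to write the baseline down.

```python
import math

def hazard(t, x, betas, baseline_hazard):
    """h(t|x) = h0(t) * exp(beta_1*x_1 + ... + beta_n*x_n)."""
    linear_predictor = sum(b * xi for b, xi in zip(betas, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

def toy_baseline(t):
    # Hypothetical baseline hazard; any positive function of time would do.
    return 0.01 + 0.001 * t

betas = [-0.05, 0.1]     # made-up coefficients for [tenure, support calls]
at_baseline = [0, 0]     # all features zero -> hazard equals the baseline
loyal = [12, 0]          # long tenure, no calls
grumpy = [0, 5]          # brand new, five calls

t = 6
print(hazard(t, at_baseline, betas, toy_baseline))  # the baseline itself
print(hazard(t, loyal, betas, toy_baseline))        # pulled below baseline by the negative beta
print(hazard(t, grumpy, betas, toy_baseline))       # pushed above baseline by the positive beta
```

The features only ever scale the baseline up or down; they never change its shape over time.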


💼 Example: Subscription Business Case

| Feature | Coefficient (β) | Business Translation |
| --- | --- | --- |
| Tenure | -0.05 | Loyal customers churn less |
| Monthly Spend | 0.02 | Expensive plan → higher risk |
| Customer Support Calls | 0.1 | The more they call, the closer they are to goodbye |
| Discount Received | -0.15 | Discounts = Love (temporarily) |

So if a customer:

  • Has been around long enough,

  • Pays less, and

  • Rarely calls support,

then their hazard is basically “snoozing on autopay.”
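To put a number on that, the coefficients from the table can be plugged into the formula to compare two customers. This is a rough sketch: the two customer profiles are invented, and the units (months of tenure, currency units of spend, a 0/1 discount flag) are assumptions, since the table does not state them.

```python
import math

# Coefficients copied from the example table.
COEFFICIENTS = {
    "tenure": -0.05,
    "monthly_spend": 0.02,
    "support_calls": 0.1,
    "discount_received": -0.15,
}

def relative_hazard(customer, reference):
    """Hazard of `customer` divided by hazard of `reference`.

    The baseline hazard h0(t) appears in both and cancels out.
    """
    log_ratio = sum(
        beta * (customer[name] - reference[name])
        for name, beta in COEFFICIENTS.items()
    )
    return math.exp(log_ratio)

# Two hypothetical customers (values invented for illustration).
sleepy = {"tenure": 24, "monthly_spend": 30, "support_calls": 0, "discount_received": 1}
restless = {"tenure": 3, "monthly_spend": 80, "support_calls": 4, "discount_received": 0}

print(round(relative_hazard(restless, sleepy), 2))
```

With these made-up profiles the log-hazard difference is 2.6, so the restless customer's churn hazard is roughly 13 times the sleepy one's at any point in time.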


🧪 Quick Python Example

from lifelines import CoxPHFitter
import pandas as pd

# Fake churn dataset
df = pd.DataFrame({
    'tenure': [10, 20, 5, 8, 15],
    'spend': [50, 80, 30, 40, 60],
    'calls': [1, 5, 2, 3, 1],
    'event': [1, 1, 0, 1, 0],       # 1 = churned, 0 = still a customer (censored)
    'time': [30, 45, 60, 25, 80]    # how long each customer was observed
})

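# Five rows and three covariates is far too little data for a stable fit;
# a small L2 penalty helps this toy example converge.
cox = CoxPHFitter(penalizer=0.1)
cox.fit(df, duration_col='time', event_col='event')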
cox.print_summary()

Interpret like a pro:

  • exp(coef) > 1: each one-unit increase in the feature raises the churn hazard (⚠️ red flag)

  • exp(coef) < 1: each one-unit increase lowers it (💚 loyal legend)
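Turning a coefficient into a percent change often lands better in a meeting. Here is a small sketch; the helper name `percent_change_in_hazard` is made up for this example, and the coefficients are the illustrative ones from the subscription table, not output from the toy fit.

```python
import math

def percent_change_in_hazard(coef, units=1):
    """Percent change in hazard for a `units` increase in the covariate."""
    return (math.exp(coef * units) - 1) * 100

for name, coef in [("tenure", -0.05), ("support calls", 0.1)]:
    print(f"{name}: {percent_change_in_hazard(coef):+.1f}% per unit")

# Effects multiply rather than add: five extra calls is exp(0.1 * 5),
# which is more than five times the one-call effect.
print(f"five extra calls: {percent_change_in_hazard(0.1, units=5):+.1f}%")
```

So a coefficient of 0.1 means about a 10.5% higher hazard per extra call, and about 65% higher for five extra calls.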


📊 Visualizing the Risk

cox.plot_partial_effects_on_outcome(covariates='spend', values=[30, 50, 80])

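💡 Each line is the predicted survival curve for one value of spend, with the other covariates held at their baseline values. The faster a curve drops, the sooner customers at that spend level are expected to churn. (Translation: “Stop charging your best customers more than your competitors.”)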
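For intuition about what such a plot shows, a Cox model turns one baseline survival curve into a family of curves via S(t|x) = S0(t)^exp(β·(x − x_ref)). The sketch below does this by hand; the baseline survival numbers and the reference spend of 50 are invented, and the coefficient is the illustrative 0.02 from the subscription table rather than anything fitted.

```python
import math

# Hypothetical baseline survival S0(t) at the reference spend (invented numbers).
months = [0, 6, 12, 18, 24]
baseline_survival = [1.00, 0.90, 0.80, 0.72, 0.65]

BETA_SPEND = 0.02        # illustrative coefficient, not a fitted value
REFERENCE_SPEND = 50     # assumed spend level the baseline curve refers to

def survival_curve(spend):
    """Cox survival curve: S(t|x) = S0(t) ** exp(beta * (x - x_ref))."""
    multiplier = math.exp(BETA_SPEND * (spend - REFERENCE_SPEND))
    return [s0 ** multiplier for s0 in baseline_survival]

for spend in (30, 50, 80):
    curve = ", ".join(f"{s:.2f}" for s in survival_curve(spend))
    print(f"spend={spend}: {curve}")
```

With a positive coefficient, the curve for the highest spend sits lowest at every month after zero, which is the pattern to look for in the real plot.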


🧩 Common Business Use Cases

| Use Case | Description |
| --- | --- |
| Churn modeling | Which customer traits predict early exit |
| Employee attrition | Who’s most likely to quit next quarter |
| Loan default risk | Which borrowers will ghost the bank |
| Warranty claims | What product traits predict early failure |

🤓 Important Assumption: “Proportional Hazards”

The Cox model assumes the ratio of hazards between customers is constant over time. If it’s not, your model starts lying politely — like a sales forecast in December.
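A tiny simulation-free sketch may make the assumption concrete. It compares a discount whose effect is constant (proportional) with one whose effect fades (non-proportional). Everything here is hypothetical: the baseline hazard, the -0.15 effect size borrowed from the example table, and the six-month decay scale.

```python
import math

def hazard_ratio_over_time(hazard_a, hazard_b, times):
    """Ratio of two hazard functions evaluated at each time point."""
    return [hazard_a(t) / hazard_b(t) for t in times]

def no_discount(t):
    return 0.01 + 0.001 * t                      # hypothetical baseline hazard

def discount_constant(t):
    # Proportional: the discount always multiplies the hazard by exp(-0.15).
    return no_discount(t) * math.exp(-0.15)

def discount_fading(t):
    # Non-proportional: the protective effect decays toward zero (assumed decay).
    return no_discount(t) * math.exp(-0.15 * math.exp(-t / 6))

times = [0, 6, 12, 24]
print(hazard_ratio_over_time(discount_constant, no_discount, times))  # flat
print(hazard_ratio_over_time(discount_fading, no_discount, times))    # drifts toward 1
```

A single β can describe the first case exactly. In the second, one number has to average over an effect that is strong early and nearly gone later, which is the kind of mismatch the assumption check is meant to flag.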

To check:

cox.check_assumptions(df, p_value_threshold=0.05)

If the check fails, you might need to:

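  • Add time-varying covariates, or let the offending feature interact with time

  • Stratify on the offending feature (the strata argument of fit), or fit separate models for separate time windows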

  • Or… pretend it passed and pray nobody asks


🧠 Practice Task

Try it out:

  1. Load your company churn data.

  2. Fit a CoxPHFitter.

  3. Identify the top 3 “hazardous” features.

  4. Present it to management as: “Here’s who’s most likely to leave us — and here’s how to stop them.”
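One caution for step 3: raw coefficients depend on each feature's units, so sorting by β alone can mislead. A common workaround is to compare hazard ratios per standard deviation. The sketch below assumes you have already pulled coefficients from a fitted model; the numbers, the standard deviations, and the helper `rank_by_strength` are all hypothetical.

```python
import math

# Hypothetical fitted coefficients and covariate standard deviations.
features = {
    "tenure":        {"coef": -0.05, "std": 12.0},
    "monthly_spend": {"coef": 0.02,  "std": 25.0},
    "support_calls": {"coef": 0.10,  "std": 2.0},
    "discount":      {"coef": -0.15, "std": 0.5},
}

def rank_by_strength(features, top=3):
    """Rank features by |coef * std|, the log hazard ratio per standard deviation."""
    scored = [
        (name, f["coef"] * f["std"], math.exp(f["coef"] * f["std"]))
        for name, f in features.items()
    ]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top]

for name, log_hr, hr in rank_by_strength(features):
    direction = "raises" if log_hr > 0 else "lowers"
    print(f"{name}: hazard ratio per SD = {hr:.2f} ({direction} churn risk)")
```

Here the discount has the largest raw coefficient yet drops out of the top three once scale is accounted for, while tenure moves to first place.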


🎭 TL;DR Summary

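| Concept | Meaning |
| --- | --- |
| Hazard | Instantaneous risk of churn at a given moment, for customers who have not churned yet |
| β Coefficients | Direction & strength of each factor (on the log-hazard scale) |
| exp(β) | Hazard ratio: multiplicative change in risk per one-unit increase |
| Proportional Hazards | Assumes hazard ratios between customers stay constant over time |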

🧭 Next Stop

➡️ Customer Lifetime Value Modeling (CLV) — where we flip the script from “Who’s leaving?” to “How much are they worth before they do?”

# Your code here