Cox Proportional Hazards

“Not all customers are equally risky. Some churn quietly… others vanish the moment you send a survey.”

Welcome to the Cox Proportional Hazards model, a.k.a. the Sherlock Holmes of survival analysis: it investigates who’s more likely to churn and why, without needing to assume how the baseline risk changes over time.


🎯 The Core Idea

While the Kaplan–Meier curve tells you how long people survive, the Cox model tells you which features make them survive longer (or shorter).

In other words:

KM = “How loyal is everyone?” Cox = “Who’s the least loyal and why?”


🧮 The Magic Formula (Simplified)

\[ h(t \mid x) = h_0(t) \times e^{\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n} \]

Where:

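  • \(h(t \mid x)\): hazard (instantaneous risk of the event) at time \(t\) for a customer with features \(x\)

  • \(h_0(t)\): baseline hazard, the risk over time for a customer whose features are all zero (or at their reference values)

  • \(\beta_i\): coefficients; each one is the change in log-hazard for a one-unit increase in feature \(x_i\)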

Basically:

The higher the hazard, the faster someone churns. A negative β means loyalty. A positive β means “brace for cancellation.”
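To see why the sign of β settles the direction, it helps to compare two customers, a and b, who are identical except that a has one more unit of feature j. This is a short derivation from the formula above; the subscripts a and b are notation introduced here for illustration.

```latex
\[
\frac{h(t \mid x_a)}{h(t \mid x_b)}
= \frac{h_0(t)\, e^{\beta_1 x_{a,1} + \dots + \beta_n x_{a,n}}}
       {h_0(t)\, e^{\beta_1 x_{b,1} + \dots + \beta_n x_{b,n}}}
= e^{\beta_j (x_{a,j} - x_{b,j})}
= e^{\beta_j}
\]
```

The baseline hazard cancels, so \(e^{\beta_j}\) is the hazard ratio for a one-unit increase in feature j: below 1 when \(\beta_j\) is negative, above 1 when it is positive.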


💼 Example: Subscription Business Case

| Feature | Coefficient (β) | Business Translation |
| --- | --- | --- |
| Tenure | -0.05 | Loyal customers churn less |
| Monthly Spend | 0.02 | Expensive plan → higher risk |
| Customer Support Calls | 0.1 | The more they call, the closer they are to goodbye |
| Discount Received | -0.15 | Discounts = Love (temporarily) |

So if a customer:

  • has been around a long time,

  • pays less, and

  • rarely calls support,

their hazard is basically “snoozing on autopay.”
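To put a number on that, here is a minimal sketch that plugs the table's coefficients into the Cox formula and compares two customers. The two customer profiles, the feature key names, and the units (tenure in months, discount as a 0/1 flag) are assumptions made up for illustration; only the β values come from the table.

```python
import math

# Coefficients copied from the example table (per one-unit increase).
BETAS = {
    "tenure": -0.05,
    "monthly_spend": 0.02,
    "support_calls": 0.10,
    "discount_received": -0.15,
}


def relative_hazard(features, reference):
    """Hazard of `features` divided by hazard of `reference`.

    The baseline hazard h_0(t) cancels in the ratio, so only
    beta * (x - x_reference) is left in the exponent.
    """
    log_ratio = sum(
        BETAS[name] * (features[name] - reference[name]) for name in BETAS
    )
    return math.exp(log_ratio)


# Two hypothetical customers (all values invented for illustration).
reference = {"tenure": 12, "monthly_spend": 50, "support_calls": 2, "discount_received": 0}
sleepy = {"tenure": 24, "monthly_spend": 40, "support_calls": 0, "discount_received": 1}

ratio = relative_hazard(sleepy, reference)
print(f"Relative hazard vs. reference: {ratio:.2f}")  # prints 0.32
```

Under these made-up numbers, the long-tenured, low-spend, low-contact customer has roughly a third of the reference customer's churn hazard at every point in time.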


🧪 Quick Python Example

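from lifelines import CoxPHFitter
import pandas as pd

# Fake churn dataset: one row per customer
df = pd.DataFrame({
    'tenure': [10, 20, 5, 8, 15],
    'spend': [50, 80, 30, 40, 60],
    'calls': [1, 5, 2, 3, 1],
    'event': [1, 1, 0, 1, 0],       # 1 = churned, 0 = still active (censored)
    'time': [30, 45, 60, 25, 80]    # observed duration for each customer
})

# Five rows may be too few for a stable unpenalized fit, so a small
# penalizer keeps the coefficients finite. Drop it once you have real data.
cox = CoxPHFitter(penalizer=0.1)

# Every column other than duration_col and event_col is used as a covariate.
cox.fit(df, duration_col='time', event_col='event')
cox.print_summary()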

Interpret like a pro:

  • exp(coef) > 1: each one-unit increase in the feature raises the hazard of churn (⚠️ red flag)

  • exp(coef) < 1: each one-unit increase lowers the hazard (💚 loyal legend)
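A hazard ratio is easier to present as a percent change. The sketch below converts a coefficient into "percent change in hazard" for one or several units of increase; the helper name `percent_change_in_hazard` and the coefficient values are hypothetical, not output from the fit above.

```python
import math


def percent_change_in_hazard(coef, units=1.0):
    """Percent change in hazard for a `units` increase in one feature."""
    return (math.exp(coef * units) - 1.0) * 100.0


# Hypothetical coefficients, not taken from a real fit.
one_call = percent_change_in_hazard(0.10)              # one extra support call
five_calls = percent_change_in_hazard(0.10, units=5)   # five extra calls
twelve_tenure = percent_change_in_hazard(-0.05, units=12)  # twelve extra tenure units

print(f"{one_call:+.1f}%  {five_calls:+.1f}%  {twelve_tenure:+.1f}%")
# prints: +10.5%  +64.9%  -45.1%
```

Note that effects compound multiplicatively: five extra calls give +64.9%, not 5 × 10.5%.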


📊 Visualizing the Risk

cox.plot_partial_effects_on_outcome(covariates='spend', values=[30, 50, 80])

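💡 Each curve is the predicted survival function for one value of spend, holding the other covariates fixed. The faster a curve drops, the faster churn happens at that spend level. (Translation: “Stop charging your best customers more than your competitors.”)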
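What such a plot draws can be sketched by hand: under the Cox model, a customer's survival curve is the baseline survival curve raised to the power of their relative hazard. The baseline survival values, the coefficient, and the reference spend below are all invented for illustration, so this is a sketch of the relationship rather than a reproduction of lifelines' output.

```python
import math

# Hypothetical baseline survival S_0(t) at a few time points, for a customer
# at the reference spend level. Values are invented for illustration.
times = [10, 20, 40, 80]
baseline_survival = [0.95, 0.88, 0.75, 0.55]

BETA_SPEND = 0.02      # assumed coefficient for spend
REFERENCE_SPEND = 50   # assumed reference spend level


def survival_curve(spend):
    """S(t | spend) = S_0(t) ** exp(beta * (spend - reference))."""
    multiplier = math.exp(BETA_SPEND * (spend - REFERENCE_SPEND))
    return [s0 ** multiplier for s0 in baseline_survival]


curves = {spend: survival_curve(spend) for spend in (30, 50, 80)}
for spend, curve in curves.items():
    print(spend, [round(s, 3) for s in curve])
```

With a positive coefficient, the spend = 80 curve sits below the reference curve at every time point and the spend = 30 curve sits above it.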


🧩 Common Business Use Cases

| Use Case | Description |
| --- | --- |
| Churn modeling | Which customer traits predict early exit |
| Employee attrition | Who’s most likely to quit next quarter |
| Loan default risk | Which borrowers will ghost the bank |
| Warranty claims | What product traits predict early failure |


🤓 Important Assumption: “Proportional Hazards”

The Cox model assumes the ratio of hazards between customers is constant over time. If it’s not, your model starts lying politely — like a sales forecast in December.
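A tiny numeric illustration of what "constant ratio" means: even when the baseline hazard itself changes over time, the ratio between two customers does not. The baseline hazard function and the coefficient here are made up for the sketch.

```python
import math


def baseline_hazard(t):
    """A made-up baseline hazard that rises over time (assumption)."""
    return 0.01 + 0.0005 * t


def hazard(t, calls, beta_calls=0.10):
    """Cox hazard with a single feature: h_0(t) * exp(beta * calls)."""
    return baseline_hazard(t) * math.exp(beta_calls * calls)


# Hazard ratio between a 5-call and a 1-call customer at several times.
ratios = [hazard(t, calls=5) / hazard(t, calls=1) for t in (1, 30, 90, 365)]
print([round(r, 4) for r in ratios])  # the same value, exp(0.4), at every t
```

If real data showed this ratio drifting with time (say, support calls mattering a lot early on and barely at all later), the assumption would be violated.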

To check:

cox.check_assumptions(df, p_value_threshold=0.05)

If the check fails, you might need to:

  • Add time-varying covariates or an interaction with time

  • Stratify on the offending variable (the strata argument of fit) or fit separate models per time window

  • Or… pretend it passed and pray nobody asks


🧠 Practice Task

Try it out:

  1. Load your company churn data.

  2. Fit a CoxPHFitter.

  3. Identify the top 3 “hazardous” features.

  4. Present it to management as: “Here’s who’s most likely to leave us — and here’s how to stop them.”
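One caution for step 3: raw coefficients depend on each feature's units, so ranking by β alone can mislead. A rough sketch of one way around this is to compare hazard ratios per one-standard-deviation increase. The coefficients and standard deviations below are hypothetical placeholders; substitute the values from your own fit and data.

```python
import math

# Hypothetical fitted coefficients and feature standard deviations.
coefs = {"tenure": -0.05, "spend": 0.02, "calls": 0.10, "discount": -0.15}
std_devs = {"tenure": 12.0, "spend": 20.0, "calls": 1.5, "discount": 0.4}

# Hazard ratio for a one-standard-deviation increase in each feature.
hr_per_sd = {name: math.exp(coefs[name] * std_devs[name]) for name in coefs}

# Rank by how far the log hazard ratio is from zero, in either direction.
ranked = sorted(
    hr_per_sd, key=lambda name: abs(math.log(hr_per_sd[name])), reverse=True
)
top3 = ranked[:3]

for name in top3:
    direction = "raises" if hr_per_sd[name] > 1 else "lowers"
    print(f"{name}: HR per SD = {hr_per_sd[name]:.2f} ({direction} churn hazard)")
```

With these placeholder numbers, discount has the largest raw coefficient but the smallest per-SD effect, which is exactly the kind of reversal worth checking before the management meeting.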


🎭 TL;DR Summary

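| Concept | Meaning |
| --- | --- |
| Hazard | Risk of churn at any given moment, for customers who have not churned yet |
| β Coefficients | Direction & strength of each factor |
| exp(β) | Multiplicative change in risk per one-unit increase (hazard ratio) |
| Proportional Hazards | Assumes hazard ratios between customers stay constant over time |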


🧭 Next Stop

➡️ Customer Lifetime Value Modeling (CLV) — where we flip the script from “Who’s leaving?” to “How much are they worth before they do?”
