Kaplan–Meier Estimator#
Plotting Loyalty Curves
“The Kaplan–Meier curve: because sometimes you need to visualize how fast your customers ghost you.”
🧠 What Is the Kaplan–Meier Estimator?#
The Kaplan–Meier estimator (a.k.a. the KM curve) helps us estimate the probability that something (or someone 👀) survives beyond a given time.
In business, that “something” is usually:
A customer staying subscribed,
A product lasting before it breaks, or
An employee staying before they update their LinkedIn headline to “Open to Work.”
💡 Core Idea#
Instead of guessing when people will churn, KM helps us say:
“What’s the chance that a customer is still active after X days?”
The KM survival function is calculated step by step:
\[ S(t) = \prod_{t_i \leq t} \left(1 - \frac{d_i}{n_i}\right) \]
Where:
\(t_i\): time points where events (churns) happen
\(d_i\): number of churns at time \(t_i\)
\(n_i\): number of customers still “at risk” just before \(t_i\)
Translation in business-speak:
Each time a customer churns, the survival probability takes a small hit — like your morale when you check monthly retention numbers.
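To make the "step by step" part concrete, the product can be unrolled into a recursion, one factor per churn time. This is only a restatement of the formula above. The symbol \(c_i\) (customers censored between \(t_i\) and the next churn time) is introduced here for illustration and is not part of the original notation.

```latex
\[
S(t_0) = 1, \qquad
S(t_i) = S(t_{i-1}) \left(1 - \frac{d_i}{n_i}\right), \qquad
n_{i+1} = n_i - d_i - c_i
\]
```

Censored customers never reduce \(S(t)\) directly. They only shrink the at-risk count \(n_i\) used in later steps.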
🧾 Example Time!#
Let’s track 5 customers:
| Customer | Time (days) | Event (1 = churned, 0 = active) |
|---|---|---|
| A | 10 | 1 |
| B | 20 | 0 |
| C | 20 | 1 |
| D | 30 | 1 |
| E | 40 | 0 |
Now, step through:
| Time | At Risk | Events | Survival Probability |
|---|---|---|---|
| 10 | 5 | 1 | (1 - 1/5) = 0.8 |
| 20 | 4 | 1 | 0.8 × (1 - 1/4) = 0.6 |
| 30 | 2 | 1 | 0.6 × (1 - 1/2) = 0.3 |
| 40 | 1 | 0 | 0.3 × 1 = 0.3 |
🎯 Interpretation: The estimated chance that a customer is still active beyond day 30 is about 30%. The curve first drops below 50% at day 30, so your product’s half-life (the median customer lifetime) is basically one billing cycle.
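The hand calculation can be reproduced with a few lines of plain Python. This is a minimal sketch, not how lifelines implements it: the function name `kaplan_meier_steps` is made up for this example, and it assumes the usual convention that a customer censored at time t still counts as at risk for churns at that same t.

```python
from collections import Counter


def kaplan_meier_steps(durations, events):
    """Return one (time, at_risk, churned, survival) row per distinct time.

    Hypothetical helper for illustration only.
    """
    # Churns per time point (event flag == 1)
    churned_at = Counter(t for t, e in zip(durations, events) if e == 1)
    # Everyone observed at a time leaves the risk set afterwards,
    # whether they churned or were censored
    leaving_at = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    rows = []
    for t in sorted(leaving_at):
        d = churned_at[t]
        survival *= 1 - d / at_risk  # one factor of the product-limit formula
        rows.append((t, at_risk, d, round(survival, 4)))
        at_risk -= leaving_at[t]
    return rows


# The five customers A to E from the example
rows = kaplan_meier_steps([10, 20, 20, 30, 40], [1, 0, 1, 1, 0])
for row in rows:
    print(row)
```

Each printed tuple corresponds to one row of the step-through table: time, at-risk count, churns, and the running survival probability.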
📊 Plot It Like a Pro#
```python
import matplotlib.pyplot as plt
import pandas as pd
from lifelines import KaplanMeierFitter

# T = days observed; E = 1 if the customer churned, 0 if still active (censored)
df = pd.DataFrame({
    "T": [10, 20, 20, 30, 40],
    "E": [1, 0, 1, 1, 0],
})

kmf = KaplanMeierFitter()
kmf.fit(durations=df["T"], event_observed=df["E"], label="Customer Retention")

# show_censors=True marks the censored customers (B and E) on the curve
kmf.plot_survival_function(show_censors=True)
plt.title("Kaplan–Meier Curve: How Loyal Are Your Customers?")
plt.xlabel("Days Since Subscription")
plt.ylabel("Probability of Staying Subscribed")
plt.show()
```
📈 What the Curve Tells You#
A steep drop early on → Customers are ghosting faster than your follow-up emails.
A flat curve → You’ve found the loyal ones. They’ll probably name their Wi-Fi after you.
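Censoring marks (small ticks on the curve, drawn in lifelines when you pass show_censors=True to the plot call) → Customers who hadn’t churned yet when observation ended. They’re still in the game.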
🎯 Business Applications#
| Use Case | Description |
|---|---|
| Subscription Retention | Estimate the typical (median) lifetime of a paying customer. |
| Product Warranty | Estimate how long a product lasts before failure. |
| Employee Turnover | Visualize “time until resignation” (HR horror story). |
| Campaign Effectiveness | Compare survival curves of two marketing groups. |
🧩 Practice Exercise#
Simulate 100 customers with random churn times.
Use lifelines.KaplanMeierFitter() to plot their survival curve.
Split into Group A (promo emails) and Group B (no emails).
See which group survives longer. (Hint: Don’t bet on the “no emails” group.)
🤹 Fun Thought#
If your KM curve stays above 0.8 after 3 months, you’ve achieved business immortality. 🧙‍♂️
If it hits 0.1 after two weeks, consider changing your pricing… or your product.
🧭 Next Stop#
➡️ Cox Proportional Hazards Model – where we stop pretending all customers are equal and start quantifying who’s most likely to churn next.