Kaplan–Meier Estimator

Plotting Loyalty Curves

“The Kaplan–Meier curve: because sometimes you need to visualize how fast your customers ghost you.”


🧠 What Is the Kaplan–Meier Estimator?

The Kaplan–Meier estimator (a.k.a. the KM curve) helps us estimate the probability that something (or someone 👀) survives beyond a given time.

In business, that “something” is usually:

  • A customer staying subscribed,

  • A product lasting before it breaks, or

  • An employee staying before they update their LinkedIn headline to “Open to Work.”


💡 Core Idea

Instead of guessing when people will churn, KM helps us say:

“What’s the chance that a customer is still active after X days?”

The KM survival function is calculated step by step:

\[ S(t) = \prod_{t_i \leq t} \left(1 - \frac{d_i}{n_i}\right) \]

Where:

  • \(t_i\): time points where events (churns) happen

  • \(d_i\): number of churns at time \(t_i\)

  • \(n_i\): number of customers still “at risk” just before \(t_i\)

Translation in business-speak:

Each time a customer churns, the survival probability takes a small hit — like your morale when you check monthly retention numbers.


🧾 Example Time!

Let’s track 5 customers:

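| Customer | Time (days) | Event (1 = churned, 0 = still active) |
|----------|-------------|----------------------------------------|
| A        | 10          | 1                                      |
| B        | 20          | 0                                      |
| C        | 20          | 1                                      |
| D        | 30          | 1                                      |
| E        | 40          | 0                                      |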

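Now, step through. Customer B is censored at day 20: B still counts as at risk on day 20 but leaves the at-risk pool afterwards without counting as a churn, which is why only 2 customers are at risk on day 30.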

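| Time | At Risk | Events | Survival Probability   |
|------|---------|--------|------------------------|
| 10   | 5       | 1      | (1 - 1/5) = 0.8        |
| 20   | 4       | 1      | 0.8 × (1 - 1/4) = 0.6  |
| 30   | 2       | 1      | 0.6 × (1 - 1/2) = 0.3  |
| 40   | 1       | 0      | 0.3 × 1 = 0.3          |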

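🎯 Interpretation: The estimated chance that a customer is still active beyond day 30 is about 30%. The curve first drops to 50% or below at day 30, so the median customer lifetime is 30 days. In other words, your product’s half-life is basically one billing cycle.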
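The step-through above can be reproduced in a few lines of plain Python. This is a minimal sketch for checking the arithmetic, not how lifelines works internally; the function name `kaplan_meier` and its row layout are made up for illustration.

```python
def kaplan_meier(durations, events):
    """Return (time, at_risk, churned, survival) rows, one per distinct time.

    durations: observed time for each customer
    events:    1 if the customer churned at that time, 0 if censored
    """
    rows = []
    survival = 1.0
    for t in sorted(set(durations)):
        # Everyone whose observed time is >= t is still at risk just before t.
        at_risk = sum(1 for d in durations if d >= t)
        # Only real churns (event == 1) at time t pull the estimate down.
        churned = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - churned / at_risk
        rows.append((t, at_risk, churned, survival))
    return rows


# The five customers A-E from the example table.
durations = [10, 20, 20, 30, 40]
events = [1, 0, 1, 1, 0]

table = kaplan_meier(durations, events)
for t, at_risk, churned, survival in table:
    print(f"day {t}: at risk={at_risk}, churned={churned}, S(t)={survival:.2f}")

# Median lifetime: first time at which the curve is at or below 0.5.
median = next((t for t, _, _, s in table if s <= 0.5), None)
print("median lifetime:", median)
```

Note that censored customers never appear in `churned`; they only shrink `at_risk` at later times, which is exactly how the at-risk count falls from 4 to 2 between day 20 and day 30.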


📊 Plot It Like a Pro

import matplotlib.pyplot as plt
import pandas as pd
from lifelines import KaplanMeierFitter

# T = days observed, E = 1 if the customer churned, 0 if still active (censored)
df = pd.DataFrame({
    "T": [10, 20, 20, 30, 40],
    "E": [1, 0, 1, 1, 0]
})

kmf = KaplanMeierFitter()
kmf.fit(durations=df["T"], event_observed=df["E"], label="Customer Retention")

# show_censors=True marks censored customers on the curve
kmf.plot_survival_function(show_censors=True)

# Plain-text title: matplotlib's default font has no emoji glyphs
plt.title("Kaplan–Meier Curve: How Loyal Are Your Customers?")
plt.xlabel("Days Since Subscription")
plt.ylabel("Probability of Staying Subscribed")
plt.show()

📈 What the Curve Tells You

  • A steep drop early on → Customers are ghosting faster than your follow-up emails.

  • A flat curve → You’ve found the loyal ones. They’ll probably name their Wi-Fi after you.

  • Censoring marks (small ticks on the curve, drawn when the plot call is given show_censors=True) → Customers who haven’t yet churned and are still in the game.


🎯 Business Applications

| Use Case               | Description                                                  |
|------------------------|--------------------------------------------------------------|
| Subscription Retention | Estimate the typical (median) lifetime of a paying customer. |
| Product Warranty       | Predict how long a product lasts before failure.             |
| Employee Turnover      | Visualize “time until resignation” (HR horror story).        |
| Campaign Effectiveness | Compare survival curves of two marketing groups.             |


🧩 Practice Exercise

  1. Simulate 100 customers with random churn times.

  2. Use lifelines.KaplanMeierFitter() to plot their survival curve.

  3. Split into Group A (promo emails) and Group B (no emails).

  4. See which group survives longer. (Hint: Don’t bet on the “no emails” group.)


🤹 Fun Thought

If your KM curve stays above 0.8 after 3 months, you’ve achieved business immortality. 🧙‍♂️

If it hits 0.1 after two weeks, consider changing your pricing… or your product.


🧭 Next Stop

➡️ Cox Proportional Hazards Model – where we stop pretending all customers are equal and start quantifying who’s most likely to churn next.

# Your code here