“Every customer relationship has a lifespan — some fade away, some churn dramatically, and some stay loyal just to use your Wi-Fi.”
Welcome to Survival Analysis — the mathematical art of predicting how long things last. It’s not just for doctors studying patients or engineers tracking lightbulbs. In business, it helps answer the question: 💡 “How long will this customer stay before they ghost us?”
🎯 What Is Survival Analysis?
Survival analysis is used when your target variable is time until an event occurs — like:
🛒 How long until a customer makes their next purchase?
💸 How long until they churn (cancel subscription)?
🧯 How long until a machine fails?
🕵️ How long until the intern finishes their coffee?
The two big components:
Duration (time): how long something lasted, or has lasted so far.
Event (binary): whether the event was observed.
1 → event occurred (customer churned)
0 → event not observed yet (customer still subscribed, so their record is censored)
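In practice, T and E usually have to be derived from raw start and end dates. Here is a minimal sketch. It assumes hypothetical records in which `end_date` is `None` for customers who are still subscribed, plus an assumed data-pull date `CUTOFF`. The helper `to_duration_event` is illustrative only and does not come from any library.

```python
from datetime import date

# Hypothetical raw records: (customer, start_date, end_date); end_date is None while still subscribed
records = [
    ("A", date(2024, 1, 1), date(2024, 1, 31)),
    ("B", date(2024, 1, 21), None),
    ("C", date(2024, 1, 1), date(2024, 2, 15)),
    ("D", date(2024, 1, 1), None),
]

# Assumed observation cutoff: the day the data was pulled
CUTOFF = date(2024, 3, 21)

def to_duration_event(start, end, cutoff=CUTOFF):
    """Hypothetical helper: return (T, E) = (days observed, 1 if churn was seen else 0)."""
    if end is None:
        # Still subscribed: we only know they lasted at least until the cutoff (censored)
        return (cutoff - start).days, 0
    # Churned: duration is the full subscription length
    return (end - start).days, 1

rows = {cust: to_duration_event(start, end) for cust, start, end in records}
print(rows)  # {'A': (30, 1), 'B': (60, 0), 'C': (45, 1), 'D': (80, 0)}
```

These made-up dates reproduce the practice dataset used in the exercise below. Censored customers get a duration measured up to the cutoff, not a missing value.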
🧩 Key Terms You’ll See Everywhere
| Term | Meaning | Example |
|---|---|---|
| T | Time until event | 45 days before churn |
| E | Event indicator | 1 = churned, 0 = still active |
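| Censoring | The event hasn’t been observed by the end of data collection, so we only know the customer lasted at least T | Customer still subscribed |
| Survival Function (S(t)) | Probability the customer survives beyond time t | S(t) = 0.7 → 70% chance they’re still with us after time t |
| Hazard Function (h(t)) | Instantaneous rate of the event at time t, given survival up to t | “How risky is this month for customers who made it this far?” |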
🧠 Intuition Check: The “Netflix Relationship” Example
Imagine you’re Netflix. You start tracking 100 new users who subscribed in January.
After 1 month, 10 cancel.
After 2 months, another 20 cancel.
The rest are still hanging on for Stranger Things Season 5.
The survival curve steps downward, from 100% to 90% after month one and to 70% after month two, kind of like your motivation after 9 p.m.
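The cancellation counts above can be turned into survival and hazard numbers with a few lines of Python. This sketch assumes nobody is censored during the first two months, so every outcome is known and plain counting is valid. The "hazard" here is the simple discrete-time version: the share of customers still subscribed at the start of a month who cancel during that month.

```python
# Netflix example: 100 January subscribers; 10 cancel in month 1, 20 more in month 2.
# Assumes nobody is censored in these two months, so simple counting is valid.
COHORT = 100
cancellations = {1: 10, 2: 20}

at_risk = COHORT
survival, hazard = {}, {}
for month in sorted(cancellations):
    churned = cancellations[month]
    # Discrete-time hazard: share of customers subscribed at the start of the month who cancel during it
    hazard[month] = churned / at_risk
    at_risk -= churned
    # Survival: share of the original cohort still subscribed at the end of the month
    survival[month] = at_risk / COHORT

print(survival)  # {1: 0.9, 2: 0.7}
print(hazard)    # {1: 0.1, 2: 0.2222222222222222}
```

Month 2's hazard (20 of the 90 remaining, about 22%) is higher than month 1's (10%). That difference is exactly the “How risky is this month?” question the hazard answers, even though survival only ever goes down.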
🧮 Basic Formulae
The survival function is:
$$ S(t) = P(T > t) $$
The hazard rate is:
$$ h(t) = \frac{f(t)}{S(t)} $$
where:
$f(t)$ = probability density of the event at time $t$
$S(t)$ = probability of surviving past time $t$
In simple terms:
The hazard is like the “riskiness per moment,” while survival is the “hope left.”
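To make the $f(t)/S(t)$ relationship concrete, here is a sketch that assumes churn times follow an exponential distribution. The rate of 0.1 per month is purely illustrative and says nothing about real customers. For this distribution $f(t) = \lambda e^{-\lambda t}$ and $S(t) = e^{-\lambda t}$, so the hazard comes out constant at $\lambda$.

```python
import math

# Illustrative assumption: churn times are exponential with rate LAM per month
LAM = 0.1

def pdf(t):
    """f(t): probability density of churning at time t."""
    return LAM * math.exp(-LAM * t)

def survival(t):
    """S(t) = P(T > t) for the exponential distribution."""
    return math.exp(-LAM * t)

def hazard(t):
    """h(t) = f(t) / S(t)."""
    return pdf(t) / survival(t)

# Survival keeps falling, but the hazard stays at LAM every month
for t in (1, 6, 12):
    print(t, round(survival(t), 3), round(hazard(t), 3))
```

A constant hazard means a customer's risk of churning next month is the same whether they joined last week or last year. Real churn may not behave that way, which is one reason to estimate S(t) from data instead of assuming a shape.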
🧪 Quick Practice Exercise
Try this tiny dataset:
| Customer | Time (days) | Event (1=churned) |
|---|---|---|
| A | 30 | 1 |
| B | 60 | 0 |
| C | 45 | 1 |
| D | 80 | 0 |
👉 Calculate:
What’s the proportion of customers surviving past 45 days?
Which customers are right-censored?
(Hint: A and C have churned, B and D are still active.)
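You can check your answers with the product-limit (Kaplan–Meier) formula. At each churn time, it multiplies the running survival by the fraction of at-risk customers who did not churn. Below is a minimal pure-Python sketch. `kaplan_meier` is a hypothetical helper written for this exercise, not the lifelines API, and it uses `Fraction` so the arithmetic stays exact.

```python
from fractions import Fraction

# Practice dataset from the table above: (customer, T in days, E: 1 = churned, 0 = censored)
data = [("A", 30, 1), ("B", 60, 0), ("C", 45, 1), ("D", 80, 0)]

# Right-censored customers: churn was never observed (E == 0)
censored = [cust for cust, _, e in data if e == 0]

# Naive answer: share of all customers whose recorded time exceeds 45 days
naive = sum(1 for _, t, _ in data if t > 45) / len(data)

def kaplan_meier(durations, events):
    """Hypothetical helper: return {event_time: S(t)} via the product-limit formula."""
    s = Fraction(1)
    curve = {}
    event_times = sorted({d for d, e in zip(durations, events) if e == 1})
    for t in event_times:
        # At risk: everyone still under observation just before t (censored at t included)
        at_risk = sum(1 for d in durations if d >= t)
        churned = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        s *= 1 - Fraction(churned, at_risk)
        curve[t] = float(s)
    return curve

km = kaplan_meier([t for _, t, _ in data], [e for _, _, e in data])
print(censored)  # ['B', 'D']
print(naive)     # 0.5
print(km)        # {30: 0.75, 45: 0.5}
```

Here the naive count and the Kaplan–Meier estimate agree at 0.5 because both censored customers (B and D) were observed beyond 45 days. If someone had been censored before day 45, the naive count would wrongly treat them as not surviving. Kaplan–Meier avoids that by simply dropping them from the at-risk group after their last observed day.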
📈 Visualization Tip
You can use Kaplan–Meier curves to visualize survival probabilities:
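```python
from lifelines import KaplanMeierFitter
import pandas as pd
import matplotlib.pyplot as plt

# Practice dataset: T = days observed, E = 1 if churned, 0 if censored
df = pd.DataFrame({
    'T': [30, 60, 45, 80],
    'E': [1, 0, 1, 0]
})

# Fit the Kaplan–Meier estimator; censored rows (E = 0) stay in the at-risk group until their last observed day
kmf = KaplanMeierFitter()
kmf.fit(durations=df['T'], event_observed=df['E'])

# Draw the step-shaped survival curve (with confidence band) and label it
kmf.plot_survival_function()
plt.title("Customer Survival Curve")
plt.xlabel("Days")
plt.ylabel("Probability of Staying")
plt.show()
```

🎤 Summary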
| You Now Know | Why It Matters |
|---|---|
| What survival analysis measures | Helps model customer lifetime or churn |
| What censoring is | Deals with incomplete data (ongoing users) |
| Survival vs. hazard functions | Different ways to look at customer “loyalty risk” |
| How to plot a survival curve | Visual storytelling for business presentations |
🧭 Next Stop
➡️ Kaplan–Meier Estimator — We’ll learn how to estimate the survival curve properly… and no, it’s not a life insurance form.
🎲 Optional Challenge
Load your company’s subscription data and:
Identify which customers are censored.
Plot their survival curve.
Caption it humorously, e.g., “Loyalty drops faster than coffee levels on Monday mornings.”
```python
# Your code here
```