Survival Analysis Basics
“Every customer relationship has a lifespan — some fade away, some churn dramatically, and some stay loyal just to use your Wi-Fi.”
Welcome to Survival Analysis — the mathematical art of predicting how long things last. It’s not just for doctors studying patients or engineers tracking lightbulbs. In business, it helps answer the question: 💡 “How long will this customer stay before they ghost us?”
🎯 What Is Survival Analysis?
Survival analysis is used when your target variable is time until an event occurs — like:
🛒 How long until a customer makes their next purchase?
💸 How long until they churn (cancel subscription)?
🧯 How long until a machine fails?
🕵️ How long until the intern finishes their coffee?
The two big components:
Duration (time) — How long something lasted.
Event (binary) — Whether the event happened or not.
1 → event occurred (customer churned)
0 → still alive / ongoing (customer still subscribed)
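As a rough sketch, here is one way raw subscription records might be turned into that (duration, event) pair. The cutoff date, the record layout, and the helper name `to_duration_event` are all hypothetical, chosen only for illustration.

```python
from datetime import date

# Hypothetical observation cutoff: the day the data was pulled.
CUTOFF = date(2024, 6, 30)

# Hypothetical records: (customer, signup date, cancel date or None if still subscribed).
records = [
    ("A", date(2024, 1, 1), date(2024, 1, 31)),
    ("B", date(2024, 5, 1), None),
]

def to_duration_event(signup, cancel, cutoff=CUTOFF):
    """Return (duration in days, event flag) for one customer."""
    if cancel is not None and cancel <= cutoff:
        return (cancel - signup).days, 1  # event observed: customer churned
    return (cutoff - signup).days, 0      # still subscribed at cutoff: ongoing

pairs = {name: to_duration_event(signup, cancel) for name, signup, cancel in records}
print(pairs)
```

For a customer who is still subscribed, the duration only says "at least this long," which is why the event flag has to travel with it.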
🧩 Key Terms You’ll See Everywhere
| Term | Meaning | Example |
|---|---|---|
| T | Time until event | 45 days before churn |
| E | Event indicator | 1 = churned, 0 = still active |
| Censoring | The event has not been observed yet, so we only know the duration is at least this long | Customer still subscribed |
| Survival Function (S(t)) | Probability customer survives beyond time t | 0.7 → 70% chance they’ll still be with us |
| Hazard Function (h(t)) | Instantaneous risk of the event at time t, given survival up to t | “How risky is this month?” |
🧠 Intuition Check: The “Netflix Relationship” Example
Imagine you’re Netflix. You start tracking 100 new users who subscribed in January.
After 1 month, 10 cancel.
After 2 months, another 20 cancel.
The rest are still hanging on for Stranger Things Season 5.
The survival curve steps downwards, from 100% to 90% after month 1 and to 70% after month 2, kind of like your motivation after 9 p.m.
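The numbers in this example can be worked out with a few lines of plain Python. This is only a sketch: it relies on nobody in the cohort being censored yet, so the surviving fraction is a simple headcount, and the variable names are illustrative.

```python
cohort_size = 100
# Cancellations observed at the end of each month (from the example above).
cancellations = {1: 10, 2: 20}

at_risk = cohort_size   # users still subscribed at the start of the month
survival = {0: 1.0}     # fraction of the cohort still subscribed
monthly_hazard = {}     # share of at-risk users who cancel in that month

for month in sorted(cancellations):
    monthly_hazard[month] = cancellations[month] / at_risk
    at_risk -= cancellations[month]
    survival[month] = at_risk / cohort_size

for month in sorted(cancellations):
    print(f"month {month}: S = {survival[month]:.2f}, hazard = {monthly_hazard[month]:.3f}")
```

Note that the month-2 hazard is 20 out of the 90 users still around, not 20 out of the original 100: risk is always measured among those who have survived so far.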
🧮 Basic Formulae
The survival function is:
\[ S(t) = P(T > t) \]
The hazard rate is:
\[ h(t) = \frac{f(t)}{S(t)} \]
where:
\( f(t) \) = probability density of the event at time t
\( S(t) \) = probability of surviving past time t
In simple terms:
The hazard is like the “riskiness per moment,” while survival is the “hope left.”
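To see the ratio h(t) = f(t) / S(t) in action, here is a small sketch that assumes churn follows an exponential model with a constant rate. That model and the rate of 0.1 per month are assumptions made purely for illustration; under them, f(t) = rate · exp(−rate · t) and S(t) = exp(−rate · t).

```python
import math

RATE = 0.1  # hypothetical churn rate per month, chosen only for illustration

def density(t, rate=RATE):
    """f(t) for an exponential time-to-event model."""
    return rate * math.exp(-rate * t)

def survival(t, rate=RATE):
    """S(t) = P(T > t) for the same model."""
    return math.exp(-rate * t)

def hazard(t, rate=RATE):
    """h(t) = f(t) / S(t)."""
    return density(t, rate) / survival(t, rate)

for t in (1, 6, 12):
    print(t, round(survival(t), 3), round(hazard(t), 3))
```

In this particular model the hazard stays at the rate no matter how far out you look, even though survival keeps shrinking: the "hope left" drops while the "riskiness per moment" holds steady. Real churn data rarely behaves this neatly.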
🧪 Quick Practice Exercise
Try this tiny dataset:
| Customer | Time (days) | Event (1=churned) |
|---|---|---|
| A | 30 | 1 |
| B | 60 | 0 |
| C | 45 | 1 |
| D | 80 | 0 |
👉 Calculate:
What’s the proportion of customers surviving past 45 days?
Which customers are right-censored?
(Hint: A and C have churned, B and D are still active.)
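One way to check your answers is a hand-rolled Kaplan–Meier (product-limit) estimate in plain Python. Treat it as a sketch for this four-row table rather than a replacement for a proper library; the function name `km_survival` is made up for this example.

```python
# (duration in days, event) pairs from the practice table
data = {"A": (30, 1), "B": (60, 0), "C": (45, 1), "D": (80, 0)}

# Right-censored customers: the event flag is 0, so churn has not been observed.
censored = sorted(name for name, (_, event) in data.items() if event == 0)

def km_survival(data, t):
    """Kaplan-Meier (product-limit) estimate of S(t) = P(T > t)."""
    s = 1.0
    # Only times at which a churn was actually observed change the estimate.
    event_times = sorted({d for d, e in data.values() if e == 1 and d <= t})
    for time in event_times:
        at_risk = sum(1 for d, _ in data.values() if d >= time)
        events = sum(1 for d, e in data.values() if d == time and e == 1)
        s *= 1 - events / at_risk
    return s

print("censored:", censored)
print("S(45):", round(km_survival(data, 45), 3))
```

At day 30 one of four at-risk customers churns (3/4 remain), and at day 45 one of the remaining three churns (2/3 remain), so the estimate is 3/4 × 2/3 = 0.5. B and D are the right-censored customers.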
📈 Visualization Tip
You can use Kaplan–Meier curves to visualize survival probabilities:
```python
import matplotlib.pyplot as plt
import pandas as pd
from lifelines import KaplanMeierFitter

# Durations in days (T) and event flags (E): 1 = churned, 0 = censored
df = pd.DataFrame({
    'T': [30, 60, 45, 80],
    'E': [1, 0, 1, 0]
})

# Fit the Kaplan-Meier estimator and plot the estimated survival curve
kmf = KaplanMeierFitter()
kmf.fit(durations=df['T'], event_observed=df['E'])
kmf.plot_survival_function()

plt.title("Customer Survival Curve")
plt.xlabel("Days")
plt.ylabel("Probability of Staying")
plt.show()
```
🎤 Summary
| You Now Know | Why It Matters |
|---|---|
| What survival analysis measures | Helps model customer lifetime or churn |
| What censoring is | Deals with incomplete data (ongoing users) |
| Survival vs. hazard functions | Different ways to look at customer “loyalty risk” |
| How to plot a survival curve | Visual storytelling for business presentations |
🧭 Next Stop
➡️ Kaplan–Meier Estimator — We’ll learn how to estimate the survival curve properly… and no, it’s not a life insurance form.
🎲 Optional Challenge
Load your company’s subscription data and:
Identify which customers are censored.
Plot their survival curve.
Caption it humorously, e.g., “Loyalty drops faster than coffee levels on Monday mornings.”
# Your code here