“Every customer relationship has a lifespan — some fade away, some churn dramatically, and some stay loyal just to use your Wi-Fi.”

Welcome to Survival Analysis — the mathematical art of predicting how long things last. It’s not just for doctors studying patients or engineers tracking lightbulbs. In business, it helps answer the question: 💡 “How long will this customer stay before they ghost us?”


🎯 What Is Survival Analysis?

Survival analysis is used when your target variable is time until an event occurs — like:

  • 🛒 How long until a customer makes their next purchase?

  • 💸 How long until they churn (cancel subscription)?

  • 🧯 How long until a machine fails?

  • 🕵️ How long until the intern finishes their coffee?

The two big components:

  1. Duration (time) — How long something lasted.

  2. Event (binary) — Whether the event happened or not.

    • 1 → event occurred (customer churned)

    • 0 → still alive / ongoing (customer still subscribed)
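To make the two components concrete, here is a minimal sketch of how raw subscription dates could be turned into a duration and an event flag. The customer records, the `observation_end` cut-off date, and the helper `to_duration_event` are all made up for illustration; your own data will have its own column names and cut-off.

```python
from datetime import date

# Hypothetical subscription records: (customer, start date, cancel date or None).
# Names and dates are invented for illustration only.
records = [
    ("A", date(2024, 1, 1), date(2024, 1, 31)),  # cancelled
    ("B", date(2024, 1, 1), None),               # still subscribed
]

# Assumed cut-off: the day we stop observing and freeze the data.
observation_end = date(2024, 3, 1)


def to_duration_event(start, cancel, cutoff):
    """Return (duration in days, event flag) for one customer."""
    if cancel is not None and cancel <= cutoff:
        return (cancel - start).days, 1  # event observed: customer churned
    return (cutoff - start).days, 0      # still active at the cut-off: censored


rows = {
    name: to_duration_event(start, cancel, observation_end)
    for name, start, cancel in records
}
print(rows)  # {'A': (30, 1), 'B': (60, 0)}
```

Customer B's 60 days is not their lifetime. It is only how long we have watched them so far, which is exactly what the 0 flag records.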


🧩 Key Terms You’ll See Everywhere

| Term | Meaning | Example |
| --- | --- | --- |
| T | Time until event | 45 days before churn |
| E | Event indicator | 1 = churned, 0 = still active |
| Censoring | The event has not been observed yet, so we only know it happens after the last time we looked | Customer still subscribed |
| Survival function S(t) | Probability the customer survives beyond time t | S(t) = 0.7 → 70% chance they'll still be with us at time t |
| Hazard function h(t) | Instantaneous risk of the event at time t, given survival up to t | "How risky is this month?" |

🧠 Intuition Check: The “Netflix Relationship” Example

Imagine you’re Netflix. You start tracking 100 new users who subscribed in January.

  • After 1 month, 10 cancel.

  • After 2 months, another 20 cancel.

  • The rest are still hanging on for Stranger Things Season 5.

The survival curve steps down from 100% to 90% after month 1 and to 70% after month 2: a gentle slope downwards, kind of like your motivation after 9 p.m.
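Here is the arithmetic behind that curve as a small sketch. It uses only the numbers from the example and assumes nobody is censored during the first two months, so a plain "fraction still subscribed" is enough.

```python
# Cohort from the example: 100 users start in January.
cohort_size = 100
cancellations = {1: 10, 2: 20}  # month -> number who cancel in that month

remaining = cohort_size
survival = {0: 1.0}  # everyone is still subscribed at the start
for month in sorted(cancellations):
    remaining -= cancellations[month]
    survival[month] = remaining / cohort_size

print(survival)  # {0: 1.0, 1: 0.9, 2: 0.7}
```

Once some users are censored partway through (for example, they joined later and we have not seen month 2 for them yet), this simple fraction stops being valid and a proper estimator is needed.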


🧮 Basic Formulae

The survival function is:

$$ S(t) = P(T > t) $$

The hazard rate is:

$$ h(t) = \frac{f(t)}{S(t)} $$

where:

  • $f(t)$ = probability density of the event at time t

  • $S(t)$ = probability of surviving past time t

In simple terms:

The hazard is like the “riskiness per moment,” while survival is the “hope left.”
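A quick numeric sanity check can help here. The sketch below assumes an exponential lifetime, a textbook case picked only because its formulas are short; the `rate` of 0.1 per month is an arbitrary illustrative value, not something from real data.

```python
import math

rate = 0.1  # assumed churn rate per month, chosen only for illustration


def density(t):
    """f(t) for an exponential lifetime: rate * exp(-rate * t)."""
    return rate * math.exp(-rate * t)


def survival(t):
    """S(t) = P(T > t) = exp(-rate * t)."""
    return math.exp(-rate * t)


def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)


for t in (0, 6, 12):
    print(t, round(survival(t), 3), round(hazard(t), 3))
```

Under this assumption the survival column keeps shrinking while the hazard column stays at the same value: the "hope left" fades, but the "riskiness per moment" is constant. Real customers rarely behave this neatly, which is why the hazard is usually estimated from data.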


🧪 Quick Practice Exercise

Try this tiny dataset:

| Customer | Time (days) | Event (1 = churned) |
| --- | --- | --- |
| A | 30 | 1 |
| B | 60 | 0 |
| C | 45 | 1 |
| D | 80 | 0 |

👉 Calculate:

  1. What’s the proportion of customers surviving past 45 days?

  2. Which customers are right-censored?

(Hint: A and C have churned, B and D are still active.)
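If you want to check your answers, one possible way is sketched below in plain Python. The dictionary layout is just a convenient assumption for this tiny dataset.

```python
# The practice dataset: customer -> (time in days, event flag).
data = {"A": (30, 1), "B": (60, 0), "C": (45, 1), "D": (80, 0)}

# Right-censored customers are those whose event has not been observed (E == 0).
censored = sorted(name for name, (_, event) in data.items() if event == 0)

# Customers known to have lasted longer than 45 days.
# A plain fraction is only valid here because nobody is censored before day 45.
survived_past_45 = sorted(name for name, (time, _) in data.items() if time > 45)
proportion = len(survived_past_45) / len(data)

print(censored)    # ['B', 'D']
print(proportion)  # 0.5
```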


📈 Visualization Tip

You can use Kaplan–Meier curves to visualize survival probabilities:

from lifelines import KaplanMeierFitter
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'T': [30, 60, 45, 80],
    'E': [1, 0, 1, 0]
})

kmf = KaplanMeierFitter()
kmf.fit(durations=df['T'], event_observed=df['E'])
kmf.plot_survival_function()
plt.title("Customer Survival Curve")
plt.xlabel("Days")
plt.ylabel("Probability of Staying")
plt.show()
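To see roughly what sits behind that plot, here is a hand-rolled sketch of the Kaplan–Meier (product-limit) calculation on the same four customers. The function `kaplan_meier` is written for this illustration and is not part of lifelines; it skips confidence intervals and everything else the library adds.

```python
from fractions import Fraction

durations = [30, 60, 45, 80]
events = [1, 0, 1, 0]


def kaplan_meier(durations, events):
    """Return [(time, S(time))] at each distinct time where an event occurred."""
    curve = []
    s = Fraction(1)
    event_times = sorted({d for d, e in zip(durations, events) if e == 1})
    for t in event_times:
        # Customers still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Customers whose event happens exactly at time t.
        churned = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        s *= 1 - Fraction(churned, at_risk)
        curve.append((t, s))
    return curve


curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, float(s))
# 30 0.75
# 45 0.5
```

The curve only drops at days 30 and 45, when A and C churn. The censored customers B and D never cause a drop, but they do count in the "at risk" group until their last observed day. These values should agree with the step heights in the lifelines plot.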

🎤 Summary

| You Now Know | Why It Matters |
| --- | --- |
| What survival analysis measures | Helps model customer lifetime or churn |
| What censoring is | Deals with incomplete data (ongoing users) |
| Survival vs. hazard functions | Different ways to look at customer "loyalty risk" |
| How to plot a survival curve | Visual storytelling for business presentations |

🧭 Next Stop

➡️ Kaplan–Meier Estimator — We’ll learn how to estimate the survival curve properly… and no, it’s not a life insurance form.


🎲 Optional Challenge

Load your company’s subscription data and:

  1. Identify which customers are censored.

  2. Plot their survival curve.

  3. Caption it humorously, e.g., “Loyalty drops faster than coffee levels on Monday mornings.”

# Your code here