Survival Analysis Basics#

“Every customer relationship has a lifespan — some fade away, some churn dramatically, and some stay loyal just to use your Wi-Fi.”

Welcome to Survival Analysis — the mathematical art of predicting how long things last. It’s not just for doctors studying patients or engineers tracking lightbulbs. In business, it helps answer the question: 💡 “How long will this customer stay before they ghost us?”


🎯 What Is Survival Analysis?#

Survival analysis is used when your target variable is time until an event occurs — like:

  • 🛒 How long until a customer makes their next purchase?

  • 💸 How long until they churn (cancel subscription)?

  • 🧯 How long until a machine fails?

  • 🕵️ How long until the intern finishes their coffee?

The two big components:

  1. Duration (time) — How long something lasted.

  2. Event (binary) — Whether the event happened or not.

    • 1 → event occurred (customer churned)

    • 0 → event not observed yet, i.e. censored (customer still subscribed)
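To make the two components concrete, here is a minimal sketch of turning raw subscription records into (duration, event) pairs. The records, the dates, and the `cutoff` date are all made up for illustration, and `to_duration_event` is a hypothetical helper, not part of any library.

```python
from datetime import date

# Hypothetical records: (customer, start date, cancel date or None if still subscribed).
records = [
    ("A", date(2024, 1, 1), date(2024, 1, 31)),
    ("B", date(2024, 1, 1), None),
    ("C", date(2024, 1, 10), date(2024, 2, 24)),
    ("D", date(2024, 1, 15), None),
]

# Assumed end of the observation window (the day we pulled the data).
cutoff = date(2024, 4, 1)

def to_duration_event(start, cancel, cutoff):
    """Return (duration in days, event flag) for one customer."""
    if cancel is not None and cancel <= cutoff:
        return (cancel - start).days, 1  # churn observed inside the window
    return (cutoff - start).days, 0      # still active at cutoff: censored

encoded = {name: to_duration_event(start, cancel, cutoff)
           for name, start, cancel in records}

for name, (duration, event) in encoded.items():
    print(name, duration, event)
```

For a customer who is still subscribed, the duration is only a lower bound on their true lifetime, which is exactly what the 0 flag records.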


🧩 Key Terms You’ll See Everywhere#

| Term | Meaning | Example |
|---|---|---|
| T | Time until event | 45 days before churn |
| E | Event indicator | 1 = churned, 0 = still active |
| Censoring | The event has not been observed by the end of the observation window, so we only know it happens later (if at all) | Customer still subscribed |
| Survival function S(t) | Probability the customer survives beyond time t | S(t) = 0.7 → 70% chance they’ll still be with us at time t |
| Hazard function h(t) | Instantaneous rate of the event at time t, given survival up to t | “How risky is this month?” |


🧠 Intuition Check: The “Netflix Relationship” Example#

Imagine you’re Netflix. You start tracking 100 new users who subscribed in January.

  • After 1 month, 10 cancel.

  • After 2 months, another 20 cancel.

  • The rest are still hanging on for Stranger Things Season 5.

The survival curve steps downwards over time, dropping a little with each batch of cancellations, kind of like your motivation after 9 p.m.
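Here is a small sketch of the numbers behind that curve. It assumes nobody in this toy cohort is censored before month 2, so the survival estimate is simply the fraction of the original 100 users still subscribed. The variable names are illustrative.

```python
cohort_size = 100
cancellations = {1: 10, 2: 20}  # month -> cancellations, taken from the example above

remaining = cohort_size
survival = {0: 1.0}  # everyone is still subscribed at month 0
for month in sorted(cancellations):
    remaining -= cancellations[month]
    survival[month] = remaining / cohort_size

for month, s in survival.items():
    print(f"S({month}) = {s:.2f}")
# S(0) = 1.00
# S(1) = 0.90
# S(2) = 0.70
```

Once some users are censored partway through, this plain fraction is no longer enough, which is where the Kaplan–Meier estimator comes in.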


🧮 Basic Formulae#

The survival function is:

\[ S(t) = P(T > t) \]

The hazard rate is:

\[ h(t) = \frac{f(t)}{S(t)} \]

where:

  • \( f(t) \) = probability density of the event at time t

  • \( S(t) \) = probability of surviving past time t

In simple terms:

The hazard is like the “riskiness per moment,” while survival is the “hope left.”
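A quick worked case shows how the two formulas fit together. It assumes an exponential lifetime with a constant rate \( \lambda > 0 \), which is a textbook simplification chosen for illustration, not a claim about how real customers behave:

```latex
\[
f(t) = \lambda e^{-\lambda t},
\qquad
S(t) = P(T > t) = \int_t^{\infty} \lambda e^{-\lambda u}\,du = e^{-\lambda t},
\qquad
h(t) = \frac{f(t)}{S(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda .
\]
```

So under this assumption the “riskiness per moment” stays flat at \( \lambda \), while the “hope left” decays exponentially.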


🧪 Quick Practice Exercise#

Try this tiny dataset:

| Customer | Time (days) | Event (1 = churned, 0 = still active) |
|---|---|---|
| A | 30 | 1 |
| B | 60 | 0 |
| C | 45 | 1 |
| D | 80 | 0 |

👉 Calculate:

  1. What’s the proportion of customers surviving past 45 days?

  2. Which customers are right-censored?

(Hint: A and C have churned, B and D are still active.)
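If you want to check your answers, here is one possible sketch in plain Python. The plain proportion works here only because both censoring times (60 and 80 days) are later than day 45, so everyone's status at day 45 is known. That shortcut is an assumption that holds for this dataset, not a general method.

```python
# Practice dataset from the table above: customer -> (time in days, event flag).
data = {"A": (30, 1), "B": (60, 0), "C": (45, 1), "D": (80, 0)}

# Right-censored customers: the churn event was not observed (event flag is 0).
censored = sorted(name for name, (_, event) in data.items() if event == 0)

# Customers whose recorded time is strictly greater than 45 days.
threshold = 45
survivors = sorted(name for name, (time, _) in data.items() if time > threshold)
proportion = len(survivors) / len(data)

print("Right-censored:", censored)
print("Surviving past day 45:", survivors, proportion)
```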


📈 Visualization Tip#

You can use Kaplan–Meier curves to visualize survival probabilities:

```python
from lifelines import KaplanMeierFitter
import pandas as pd
import matplotlib.pyplot as plt

# Practice dataset: T = duration in days, E = 1 if churn was observed, 0 if censored
df = pd.DataFrame({
    'T': [30, 60, 45, 80],
    'E': [1, 0, 1, 0]
})

# Fit the Kaplan–Meier estimator and plot the estimated survival curve
kmf = KaplanMeierFitter()
kmf.fit(durations=df['T'], event_observed=df['E'])
kmf.plot_survival_function()
plt.title("Customer Survival Curve")
plt.xlabel("Days")
plt.ylabel("Probability of Staying")
plt.show()
```
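To see roughly what is going on behind the plot, here is a hand-rolled sketch of the Kaplan–Meier (product-limit) calculation on the same four customers. The `kaplan_meier` function is a hypothetical helper written for this page, not lifelines' implementation, and it skips extras such as confidence intervals.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival estimate)] at each time with an observed event."""
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Customers still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Customers whose event was observed exactly at time t.
        churned = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - churned / at_risk
        curve.append((t, survival))
    return curve

curve = kaplan_meier([30, 60, 45, 80], [1, 0, 1, 0])
for t, s in curve:
    print(f"day {t}: S(t) = {s:.2f}")
# day 30: S(t) = 0.75
# day 45: S(t) = 0.50
```

The censored customers (B and D) never cause a drop themselves; they only count in the at-risk group until their recorded time.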

🎤 Summary#

| You Now Know | Why It Matters |
|---|---|
| What survival analysis measures | Helps model customer lifetime or churn |
| What censoring is | Deals with incomplete data (ongoing users) |
| Survival vs. hazard functions | Different ways to look at customer “loyalty risk” |
| How to plot a survival curve | Visual storytelling for business presentations |


🧭 Next Stop#

➡️ Kaplan–Meier Estimator — We’ll learn how to estimate the survival curve properly… and no, it’s not a life insurance form.


🎲 Optional Challenge#

Load your company’s subscription data and:

  1. Identify which customers are censored.

  2. Plot their survival curve.

  3. Caption it humorously, e.g., “Loyalty drops faster than coffee levels on Monday mornings.”

# Your code here