Lab – CLV Estimation#

“You can’t manage what you can’t measure — especially when your customers are quietly disappearing.”

Welcome to the Customer Lifetime Value (CLV) Lab, where we turn survival curves, churn probabilities, and discount rates into actual business insight (and maybe a few existential crises about retention).


🎯 Objective#

In this lab, you’ll:

  • Estimate customer survival probabilities

  • Predict expected future transactions

  • Estimate the monetary value of each customer’s future purchases

  • And finally, answer the sacred business question:

    “Should we spend $20 acquiring this customer… or just pray they don’t churn?”


🧰 Setup#

Let’s grab the usual suspects.

import pandas as pd
import numpy as np
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.datasets import load_cdnow_summary_data_with_monetary_value
import matplotlib.pyplot as plt
import seaborn as sns

Load a dataset that’s basically a retail time capsule — customer transactions from an ancient CD store (yes, the shiny round things people used before Spotify).

data = load_cdnow_summary_data_with_monetary_value()
data.head()
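The summary table describes each customer with frequency, recency, and T (plus monetary_value). As a rough illustration of what those columns mean, here is a minimal sketch that derives them from one customer's purchase dates. The helper name `rfm_summary`, the weekly time unit, and the sample dates are assumptions made for this example; they are not part of the dataset or the library.

```python
from datetime import date

def rfm_summary(purchase_dates, observation_end, days_per_period=7):
    """Summarize one customer's purchases as (frequency, recency, T).

    frequency: number of repeat purchase dates (distinct dates minus the first)
    recency:   customer's age at their last purchase, in periods
    T:         customer's age at the end of the observation window, in periods
    """
    dates = sorted(set(purchase_dates))
    first, last = dates[0], dates[-1]
    frequency = len(dates) - 1
    recency = (last - first).days / days_per_period
    T = (observation_end - first).days / days_per_period
    return frequency, recency, T

# Hypothetical customer: first purchase on 1 Jan, then two repeat purchases.
purchases = [date(1997, 1, 1), date(1997, 1, 15), date(1997, 2, 12)]
frequency, recency, T = rfm_summary(purchases, observation_end=date(1997, 3, 26))
print(frequency, recency, T)
```

Note that a customer who bought only once has frequency 0 and recency 0, which matters later when fitting the spend model.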

🧮 Step 1: Fit the BG/NBD Model#

The Beta-Geometric/Negative Binomial Distribution (BG/NBD) model estimates:

  • How often a customer buys,

  • How likely they are to come back, and

  • When they’ll ghost you for good.

bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(data['frequency'], data['recency'], data['T'])
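To make the "ghosting" part concrete, the BG/NBD model gives a closed-form probability that a customer is still active, given their history. The sketch below writes that expression out in plain Python. The parameter values are illustrative assumptions, not the output of the fit above, and `prob_alive` is a hypothetical helper name.

```python
def prob_alive(frequency, recency, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active, given their history."""
    if frequency == 0:
        # Under this model's assumptions, dropout can only follow a repeat purchase.
        return 1.0
    odds_inactive = (a / (b + frequency - 1)) * (
        (alpha + T) / (alpha + recency)
    ) ** (r + frequency)
    return 1.0 / (1.0 + odds_inactive)

# Illustrative parameter values (assumptions, not the fitted ones).
params = dict(r=0.25, alpha=4.5, a=0.8, b=2.5)

# Two customers with the same purchase count but different recency.
recent_buyer = prob_alive(frequency=5, recency=36, T=38, **params)
lapsed_buyer = prob_alive(frequency=5, recency=10, T=38, **params)
print(round(recent_buyer, 3), round(lapsed_buyer, 3))
```

A customer who bought five times but went quiet long ago scores far lower than one who bought last week. That is the intuition the fitted model applies to every row.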

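Plot the predicted purchases for the next 12 months. In this dataset, recency and T are measured in weeks, so 12 months is roughly 52 periods:

t = 52  # weeks, the same unit as recency and T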
data['pred_purchases'] = bgf.predict(t, data['frequency'], data['recency'], data['T'])
sns.histplot(data['pred_purchases'], bins=30)
plt.title("Predicted Purchases in Next 12 Months")
plt.show()

You’ll likely see a long tail — most customers barely buy again, and a few act like they own the store.


💸 Step 2: Estimate Average Order Value with Gamma–Gamma#

Now, we want to know how much each customer typically spends when they do purchase.

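# Gamma-Gamma needs positive monetary values, so fit on repeat buyers only.
returning = data[data['frequency'] > 0]
ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(returning['frequency'], returning['monetary_value'])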

Predict the expected average transaction value per customer (the method name says “profit”, but monetary_value here is average spend per order):

data['expected_avg_profit'] = ggf.conditional_expected_average_profit(
    data['frequency'], data['monetary_value']
)
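The Gamma–Gamma estimate is a weighted average of the customer's own observed mean and the population mean, with more weight on the customer's own history as their purchase count grows. A minimal sketch of that shrinkage follows. The parameter values are illustrative assumptions (not the fitted ones), and `expected_avg_value` is a hypothetical helper name.

```python
def expected_avg_value(frequency, avg_value, p, q, v):
    """Gamma-Gamma conditional expectation of a customer's mean transaction value.

    Requires q > 1 so that the population mean is finite.
    """
    population_mean = v * p / (q - 1)
    weight = p * frequency / (p * frequency + q - 1)  # grows with purchase history
    return (1 - weight) * population_mean + weight * avg_value

# Illustrative parameter values (assumptions): population mean = 15 * 6 / 3 = 30.
params = dict(p=6.0, q=4.0, v=15.0)

# Same observed average of $100, but very different amounts of evidence.
one_purchase = expected_avg_value(frequency=1, avg_value=100.0, **params)
ten_purchases = expected_avg_value(frequency=10, avg_value=100.0, **params)
print(round(one_purchase, 2), round(ten_purchases, 2))
```

A single $100 order gets pulled well toward the population mean, while ten orders averaging $100 are mostly taken at face value.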

🧠 Step 3: Combine for CLV#

Let’s predict 12 months of CLV with a 1% monthly discount rate:

data['clv'] = ggf.customer_lifetime_value(
    bgf,
    data['frequency'],
    data['recency'],
    data['T'],
    data['monetary_value'],
    time=12,             # months
    discount_rate=0.01,  # monthly
    freq='W'             # recency and T are in weeks in this dataset
)
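Conceptually, the CLV figure is a discounted sum: expected purchases in each future month, times expected spend per purchase, divided by (1 + d) raised to the month number. The sketch below shows that arithmetic for a single hypothetical customer. The flat 0.5 purchases per month and the $40 order value are made-up inputs; in the fitted model, expected purchases typically decline over time as dropout becomes more likely.

```python
def discounted_clv(monthly_purchases, avg_value, monthly_discount_rate):
    """Sum expected monthly revenue, discounting month i by (1 + d) ** i."""
    return sum(
        purchases * avg_value / (1 + monthly_discount_rate) ** month
        for month, purchases in enumerate(monthly_purchases, start=1)
    )

# Hypothetical customer: 0.5 expected purchases per month at $40 each, for 12 months.
monthly_purchases = [0.5] * 12
undiscounted = discounted_clv(monthly_purchases, avg_value=40.0, monthly_discount_rate=0.0)
discounted = discounted_clv(monthly_purchases, avg_value=40.0, monthly_discount_rate=0.01)
print(round(undiscounted, 2), round(discounted, 2))
```

The gap between the two numbers is the cost of waiting for the money. A 1% monthly rate compounds to roughly 12.7% per year.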

Now, sort your customers by CLV like a true capitalist:

data = data.sort_values('clv', ascending=False)
data.head(10)

📈 Step 4: Visualize CLV Distribution#

Visualize how customer value is distributed (spoiler: it’s not fair).

sns.histplot(data['clv'], bins=30)
plt.title("Distribution of Predicted CLV")
plt.xlabel("Predicted Customer Lifetime Value ($)")
plt.ylabel("Number of Customers")
plt.show()

🎭 Interpretation:

  • Most customers contribute modest value.

  • A tiny elite group keeps your company alive.

  • These VIPs deserve extra love (and fewer password reset emails).
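One way to put a number on the "tiny elite group" claim is the share of total predicted CLV held by the top 10% of customers. A minimal sketch, using a hypothetical helper `top_share` and made-up CLV values rather than the lab's results:

```python
def top_share(values, fraction=0.10):
    """Share of the total contributed by the top `fraction` of customers."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)

# Hypothetical CLV values for ten customers (not taken from the dataset).
clv_values = [900, 120, 80, 60, 50, 40, 30, 10, 5, 5]
print(f"Top 10% share of CLV: {top_share(clv_values):.0%}")
```

With the lab's data, the same function could be called as `top_share(data['clv'].tolist())`.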


💡 Step 5: Segment by CLV#

Let’s create customer tiers.

def clv_segment(x):
    if x > 400:
        return "💎 Platinum"
    elif x > 200:
        return "🥈 Silver"
    else:
        return "🧾 Bronze"

data['segment'] = data['clv'].apply(clv_segment)
data['segment'].value_counts()

📊 Step 6: Segment Visualization#

sns.boxplot(x='segment', y='clv', data=data, order=['🧾 Bronze', '🥈 Silver', '💎 Platinum'])
plt.title("Customer Segments by CLV")
plt.show()

Now you can see:

  • Platinum customers: your heroes

  • Silver: the reliable middle class

  • Bronze: the ones who use your coupon codes and vanish


🧭 Step 7: Business Insights#

| Metric | Interpretation |
|---|---|
| Avg. CLV | Baseline customer worth |
| Top 10% CLV | Core customer base (your real “community”) |
| CLV/Acquisition Cost | Determines marketing ROI |
| Retention Curve | Predicts long-term sustainability |
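The CLV-to-acquisition-cost ratio ties back to the lab's opening question about spending $20 on a customer. Here is a minimal sketch of that comparison. The helper name, the break-even threshold of 1.0, and the CLV values are assumptions for illustration; a real decision would normally use margin rather than revenue and a stricter threshold.

```python
def acquisition_decision(clv, acquisition_cost, min_ratio=1.0):
    """Return the CLV-to-CAC ratio and whether it clears the required minimum."""
    ratio = clv / acquisition_cost
    return ratio, ratio >= min_ratio

# The $20 acquisition cost comes from the lab's question; the CLV values are hypothetical.
for clv in (15.0, 60.0):
    ratio, worth_it = acquisition_decision(clv, acquisition_cost=20.0)
    print(f"CLV ${clv:.0f}: ratio {ratio:.2f}, acquire: {worth_it}")
```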


🧘 Bonus: Survival View#

You can also connect this to survival analysis from the previous section.

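Churn is never observed directly in non-contractual data, so this sketch uses a proxy (an assumption, not a standard definition): a customer counts as churned when the BG/NBD probability of being alive falls below 0.5. Customers with no repeat purchases always score 1.0 under this model, so they are treated as censored. Plot survival probability over time:

from lifelines import KaplanMeierFitter

# Proxy churn flag from the fitted BG/NBD model (the 0.5 threshold is an assumption)
p_alive = bgf.conditional_probability_alive(data['frequency'], data['recency'], data['T'])
churned = p_alive < 0.5
# Churned customers end at their last purchase; everyone else is censored at T
durations = np.where(churned, data['recency'], data['T'])

kmf = KaplanMeierFitter()
kmf.fit(durations=durations, event_observed=churned)
kmf.plot_survival_function()
plt.title("Customer Survival Curve")
plt.show()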

The slower the curve drops, the longer your customers stick around. The faster it drops, the more you should panic (and maybe send a discount email).
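For intuition about what the Kaplan–Meier estimate computes, here is a small hand-rolled version in plain Python. At each time where churn is observed, the survival probability is multiplied by one minus the fraction of at-risk customers who churned. The function name and the toy data are assumptions for illustration; this is not how lifelines implements it.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time where an event occurs."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        churned_now = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if churned_now:
            survival *= 1 - churned_now / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # churned and censored customers both leave the risk set
    return curve

# Hypothetical data: weeks observed, and whether churn was seen (True) or censored (False).
durations = [2, 4, 4, 6, 8]
events = [True, True, False, True, False]
curve = kaplan_meier(durations, events)
print(curve)
```

Censored customers never lower the curve directly. They only shrink the risk set, which is why a correct churn flag matters more than the durations themselves.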


🎯 Deliverables#

By the end of this lab, you should be able to:

  1. Predict customer CLV using BG/NBD and Gamma–Gamma models

  2. Visualize CLV distribution and survival curves

  3. Segment customers by predicted value

  4. Generate actionable business insights


💬 Reflection#

Answer these in your notebook or to your inner data monk:

  1. What’s the average CLV across customers?

  2. How concentrated is revenue (e.g., top 10% contribution)?

  3. How could marketing strategy change with this insight?

  4. What happens to CLV if churn improves by 10%?


🧩 Wrap-Up#

Congratulations! You’ve just gone from raw transaction data → survival analysis → revenue forecasting.

And if your CFO now smiles when you say “customer segmentation,” you’ve done data science right.


“CLV: because treating every customer the same is the fastest way to go broke.”

# Your code here