# Lab – CLV Estimation

> “You can’t manage what you can’t measure, especially when your customers are quietly disappearing.”

Welcome to the **Customer Lifetime Value (CLV) Lab**,
where we turn survival curves, churn probabilities, and discount rates
into **actual business insight** (and maybe a few existential crises about retention).

---

## 🎯 Objective

In this lab, you’ll:
- Estimate **customer survival probabilities**
- Predict **expected future transactions**
- Estimate the **monetary value of those transactions**
- And finally, answer the sacred business question:

  > “Should we spend \$20 acquiring this customer… or just pray they don’t churn?”

---

## 🧰 Setup

Let’s grab the usual suspects.

```python
import pandas as pd
import numpy as np
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.datasets import load_cdnow_summary_data_with_monetary_value
import matplotlib.pyplot as plt
import seaborn as sns
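```

Load a dataset that’s basically a **retail time capsule**: customer transactions from an ancient CD store (yes, the shiny round things people used before Spotify). Each row summarizes one customer: `frequency` (number of repeat purchases), `recency` and `T` (time of the last purchase and customer age, both in weeks since the first purchase), and `monetary_value` (average repeat-purchase amount).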

```python
data = load_cdnow_summary_data_with_monetary_value()
data.head()
```
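
To make those four columns concrete, here is a minimal sketch that builds the same kind of summary from a raw transaction log using plain pandas. The log, the customer IDs, and the `observation_end` date are all made up for illustration, and the "exclude the first purchase from `monetary_value`" convention is my understanding of how this dataset was summarized, so treat it as an assumption.

```python
import pandas as pd

# Hypothetical transaction log (made-up data): one row per purchase.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 3, 3],
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-15", "2024-02-26",
        "2024-01-08",
        "2024-01-22", "2024-03-04",
    ]),
    "amount": [20.0, 30.0, 40.0, 15.0, 50.0, 70.0],
}).sort_values(["customer_id", "date"])
observation_end = pd.Timestamp("2024-03-11")  # assumed end of the observation window

dates = tx.groupby("customer_id")["date"]
first, last = dates.min(), dates.max()

# Repeat purchases are everything after each customer's first transaction.
repeats = tx[tx.groupby("customer_id").cumcount() > 0]

summary = pd.DataFrame({
    "frequency": dates.nunique() - 1,                 # repeat purchase dates
    "recency": (last - first).dt.days / 7,            # weeks from first to last purchase
    "T": (observation_end - first).dt.days / 7,       # weeks from first purchase to window end
    "monetary_value": repeats.groupby("customer_id")["amount"].mean(),
}).fillna({"monetary_value": 0.0})                    # one-time buyers have no repeat value
print(summary)
```

Customer 2 bought once and never returned, so they end up with `frequency`, `recency`, and `monetary_value` all equal to zero. That detail matters again in Step 2.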

---

## 🧮 Step 1: Fit the BG/NBD Model

The **Beta-Geometric/Negative Binomial Distribution (BG/NBD)** model estimates:

* How often a customer buys,
* How likely they are to come back, and
* When they’ll ghost you for good.

```python
bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(data['frequency'], data['recency'], data['T'])
```
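
If the model feels like a black box, it can help to simulate the story it tells. The sketch below follows the usual BG/NBD assumptions: each customer has a personal purchase rate drawn from a Gamma distribution and a personal dropout probability drawn from a Beta distribution, and after every purchase they may silently leave. The function name and the default parameter values are illustrative assumptions, not the estimates from the fit above.

```python
import random

def simulate_customer(T, r=0.24, alpha=4.41, a=0.79, b=2.43, rng=random):
    """Simulate one customer's repeat purchases over T periods (BG/NBD story).

    r, alpha: Gamma shape and rate for the purchase rate (illustrative values).
    a, b:     Beta parameters for the dropout probability (illustrative values).
    """
    lam = max(rng.gammavariate(r, 1.0 / alpha), 1e-12)  # gammavariate takes (shape, scale)
    p = rng.betavariate(a, b)                           # chance of dropping out after a purchase
    t, purchases = 0.0, []
    while True:
        t += rng.expovariate(lam)   # exponential waiting time until the next purchase
        if t > T:
            break                   # observation window ended first
        purchases.append(t)
        if rng.random() < p:
            break                   # customer ghosts us after this purchase
    frequency = len(purchases)
    recency = purchases[-1] if purchases else 0.0
    return frequency, recency, T

random.seed(42)
for row in (simulate_customer(T=39.0) for _ in range(5)):
    print(row)
```

Each tuple has the same shape as a row of the lab data: `(frequency, recency, T)`. Fitting the model is the reverse exercise: given many such rows, find the four parameters that make them most likely.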

Plot the predicted purchases for the next 12 months:

```python
t = 52  # T is measured in weeks in this dataset, so 12 months is about 52 periods
data['pred_purchases'] = bgf.predict(t, data['frequency'], data['recency'], data['T'])
sns.histplot(data['pred_purchases'], bins=30)
plt.title("Predicted Purchases in Next 12 Months")
plt.show()
```

You’ll likely see a long tail: most customers barely buy again,
and a few act like they own the store.

---

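## 💸 Step 2: Estimate Average Order Value with Gamma–Gamma

Now, we want to know how *much* each customer typically spends when they do purchase.
The Gamma–Gamma model only makes sense for customers with at least one repeat purchase, and one-time buyers have a `monetary_value` of 0, which the fitter rejects. So we fit on returning customers only.

```python
# Fit on returning customers only: monetary_value is 0 when frequency is 0
returning = data[data['frequency'] > 0]
ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(returning['frequency'], returning['monetary_value'])
```

Predict the expected average transaction value per customer (`lifetimes` calls it "profit", but it is only profit if `monetary_value` already holds margins rather than revenue):

```python
data['expected_avg_profit'] = ggf.conditional_expected_average_profit(
    data['frequency'], data['monetary_value']
)
```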
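
The Gamma–Gamma model assumes that how often a customer buys tells you nothing about how much they spend per order. A quick sanity check is to look at the correlation between the two among returning customers; a value near zero is reassuring. The helper name and the toy rows below are made up for illustration; in the lab you would pass `data` instead.

```python
import pandas as pd

def frequency_value_correlation(summary: pd.DataFrame) -> float:
    """Pearson correlation between frequency and monetary_value for returning customers."""
    returning = summary[summary["frequency"] > 0]
    return returning["frequency"].corr(returning["monetary_value"])

# Made-up summary rows, only to show the call.
toy = pd.DataFrame({
    "frequency": [0, 1, 2, 3, 4],
    "monetary_value": [0.0, 30.0, 10.0, 40.0, 20.0],
})
print(frequency_value_correlation(toy))
```

If the correlation on your real data is large, the independence assumption is shaky and the CLV numbers in the next step deserve extra skepticism.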

---

## 🧠 Step 3: Combine for CLV

Let’s predict **12 months of CLV** with a 1% monthly discount rate (roughly 12.7% per year):

```python
data['clv'] = ggf.customer_lifetime_value(
    bgf,
    data['frequency'],
    data['recency'],
    data['T'],
    data['monetary_value'],
    time=12,             # months
    discount_rate=0.01,  # monthly
    freq='W'             # T is measured in weeks in this dataset
)
```
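
The discounting part is ordinary present-value arithmetic: money expected in month `k` is divided by `(1 + rate) ** k`. The sketch below shows that arithmetic on a flat, made-up cash flow. It assumes cash arrives at the end of each month, which may not match the library's internal timing exactly, so use it to build intuition rather than to reproduce the numbers above.

```python
def discounted_value(monthly_cash_flows, monthly_discount_rate):
    """Present value of cash flows received at the end of months 1, 2, 3, ..."""
    return sum(
        cash / (1 + monthly_discount_rate) ** month
        for month, cash in enumerate(monthly_cash_flows, start=1)
    )

def annual_rate(monthly_discount_rate):
    """Compound a monthly rate into its annual equivalent."""
    return (1 + monthly_discount_rate) ** 12 - 1

flows = [10.0] * 12                      # hypothetical: 10 units of value per month
print(round(discounted_value(flows, 0.01), 2))
print(round(annual_rate(0.01), 4))
```

Twelve payments of 10 are worth less than 120 today, and the gap grows quickly as the discount rate rises.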

Now, sort your customers by CLV like a true capitalist:

```python
data = data.sort_values('clv', ascending=False)
data.head(10)
```

---

## 📈 Step 4: Visualize CLV Distribution

Visualize how customer value is distributed (spoiler: it’s not fair).

```python
sns.histplot(data['clv'], bins=30)
plt.title("Distribution of Predicted CLV")
plt.xlabel("Predicted Customer Lifetime Value ($)")
plt.ylabel("Number of Customers")
plt.show()
```

🎭 Interpretation:

* Most customers contribute modest value.
* A tiny elite group keeps your company alive.
* These VIPs deserve extra love (and fewer password reset emails).
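
To put a number on "tiny elite group", you can compute the share of total predicted value that comes from the top slice of customers. This is a minimal sketch with a hypothetical helper name and made-up values; on the lab data the pandas equivalent would be `data['clv'].nlargest(k).sum() / data['clv'].sum()`.

```python
import math

def top_share(values, fraction=0.10):
    """Share of the total contributed by the top `fraction` of values."""
    ranked = sorted(values, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))  # always include at least one customer
    return sum(ranked[:k]) / sum(ranked)

clv_values = [100, 50, 10, 10, 10, 10, 10, 10, 10, 10]  # made-up CLVs
print(round(top_share(clv_values), 3))
```

A result far above 0.10 means value is concentrated; if the top 10% held exactly 10% of the value, every customer would be worth the same.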

---

## 💡 Step 5: Segment by CLV

Let’s create customer tiers.

```python
def clv_segment(x):
    if x > 400:
        return "💎 Platinum"
    elif x > 200:
        return "🥈 Silver"
    else:
        return "🧾 Bronze"

data['segment'] = data['clv'].apply(clv_segment)
data['segment'].value_counts()
```

---

## 📊 Step 6: Segment Visualization

```python
sns.boxplot(x='segment', y='clv', data=data, order=['🧾 Bronze', '🥈 Silver', '💎 Platinum'])
plt.title("Customer Segments by CLV")
plt.show()
```
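
Fixed cutoffs like 400 and 200 depend on the currency and the dataset. One alternative, shown here as a sketch rather than the lab's method, is to cut by quantiles so the tiers always have predictable sizes. The 50% and 90% break points and the toy values are assumptions chosen for illustration.

```python
import pandas as pd

clv = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)  # made-up CLVs

# Bottom 50% -> Bronze, next 40% -> Silver, top 10% -> Platinum
tiers = pd.qcut(clv, q=[0, 0.5, 0.9, 1.0], labels=["Bronze", "Silver", "Platinum"])
print(tiers.value_counts())
```

One caveat: if many customers share the same CLV, the quantile edges can coincide and `pd.qcut` will raise an error about non-unique bin edges.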

Now you can see:

* Platinum customers: your heroes
* Silver: the reliable middle class
* Bronze: the ones who use your coupon codes and vanish
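
A boxplot shows the spread, but a small table often answers the business question faster: how many customers sit in each tier, and how much of the total value does each tier hold? The sketch below uses a made-up frame with plain-text segment labels; on the lab data you would group `data` by its `segment` column instead.

```python
import pandas as pd

# Made-up customers, only to show the shape of the summary.
toy = pd.DataFrame({
    "segment": ["Bronze", "Bronze", "Bronze", "Silver", "Silver", "Platinum"],
    "clv": [50.0, 80.0, 120.0, 250.0, 300.0, 700.0],
})

segment_summary = toy.groupby("segment")["clv"].agg(
    customers="count", mean_clv="mean", total_clv="sum"
)
# Fraction of all predicted value held by each tier
segment_summary["share_of_value"] = (
    segment_summary["total_clv"] / segment_summary["total_clv"].sum()
)
print(segment_summary)
```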

---

## 🧭 Step 7: Business Insights

| Metric                   | Interpretation                             |
| ------------------------ | ------------------------------------------ |
| **Avg. CLV**             | Baseline customer worth                    |
| **Top 10% CLV**          | Core customer base (your real “community”) |
| **CLV/Acquisition Cost** | Determines marketing ROI                   |
| **Retention Curve**      | Predicts long-term sustainability          |
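
The CLV/Acquisition Cost row is the one that answers the \$20 question from the start of the lab. Here is a minimal sketch of that ratio. The function name, the 40% margin, and the example CLV are all hypothetical; the margin matters because the CLV in this lab is built from order values, which are revenue rather than profit.

```python
def clv_to_cac(clv, acquisition_cost, margin=1.0):
    """Ratio of margin-adjusted CLV to customer acquisition cost."""
    if acquisition_cost <= 0:
        raise ValueError("acquisition_cost must be positive")
    return clv * margin / acquisition_cost

# Hypothetical customer: 150 in predicted revenue, 40% margin, 20 to acquire
ratio = clv_to_cac(clv=150.0, acquisition_cost=20.0, margin=0.4)
print(round(ratio, 2))
```

A ratio below 1 means the customer costs more to acquire than they return. Many teams aim well above 1 to leave room for forecast error, but the right target depends on your business.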

---

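## 🧘 Bonus: Survival View

You can also connect this to **survival analysis** from the previous section.

One catch: this dataset never records churn directly, so the curve needs a proxy for the churn event. The version below treats a customer as churned when the BG/NBD model gives them less than a 50% chance of still being alive (the 0.5 cutoff is an assumption; tune it to your business).

Plot survival probability over time:

```python
from lifelines import KaplanMeierFitter

# Proxy churn flag (assumption): BG/NBD says P(alive) < 0.5
p_alive = bgf.conditional_probability_alive(data['frequency'], data['recency'], data['T'])
churned = p_alive < 0.5
# Churned customers are last seen at `recency`; everyone else is censored at `T`
durations = np.where(churned, data['recency'], data['T'])

kmf = KaplanMeierFitter()
kmf.fit(durations=durations, event_observed=churned)
kmf.plot_survival_function()
plt.title("Customer Survival Curve")
plt.show()
```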
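
For intuition about what a Kaplan-Meier fit computes, here is a tiny hand-rolled version of the estimator. At each time where a churn event occurs, the running survival probability is multiplied by the fraction of at-risk customers who did not churn; censored customers simply leave the at-risk pool. The function name and the four-customer example are made up for illustration.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each time where at least one event occurred."""
    survival, curve = 1.0, []
    event_times = sorted({d for d, e in zip(durations, events) if e})
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)   # still observed just before t
        died = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - died / at_risk
        curve.append((t, survival))
    return curve

# Four made-up customers; the third is censored (still active when observation ended)
curve = kaplan_meier([1, 2, 2, 3], [True, True, False, True])
for time, surv in curve:
    print(time, round(surv, 3))
```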

> The slower the curve drops, the longer your customers stick around.
> The faster it drops, the more you should panic (and maybe send a discount email).

---

## 🎯 Deliverables

By the end of this lab, you should be able to:

1. Predict customer CLV using BG/NBD and Gamma–Gamma models
2. Visualize CLV distribution and survival curves
3. Segment customers by predicted value
4. Generate actionable business insights

---

## 💬 Reflection

Answer these in your notebook or to your inner data monk:

1. What’s the average CLV across customers?
2. How concentrated is revenue (e.g., top 10% contribution)?
3. How could marketing strategy change with this insight?
4. What happens to CLV if the churn rate drops by 10%?
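
For the last question, a back-of-the-envelope model can give you a feel for the answer before you rerun anything. The sketch below swaps in a much simpler model than BG/NBD: a constant per-period retention rate, a constant margin per period, and an infinite horizon, which gives the closed form `margin * retention / (1 + discount - retention)`. All numbers are hypothetical.

```python
def simple_clv(margin_per_period, retention, discount_rate):
    """Infinite-horizon CLV under constant retention (a simplification, not BG/NBD)."""
    if not 0 <= retention < 1 + discount_rate:
        raise ValueError("retention must be in [0, 1 + discount_rate)")
    return margin_per_period * retention / (1 + discount_rate - retention)

base = simple_clv(10.0, retention=0.90, discount_rate=0.01)      # 10% churn per period
improved = simple_clv(10.0, retention=0.91, discount_rate=0.01)  # churn cut by 10%, to 9%
print(round(base, 2), round(improved, 2), round(improved / base - 1, 3))
```

In this toy setup a 10% relative drop in churn lifts CLV by a bit more than 10%, because retained customers keep compounding. Whether the same holds for your fitted model is exactly what the reflection question asks you to check.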

---

## 🧩 Wrap-Up

Congratulations!
You’ve just gone from **raw transaction data** → **survival analysis** → **revenue forecasting**.

And if your CFO now smiles when you say *“customer segmentation,”*
you’ve done data science right.

---

> “CLV: because treating every customer the same is the fastest way to go broke.”

