Because correlation is what interns discover, but causation is what executives pay bonuses for.
💡 Why Causality Matters
Let’s start with the sacred business truth:
“Sales went up after the marketing campaign — therefore, the campaign worked!”
No, Karen. Maybe it was Christmas. 🎄
Causal inference is about knowing what actually caused something to happen — not just what happened around the same time as your KPI moved.
🧠 Causality vs Correlation
| Concept | Description | Business Example |
|---|---|---|
| Correlation | Two things move together | Coffee sales rise when it rains ☕🌧️ |
| Causation | One thing makes the other happen | Discounts cause more sales (probably) |
| Spurious correlation | A third factor fools you | Ice cream sales and shark attacks both rise in summer 🦈🍦 |
If you build a model on correlation alone, you might end up recommending to “launch shark-safe ice cream ads” — which is, admittedly, very on-brand for modern marketing.
🧪 The A/B Test – Business Edition
A/B testing is the most popular way to establish causality — or at least pretend to.
A/B Test = Controlled chaos with statistical backing.
Example:
Group A: sees the old website
Group B: sees the new “AI-enhanced” website that nobody understands
If B converts better → great, causality achieved. If not → you just proved your designer wrong (scientifically).
Basic Code Example (with Statsmodels)
```python
import numpy as np
import statsmodels.api as sm

# Fake conversion data
A = np.random.binomial(1, 0.10, 1000)  # 10% conversion rate
B = np.random.binomial(1, 0.12, 1000)  # 12% conversion rate

# Run a two-sample t-test (statsmodels returns the t-stat, p-value, and degrees of freedom)
t_stat, p_value, dof = sm.stats.ttest_ind(A, B)
print(f"T-statistic: {t_stat:.3f}, p-value: {p_value:.3f}")

if p_value < 0.05:
    print("✅ Statistically significant! You can brag in the next meeting.")
else:
    print("❌ Probably random. Don’t email the CEO yet.")
```

🧩 Confounders: The Hidden Villains
A confounder is something that affects both your independent and dependent variables. They sneak in like spies and ruin your experiment. 🕵️♀️
Example:
You find that “People who buy organic food also buy more yoga mats.”
Confounder: Income.
Rich people can afford both quinoa and flexibility.
To fix this, we use matching, stratification, or causal graphs (DAGs) to isolate the real relationship.
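For a rough feel of what stratification does, here is a minimal sketch on simulated data (the numbers and column names are invented for this example): compare yoga-mat rates within each income bracket instead of across everyone.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5_000

# Simulated shoppers: income drives BOTH organic-food buying and yoga-mat buying
income = rng.choice(['low', 'high'], n, p=[0.6, 0.4])
buys_organic = rng.binomial(1, np.where(income == 'high', 0.6, 0.2))
buys_yoga_mat = rng.binomial(1, np.where(income == 'high', 0.5, 0.1))

df = pd.DataFrame({'income': income,
                   'buys_organic': buys_organic,
                   'buys_yoga_mat': buys_yoga_mat})

# Naive comparison: organic buyers look far more yoga-prone
naive = df.groupby('buys_organic')['buys_yoga_mat'].mean()
print("Naive gap:", naive[1] - naive[0])

# Stratified comparison: within each income bracket the gap (mostly) disappears
strat = df.groupby(['income', 'buys_organic'])['buys_yoga_mat'].mean().unstack()
print("Within-bracket gaps:\n", strat[1] - strat[0])
```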
📈 DAGs – Drawing the Blame Network
A Directed Acyclic Graph (DAG) is basically a corporate blame chart:
Arrows = “This thing affects that thing.”
Goal = Find out who’s really responsible for the KPI going up or down.
```
Seasonality ──→ Marketing Spend ──→ Sales
     │                                ↑
     └────────────────────────────────┘
```

Moral: Sometimes, it’s not your campaign. It’s summer vacation.
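If you want the blame chart in code, here is a minimal sketch (assuming the networkx package is available); causal libraries such as DoWhy work from this kind of graph to decide what needs adjusting.

```python
import networkx as nx

# The blame chart: seasonality confounds the spend → sales relationship
dag = nx.DiGraph()
dag.add_edges_from([
    ('Seasonality', 'Marketing Spend'),
    ('Seasonality', 'Sales'),
    ('Marketing Spend', 'Sales'),
])

print("Actually acyclic?", nx.is_directed_acyclic_graph(dag))          # True
print("Who directly affects Sales?", list(dag.predecessors('Sales')))
```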
🔍 Regression for Causal Estimation
Regression can estimate causal effects — but only if you’re careful.
```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: ad spend, sales, and a seasonal confounder
df = pd.DataFrame({
    'ad_spend': np.random.rand(100) * 1000,
    'sales': np.random.rand(100) * 10000,
    'season': np.random.choice(['summer', 'winter'], 100)
})

# Regress sales on ad spend while controlling for season
model = smf.ols('sales ~ ad_spend + C(season)', data=df).fit()
print(model.summary())
```

Here, C(season) helps control for the confounder —
so we’re not blaming your marketing budget for what Santa Claus did. 🎅
🧮 Causal Inference Frameworks
| Method | Description | Use Case |
|---|---|---|
| A/B Testing | Randomized controlled experiment | Website design, pricing |
| Difference-in-Differences (DiD) | Compares changes before/after treatment | Policy, region-based campaigns |
| Instrumental Variables (IV) | Uses an external “randomizer” variable | Ad exposure, market shocks |
| Propensity Score Matching | Matches treated vs. control with similar features | Customer-level analysis |
| Causal Forests / DoWhy / EconML | Machine learning for causal inference | When you want causality and flexibility |
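To make one of these concrete, here is a hedged difference-in-differences sketch using the same statsmodels formula API as above (simulated data, made-up effect sizes): the coefficient on the treated:post interaction is the DiD estimate of the lift.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2_000

# Simulated regions: half run the campaign, everyone is observed before/after
treated = rng.integers(0, 2, n)    # 1 = region ran the campaign
post = rng.integers(0, 2, n)       # 1 = after the campaign launched
sales = (100                       # baseline
         + 10 * treated            # treated regions were bigger to begin with
         + 5 * post                # everyone grew over time anyway
         + 8 * treated * post      # the "true" lift we want to recover
         + rng.normal(0, 5, n))

df = pd.DataFrame({'sales': sales, 'treated': treated, 'post': post})

# Difference-in-Differences: the interaction term is the causal estimate
did = smf.ols('sales ~ treated * post', data=df).fit()
print(did.params['treated:post'])  # should land somewhere near 8
```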
🏢 Business Example: Email Campaign Impact
A retailer sends promotional emails to half their customers. After 2 weeks:
Group A (email): +15% sales increase
Group B (no email): +12%
The intern says:
“Emails work! +3% lift!”
But… customers who got emails also had higher previous spending.
After controlling for customer value, the real lift? Barely +0.5%.
Moral: Data never lies — but analysts often forget context. 😅
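Here is one minimal way that "controlling for customer value" step might look, again with simulated data and hypothetical column names (prior_spend, got_email):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 10_000

# Simulated customers: big prior spenders were more likely to get the email
prior_spend = rng.gamma(2, 100, n)
got_email = rng.binomial(1, np.clip(prior_spend / prior_spend.max(), 0.05, 0.95))
sales = 0.10 * prior_spend + 2 * got_email + rng.normal(0, 10, n)

df = pd.DataFrame({'sales': sales, 'got_email': got_email,
                   'prior_spend': prior_spend})

# Naive lift vs. lift after controlling for prior spend
naive = smf.ols('sales ~ got_email', data=df).fit()
adjusted = smf.ols('sales ~ got_email + prior_spend', data=df).fit()
print("Naive lift:   ", naive.params['got_email'])
print("Adjusted lift:", adjusted.params['got_email'])
```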
🧪 Mini Exercise
Try designing a simple causal test:
Pick a recent business change (e.g., new pricing, feature launch).
Split users randomly.
Measure a KPI (conversion, retention).
Use a t-test or regression to estimate the lift.
Report your findings with confidence intervals — and jokes.
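If you want a starting skeleton, here is one possible version (the KPI, conversion rates, and group sizes are placeholders to swap for your own data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)

# Placeholder data: users split randomly, KPI is a 0/1 conversion flag
df = pd.DataFrame({'treated': rng.integers(0, 2, 4_000)})
df['converted'] = rng.binomial(1, np.where(df['treated'] == 1, 0.12, 0.10))

# Regressing the 0/1 outcome on the treatment dummy gives the lift plus a CI
model = smf.ols('converted ~ treated', data=df).fit()
lift = model.params['treated']
low, high = model.conf_int().loc['treated']
print(f"Estimated lift: {lift:.3%} (95% CI: {low:.3%} to {high:.3%})")
```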
💬 TL;DR
Correlation ≠ Causation (unless you’re writing a bad investor deck).
Randomization is your best friend.
Always watch for confounders — they’re everywhere.
A/B testing is simple but powerful.
Use causal ML if you want to sound fancy and get that “AI Strategy” budget. 💰