
Because correlation is what interns discover, but causation is what executives pay bonuses for.


💡 Why Causality Matters

Let’s start with the sacred business truth:

“Sales went up after the marketing campaign — therefore, the campaign worked!”

No, Karen. Maybe it was Christmas. 🎄

Causal inference is about knowing what actually caused something to happen — not just what happened around the same time as your KPI moved.


🧠 Causality vs Correlation

| Concept | Description | Business Example |
| --- | --- | --- |
| Correlation | Two things move together | Coffee sales rise when it rains ☕🌧️ |
| Causation | One thing makes the other happen | Discounts cause more sales (probably) |
| Spurious correlation | A third factor fools you | Ice cream sales and shark attacks both rise in summer 🦈🍦 |

If you build a model on correlation alone, you might end up recommending to “launch shark-safe ice cream ads” — which is, admittedly, very on-brand for modern marketing.
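You can watch a spurious correlation appear and disappear in a few lines. The sketch below uses invented numbers (temperature driving both ice cream sales and shark attacks); once you hold the shared driver fixed, the "relationship" evaporates:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: summer temperature drives BOTH variables
temperature = rng.normal(25, 5, 500)
ice_cream_sales = 100 + 10 * temperature + rng.normal(0, 20, 500)
shark_attacks = 0.5 * temperature + rng.normal(0, 1, 500)

# The raw correlation looks impressive...
r = np.corrcoef(ice_cream_sales, shark_attacks)[0, 1]
print(f"Correlation: {r:.2f}")

# ...but it vanishes once you remove temperature's influence from both
resid_ice = ice_cream_sales - np.polyval(
    np.polyfit(temperature, ice_cream_sales, 1), temperature)
resid_shark = shark_attacks - np.polyval(
    np.polyfit(temperature, shark_attacks, 1), temperature)
r_partial = np.corrcoef(resid_ice, resid_shark)[0, 1]
print(f"Partial correlation (controlling for temperature): {r_partial:.2f}")
```

The residual trick is a poor man's partial correlation: regress each variable on the confounder, then correlate what's left over.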


🧪 The A/B Test – Business Edition

A/B testing is the most popular way to establish causality — or at least pretend to.

A/B Test = Controlled chaos with statistical backing.

Example:

  • Group A: sees the old website

  • Group B: sees the new “AI-enhanced” website that nobody understands

If B converts better → great, causality achieved. If not → you just proved your designer wrong (scientifically).


Basic Code Example (with Statsmodels)

import numpy as np
import statsmodels.api as sm

# Fake conversion data
A = np.random.binomial(1, 0.10, 1000)  # 10% conversion rate
B = np.random.binomial(1, 0.12, 1000)  # 12% conversion rate

# Run a t-test (statsmodels returns the statistic, p-value, AND degrees of freedom)
t_stat, p_value, dof = sm.stats.ttest_ind(A, B)
print(f"T-statistic: {t_stat:.3f}, p-value: {p_value:.3f}")

if p_value < 0.05:
    print("✅ Statistically significant! You can brag in the next meeting.")
else:
    print("❌ Probably random. Don’t email the CEO yet.")

🧩 Confounders: The Hidden Villains

A confounder is something that affects both your independent and your dependent variable. It sneaks in like a spy and ruins your experiment. 🕵️‍♀️

Example:

You find that “People who buy organic food also buy more yoga mats.”

Confounder: Income.

Rich people can afford both quinoa and flexibility.

To fix this, we use matching, stratification, or causal graphs (DAGs) to isolate the real relationship.
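Stratification is the easiest of those to demo. Here's a minimal sketch with simulated data where income (the confounder, with made-up purchase rates) drives both organic food and yoga mat purchases; the naive comparison shows a big gap, the within-stratum comparison shows almost none:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: income drives both purchases, organic itself does nothing
income = rng.choice(['low', 'high'], n, p=[0.6, 0.4])
rich = income == 'high'
df = pd.DataFrame({
    'income': income,
    'organic': rng.binomial(1, np.where(rich, 0.6, 0.2)),
    'yoga_mat': rng.binomial(1, np.where(rich, 0.5, 0.1)),
})

# Naive comparison: organic buyers look far more yoga-mat-prone
naive = df.groupby('organic')['yoga_mat'].mean()
print(naive)

# Stratified comparison: within each income bracket, the gap collapses
stratified = df.groupby(['income', 'organic'])['yoga_mat'].mean()
print(stratified)
```

Within each stratum the two behaviors are independent by construction, so the stratified differences hover near zero while the naive difference looks like a marketing insight.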


📈 DAGs – Drawing the Blame Network

A Directed Acyclic Graph (DAG) is basically a corporate blame chart:

  • Arrows = “This thing affects that thing.”

  • Goal = Find out who’s really responsible for the KPI going up or down.

Marketing Spend → Sales
     ↑
     └── Seasonality

Moral: Sometimes, it’s not your campaign. It’s summer vacation.
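If you want your blame chart in code rather than ASCII art, `networkx` (a common graph library, assumed available here) can hold the DAG above and sanity-check that it really is acyclic:

```python
import networkx as nx

# The blame network from above: seasonality confounds spend and sales
dag = nx.DiGraph()
dag.add_edges_from([
    ('Seasonality', 'Marketing Spend'),
    ('Seasonality', 'Sales'),
    ('Marketing Spend', 'Sales'),
])

# Acyclic = no time loops, no "sales caused the campaign" nonsense
print(nx.is_directed_acyclic_graph(dag))  # True

# Direct causes of Sales = candidates to control for
print(sorted(dag.predecessors('Sales')))
```

In a real analysis you'd feed a graph like this to a causal library to derive which variables must be adjusted for; here it's just the picture, formalized.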


🔍 Regression for Causal Estimation

Regression can estimate causal effects — but only if you’re careful.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    'ad_spend': np.random.rand(100)*1000,
    'sales': np.random.rand(100)*10000,
    'season': np.random.choice(['summer','winter'], 100)
})

model = smf.ols('sales ~ ad_spend + C(season)', data=df).fit()
print(model.summary())

Here, C(season) helps control for the confounder — so we’re not blaming your marketing budget for what Santa Claus did. 🎅


🧮 Causal Inference Frameworks

| Method | Description | Use Case |
| --- | --- | --- |
| A/B Testing | Randomized controlled experiment | Website design, pricing |
| Difference-in-Differences (DiD) | Compares changes before/after treatment | Policy, region-based campaigns |
| Instrumental Variables (IV) | Uses an external "randomizer" variable | Ad exposure, market shocks |
| Propensity Score Matching | Matches treated vs. control with similar features | Customer-level analysis |
| Causal Forests / DoWhy / EconML | Machine learning for causal inference | When you want causality and flexibility |
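Difference-in-Differences deserves a quick sketch, since it's the go-to when you can't randomize. In the toy panel below (invented regions, weeks, and a true lift of 20 baked in), the DiD estimate is simply the coefficient on the treated × post interaction:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)

# Hypothetical panel: a campaign runs in the treated region from week 4 on
rows = []
for region, treated in [('north', 1), ('south', 0)]:
    for week in range(8):
        post = int(week >= 4)
        base = 100 + 5 * week          # trend shared by both regions
        effect = 20 * treated * post   # the true campaign lift
        sales = base + 10 * treated + effect + rng.normal(0, 1)
        rows.append({'treated': treated, 'post': post, 'sales': sales})

df = pd.DataFrame(rows)

# DiD estimate = coefficient on the interaction term
model = smf.ols('sales ~ treated * post', data=df).fit()
print(model.params['treated:post'])  # should land near the true lift of 20
```

The shared trend and the regions' baseline gap both cancel out; only the treated-after-treatment bump survives. That cancellation is the whole trick, and it only works if the "parallel trends" assumption holds.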

🏢 Business Example: Email Campaign Impact

A retailer sends promotional emails to half their customers. After 2 weeks:

  • Group A (email): sales up 15%

  • Group B (no email): sales up 12%

The intern says:

“Emails work! +3% lift!”

But… customers who got emails also had higher previous spending.

After controlling for customer value, the real lift? Barely +0.5%.
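Here's what "controlling for customer value" looks like in practice. The data below is simulated to mimic the story: high spenders were more likely to get the email (a hypothetical targeting rule), the true lift is small, and the naive estimate is wildly inflated:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000

# Hypothetical targeting: bigger prior spenders more likely to get the email
prior_spend = rng.exponential(100, n)
got_email = rng.binomial(1, np.clip(prior_spend / 300, 0.05, 0.95))
true_lift = 5  # the email's real effect, in currency units
sales = 0.3 * prior_spend + true_lift * got_email + rng.normal(0, 10, n)

df = pd.DataFrame({'prior_spend': prior_spend,
                   'got_email': got_email, 'sales': sales})

# Naive estimate: compares email vs. no-email as if assignment were random
naive = smf.ols('sales ~ got_email', data=df).fit().params['got_email']

# Adjusted estimate: controls for prior spend
adjusted = smf.ols('sales ~ got_email + prior_spend',
                   data=df).fit().params['got_email']

print(f"Naive lift: {naive:.1f}, adjusted lift: {adjusted:.1f}")
```

The naive coefficient absorbs the fact that email recipients were already big spenders; the adjusted one recovers something close to the true effect. Same data, very different headline.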

Moral: Data never lies — but analysts often forget context. 😅


🧪 Mini Exercise

Try designing a simple causal test:

  1. Pick a recent business change (e.g., new pricing, feature launch).

  2. Split users randomly.

  3. Measure a KPI (conversion, retention).

  4. Use a t-test or regression to estimate the lift.

  5. Report your findings with confidence intervals — and jokes.
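The steps above can be sketched end to end. The conversion rates here are invented placeholders; `CompareMeans` from statsmodels gives the confidence interval for step 5 (jokes sold separately):

```python
import numpy as np
from statsmodels.stats.weightstats import CompareMeans, DescrStatsW

rng = np.random.default_rng(3)

# Steps 2-3: random split and a simulated conversion KPI
control = rng.binomial(1, 0.10, 2000)    # old pricing
treatment = rng.binomial(1, 0.13, 2000)  # new pricing

# Step 4: estimate the lift
lift = treatment.mean() - control.mean()

# Step 5: report it WITH a 95% confidence interval
cm = CompareMeans(DescrStatsW(treatment), DescrStatsW(control))
low, high = cm.tconfint_diff(usevar='unequal')
print(f"Lift: {lift:.3%}, 95% CI: [{low:.3%}, {high:.3%}]")
```

If the interval straddles zero, the honest report is "we can't tell yet," not a victory email.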


💬 TL;DR

  • Correlation ≠ Causation (unless you’re writing a bad investor deck).

  • Randomization is your best friend.

  • Always watch for confounders — they’re everywhere.

  • A/B testing is simple but powerful.

  • Use causal ML if you want to sound fancy and get that “AI Strategy” budget. 💰
