# “Because Sometimes the ‘Better’ Version Isn’t Actually Better”
> 💬 “We changed the button color to green and sales went up 2%! We’re geniuses!”
>
> Every product manager, right before realizing it was just Tuesday payday traffic
## 🧪 What Is A/B Testing?
A/B testing is basically the scientific method for marketers — you take two (or more) versions of something, show them to different groups, and see which one actually performs better.
It’s like comparing:
- **A:** “Old boring landing page”
- **B:** “New fancy landing page with dancing llama GIFs” 🦙✨
Then you measure which gets more clicks, conversions, or complaints.
## 🎨 Why You Should Care About KPI Alignment
Because if your A/B test improves clicks but kills revenue, you’ve just built a statistically significant failure. 🎉
Your Key Performance Indicators (KPIs) should match business value, not vanity metrics.
| Bad KPI | Good KPI |
|---|---|
| Page Views | Purchase Rate |
| Clicks on “Learn More” | Completed Transactions |
| App Opens | Retention After 30 Days |
| Email Sent | Conversion to Paid Plan |
## 🧮 Basic Setup: The Scientific (and Sassy) Way
### Step 1: Define the Hypothesis
“Changing X will improve Y.”
Example:
“If we make the ‘Buy Now’ button red instead of green, conversions will increase by 10%.”
(Note: If your designer says “Let’s just try it,” make them write the hypothesis in blood. 🩸)
### Step 2: Random Assignment
Use random sampling to split users into groups:
- **Group A:** Control (the original)
- **Group B:** Treatment (the new shiny version)
```python
import numpy as np

n = 10000
users = np.arange(n)
np.random.shuffle(users)
A, B = users[:n//2], users[n//2:]
```

No cherry-picking. No “VIP users go to A.” Randomness is your shield against corporate bias.
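Shuffling an array works for a one-off analysis, but live systems usually assign by hashing a stable user ID so the same user always sees the same variant, even across sessions and server restarts. A minimal sketch (the function name and experiment key are made up for illustration):

```python
import hashlib

def assign_bucket(user_id: str, experiment: str = "button_color") -> str:
    """Deterministically assign a user to A or B by hashing a stable ID.

    Salting with the experiment name keeps assignments independent
    across different experiments.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same user always lands in the same bucket:
bucket = assign_bucket("user_42")
```

Because the hash is deterministic, you don’t need to store assignments anywhere, and the split stays roughly 50/50 as users arrive.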
### Step 3: Measure the KPI
Let’s pretend we’re testing purchase rate:
```python
# Simulate per-user purchase outcomes: ~10% rate for A, ~12% for B
conversion_A = np.random.binomial(1, 0.10, len(A))
conversion_B = np.random.binomial(1, 0.12, len(B))
```

### Step 4: Statistical Significance (a.k.a. “Is It Actually Better?”)
You can’t just feel the difference — you have to prove it with a t-test or z-test.
```python
from statsmodels.stats.proportion import proportions_ztest

count = [conversion_B.sum(), conversion_A.sum()]
nobs = [len(conversion_B), len(conversion_A)]
stat, pval = proportions_ztest(count, nobs)
print(f"p-value: {pval:.4f}")
```

If p < 0.05: Congratulations! 🎉
You’ve reached statistical significance (a.k.a. “It’s probably not luck”).
If not — sorry, it’s back to PowerPoint excuses.
### Step 5: Run Time Matters!
- Too short? → Results are random noise.
- Too long? → You’re basically time-traveling through user behavior.
Rule of thumb: Run until you have statistical power to detect your target effect size.
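That rule of thumb can be made concrete: convert your baseline and target conversion rates into an effect size (Cohen’s h for proportions), then solve for the sample size per group. The 10% → 12% rates below are illustrative placeholders:

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# Placeholder rates: 10% baseline, hoping to detect a lift to 12%
h = proportion_effectsize(0.12, 0.10)  # Cohen's h for two proportions

# Solve for the required sample size per group at 80% power, alpha = 0.05
n_per_group = NormalIndPower().solve_power(
    effect_size=h, power=0.8, alpha=0.05
)
print(f"Need roughly {n_per_group:.0f} users per group")
```

Divide that by your daily traffic per variant and you have a defensible run time, instead of “until the p-value looks nice.”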
Use tools like:

```bash
pip install statsmodels
```

and calculate power with:

```python
from statsmodels.stats.power import NormalIndPower

# With nobs1 omitted, solve_power returns the required sample size per group
NormalIndPower().solve_power(effect_size=0.1, power=0.8, alpha=0.05)
```

## 📈 Multi-KPI Madness (Welcome to Real Business)
In real life, your test affects multiple things:
- Conversions 🛒
- Time on site ⏱️
- Support tickets 😭
- Brand reputation 💅
So align A/B test design with business KPIs, not just what’s easy to measure.
> “You can’t optimize revenue by only measuring clicks.”
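One common way to operationalize this is a primary KPI plus guardrail metrics: ship only if the primary metric significantly improves *and* no guardrail significantly regresses. A hedged sketch with invented counts (the metric names and thresholds are placeholders):

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented aggregates: (successes, observations) per metric per variant
metrics = {
    "purchase_rate":  {"A": (510, 5000), "B": (590, 5000)},  # primary: higher is better
    "support_ticket": {"A": (150, 5000), "B": (155, 5000)},  # guardrail: should not rise
}

def significantly_higher(b, a, alpha=0.05):
    """One-sided z-test: is B's proportion significantly higher than A's?"""
    stat, pval = proportions_ztest(
        [b[0], a[0]], [b[1], a[1]], alternative="larger"
    )
    return pval < alpha

primary_wins = significantly_higher(
    metrics["purchase_rate"]["B"], metrics["purchase_rate"]["A"]
)
guardrail_broken = significantly_higher(
    metrics["support_ticket"]["B"], metrics["support_ticket"]["A"]
)
ship_it = primary_wins and not guardrail_broken
print(f"Ship variant B? {ship_it}")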
## ⚠️ Common A/B Testing Crimes
| Crime | Sentence |
|---|---|
| Peeking at results early | Death by p-value inflation |
| Not randomizing groups | Eternal bias in reports |
| Ignoring seasonality | Monthly executive whiplash |
| Using too many variants | “C” wins, but you forgot why |
| Declaring success at p=0.09 | Data jail (no parole) |
## 🧠 Bonus: Bayesian A/B Testing
If you’re feeling fancy, go Bayesian. Instead of p-values, it gives you direct posterior probabilities, so you can say things like:
“There’s an 85% chance version B is better.”
Use libraries like `pymc`, `bayespy`, or `arviz`.
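For conversion rates you don’t even need those libraries: under a Beta-Binomial model with uniform priors, the posterior is a Beta distribution, so plain NumPy can estimate that “85% chance B is better” statement (the counts below are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented counts: conversions out of users per variant
conv_A, n_A = 100, 1000   # 10.0% observed
conv_B, n_B = 125, 1000   # 12.5% observed

# Beta(1, 1) prior + binomial likelihood -> Beta posterior per variant
post_A = rng.beta(1 + conv_A, 1 + n_A - conv_A, size=100_000)
post_B = rng.beta(1 + conv_B, 1 + n_B - conv_B, size=100_000)

# Probability that B's true rate beats A's, estimated by Monte Carlo
p_b_better = (post_B > post_A).mean()
print(f"P(B is better than A) = {p_b_better:.2%}")
```

The output is a probability you can say out loud in a meeting, which is exactly the appeal.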
## 🧰 Tools You Should Know
| Tool | What It Does |
|---|---|
| Optimizely | Drag-and-drop web A/B testing |
| Google Optimize (RIP) | Gone but not forgotten 😢 |
| Statsmodels | Frequentist testing |
| PyMC / ArviZ | Bayesian inference |
| Evidently AI | Monitors post-deployment metrics |
## 💬 Business Takeaway
A/B testing is not about “winning versions.” It’s about making data-driven trade-offs that align with company goals.
So the next time your boss says:

> “Let’s test changing the font size,”

ask:

> “Sure. But what’s the business metric we’re optimizing?”
That’s how you go from “data nerd” to “strategic data leader.” 😎