
“Because Sometimes the ‘Better’ Version Isn’t Actually Better”


💬 “We changed the button color to green and sales went up 2%! We’re geniuses!”

Every product manager, moments before realizing it was just Tuesday payday traffic


🧪 What Is A/B Testing?

A/B testing is basically the scientific method for marketers — you take two (or more) versions of something, show them to different groups, and see which one actually performs better.

It’s like comparing:

  • A: “Old boring landing page”

  • B: “New fancy landing page with dancing llama GIFs” 🦙✨

Then you measure which gets more clicks, conversions, or complaints.


🎨 Why You Should Care About KPI Alignment

Because if your A/B test improves clicks but kills revenue, you’ve just built a statistically significant failure. 🎉

Your Key Performance Indicators (KPIs) should match business value, not vanity metrics.

| Bad KPI | Good KPI |
| --- | --- |
| Page Views | Purchase Rate |
| Clicks on "Learn More" | Completed Transactions |
| App Opens | Retention After 30 Days |
| Emails Sent | Conversion to Paid Plan |

🧮 Basic Setup: The Scientific (and Sassy) Way

Step 1: Define the Hypothesis

“Changing X will improve Y.”

Example:

“If we make the ‘Buy Now’ button red instead of green, conversions will increase by 10%.”

(Note: If your designer says “Let’s just try it,” make them write the hypothesis in blood. 🩸)


Step 2: Random Assignment

Use random sampling to split users into groups:

  • Group A: Control (the original)

  • Group B: Treatment (the new shiny version)

```python
import numpy as np

n = 10_000
users = np.arange(n)
np.random.shuffle(users)  # shuffle user IDs in place
A, B = users[:n // 2], users[n // 2:]  # first half = control, second half = treatment
```

No cherry-picking. No “VIP users go to A.” Randomness is your shield against corporate bias.
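Shuffling works when you have a fixed list of users up front. In a live product you usually also want assignment to be *sticky*: the same user must land in the same group on every visit. A common way to do that (sketched here with a made-up experiment name) is to hash the user ID:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "checkout_button") -> str:
    """Deterministically bucket a user: the same ID always gets the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100      # roughly uniform bucket in [0, 100)
    return "B" if bucket < 50 else "A"  # 50/50 split

print(assign_variant("user_42"))  # same answer every time for this user
```

Salting the hash with the experiment name keeps a user's bucket in one test independent of their bucket in another.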


Step 3: Measure the KPI

Let’s pretend we’re testing purchase rate:

```python
# Simulate purchase outcomes: 10% baseline rate for A, 12% for B
conversion_A = np.random.binomial(1, 0.10, len(A))
conversion_B = np.random.binomial(1, 0.12, len(B))
```

Step 4: Statistical Significance (a.k.a. “Is It Actually Better?”)

You can’t just feel the difference — you have to prove it with a t-test or z-test.

```python
from statsmodels.stats.proportion import proportions_ztest

count = [conversion_B.sum(), conversion_A.sum()]  # successes per group
nobs = [len(conversion_B), len(conversion_A)]     # sample sizes per group

stat, pval = proportions_ztest(count, nobs)
print(f"p-value: {pval:.4f}")
```

If p < 0.05: Congratulations! 🎉 You’ve reached statistical significance (a.k.a. “It’s probably not luck”). If not — sorry, it’s back to PowerPoint excuses.


Step 5: Run Time Matters!

  • Too short? → Results are random noise.

  • Too long? → You’re basically time-traveling through user behavior.

  • Rule of thumb: Run until you have statistical power to detect your target effect size.

Use tools like:

```bash
pip install statsmodels
```

and solve for the required sample size per group with:

```python
from statsmodels.stats.power import NormalIndPower

# With effect_size, power, and alpha fixed, solve_power returns
# the required number of observations per group (nobs1)
n_per_group = NormalIndPower().solve_power(effect_size=0.1, power=0.8, alpha=0.05)
print(f"Need ~{n_per_group:.0f} users per group")
```

📈 Multi-KPI Madness (Welcome to Real Business)

In real life, your test affects multiple things:

  • Conversions 🛒

  • Time on site ⏱️

  • Support tickets 😭

  • Brand reputation 💅

So align A/B test design with business KPIs, not just what’s easy to measure.

“You can’t optimize revenue by only measuring clicks.”
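When you do test several KPIs at once, every extra metric is another chance to get "lucky" at p < 0.05. A minimal sketch of handling that (the KPI names and rates below are made up for illustration) is to run one test per KPI and then apply a multiple-comparison correction:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n = 5000

# Hypothetical per-KPI rates for control (A) and treatment (B)
kpis = {
    "purchase":       (0.10, 0.12),
    "support_ticket": (0.05, 0.05),
    "newsletter_sub": (0.20, 0.21),
}

pvals = []
for name, (rate_a, rate_b) in kpis.items():
    a = rng.binomial(1, rate_a, n)
    b = rng.binomial(1, rate_b, n)
    _, p = proportions_ztest([b.sum(), a.sum()], [n, n])
    pvals.append(p)

# Testing several KPIs inflates false positives; Bonferroni is the bluntest fix.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
for (name, _), p, p_c, sig in zip(kpis.items(), pvals, p_adj, reject):
    print(f"{name}: raw p={p:.4f}, corrected p={p_c:.4f}, significant={sig}")
```

Bonferroni is conservative; in practice teams often pick one *primary* KPI up front and treat the rest as guardrails.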


⚠️ Common A/B Testing Crimes

| Crime | Sentence |
| --- | --- |
| Peeking at results early | Death by p-value inflation |
| Not randomizing groups | Eternal bias in reports |
| Ignoring seasonality | Monthly executive whiplash |
| Using too many variants | "C" wins, but you forgot why |
| Declaring success at p = 0.09 | Data jail (no parole) |
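The first crime deserves a demonstration. Here's a small simulation (all parameters are made up) where both variants share the *same* conversion rate, so every "win" is a false positive. An impatient experimenter peeks after every batch of users and stops at the first p < 0.05:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Both variants convert at exactly 10%, so any "significant" result is noise.
rng = np.random.default_rng(42)
n_experiments, batch, n_looks = 500, 100, 20
false_positives = 0

for _ in range(n_experiments):
    a = rng.binomial(1, 0.10, batch * n_looks)
    b = rng.binomial(1, 0.10, batch * n_looks)
    for k in range(1, n_looks + 1):  # peek after every batch of users
        n = k * batch
        _, p = proportions_ztest([a[:n].sum(), b[:n].sum()], [n, n])
        if p < 0.05:
            false_positives += 1     # shipped a change that does nothing
            break

# A single pre-planned test would be wrong ~5% of the time; peeking inflates that.
print(f"False-positive rate with peeking: {false_positives / n_experiments:.1%}")
```

The fix: commit to a sample size (see Step 5) and analyze once, or use a sequential-testing method designed for repeated looks.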

🧠 Bonus: Bayesian A/B Testing

If you’re feeling fancy, go Bayesian. It tells you probabilities instead of “p-values,” so you can say things like:

“There’s an 85% chance version B is better.”

Use libraries like pymc, bayespy, or arviz.
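Before reaching for a full pymc model, you can get the same flavor with a closed-form conjugate sketch: for binary conversions, a Beta(1, 1) prior plus binomial data gives a Beta posterior you can sample directly with NumPy (the counts below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical observed results: successes / trials per variant
conv_a, n_a = 480, 5000  # ~9.6% conversion
conv_b, n_b = 560, 5000  # ~11.2% conversion

# With a Beta(1, 1) prior, the posterior for each rate is
# Beta(successes + 1, failures + 1)
samples_a = rng.beta(conv_a + 1, n_a - conv_a + 1, size=100_000)
samples_b = rng.beta(conv_b + 1, n_b - conv_b + 1, size=100_000)

# Probability that B's true conversion rate beats A's
p_b_better = (samples_b > samples_a).mean()
print(f"P(B > A) = {p_b_better:.1%}")
```

That single probability is the sentence you hand to stakeholders, no p-value translation required.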


🧰 Tools You Should Know

| Tool | What It Does |
| --- | --- |
| Optimizely | Drag-and-drop web A/B testing |
| Google Optimize (RIP) | Gone but not forgotten 😢 |
| Statsmodels | Frequentist testing |
| PyMC / ArviZ | Bayesian inference |
| Evidently AI | Monitors post-deployment metrics |

💬 Business Takeaway

A/B testing is not about “winning versions.” It’s about making data-driven trade-offs that align with company goals.

So the next time your boss says,

“Let’s test changing the font size,”

Ask:

“Sure. But what’s the business metric we’re optimizing?”

That’s how you go from “data nerd” to “strategic data leader.” 😎
