A/B Testing & KPI Alignment#

“Because Sometimes the ‘Better’ Version Isn’t Actually Better”#


💬 “We changed the button color to green and sales went up 2%! We’re geniuses!”

— Every product manager before realizing it was just Tuesday payday traffic


🧪 What Is A/B Testing?#

A/B testing is basically the scientific method for marketers — you take two (or more) versions of something, show them to different groups, and see which one actually performs better.

It’s like comparing:

  • A: “Old boring landing page”

  • B: “New fancy landing page with dancing llama GIFs” 🦙✨

Then you measure which gets more clicks, conversions, or complaints.


🎨 Why You Should Care About KPI Alignment#

Because if your A/B test improves clicks but kills revenue, you’ve just built a statistically significant failure. 🎉

Your Key Performance Indicators (KPIs) should match business value, not vanity metrics.

Bad KPI → Good KPI:

  • Page Views → Purchase Rate

  • Clicks on “Learn More” → Completed Transactions

  • App Opens → Retention After 30 Days

  • Email Sent → Conversion to Paid Plan
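
To make that concrete, here’s a toy sketch (the event-log format is invented for illustration) showing how the same data can flatter a vanity metric while the value metric tells the real story:

# Toy event log (format invented for illustration): (user, variant, pages_viewed, purchased)
events = [
    ("u1", "A", 9, False),   # A racks up page views...
    ("u2", "A", 7, False),
    ("u3", "B", 2, True),    # ...while B actually sells
    ("u4", "B", 3, True),
    ("u5", "B", 2, False),
]
for variant in ("A", "B"):
    rows = [e for e in events if e[1] == variant]
    page_views = sum(e[2] for e in rows)                 # vanity metric
    purchase_rate = sum(e[3] for e in rows) / len(rows)  # business metric
    print(f"{variant}: {page_views} page views, purchase rate {purchase_rate:.0%}")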


🧮 Basic Setup: The Scientific (and Sassy) Way#

Step 1: Define the Hypothesis#

“Changing X will improve Y.”

Example:

“If we make the ‘Buy Now’ button red instead of green, conversions will increase by 10%.”

(Note: If your designer says “Let’s just try it,” make them write the hypothesis in blood. 🩸)
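
A hypothesis only becomes testable once it’s pinned to numbers. A minimal sketch, assuming a 10% baseline conversion rate (the baseline here is made up; use your real one):

baseline_rate = 0.10   # assumed current 'Buy Now' conversion rate
relative_lift = 0.10   # the hypothesized +10% improvement
target_rate = baseline_rate * (1 + relative_lift)
print(f"Hypothesis: conversion moves from {baseline_rate:.0%} to {target_rate:.0%}")

These two numbers are exactly what the power analysis in Step 5 will need.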


Step 2: Random Assignment#

Use random sampling to split users into groups:

  • Group A: Control (the original)

  • Group B: Treatment (the new shiny version)

import numpy as np

rng = np.random.default_rng(42)        # fixed seed so the split is reproducible
n = 10_000
users = rng.permutation(n)             # shuffled user IDs 0..n-1
A, B = users[:n // 2], users[n // 2:]  # 50/50 split: control vs. treatment

No cherry-picking. No “VIP users go to A.” Randomness is your shield against corporate bias.
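
One production wrinkle worth knowing: you usually want assignment to be sticky, so a user sees the same variant on every visit. A common approach (sketched here with an invented experiment name, not tied to any framework) is to hash the user ID:

import hashlib

def assign_variant(user_id: str) -> str:
    # Hash (experiment salt + user ID) and take parity: a deterministic 50/50 split
    digest = hashlib.md5(f"checkout-test-01:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

print(assign_variant("user_123"))  # same user, same bucket, every time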


Step 3: Measure the KPI#

Let’s pretend we’re testing purchase rate:

# Simulated purchase outcomes (1 = bought, 0 = didn't); real numbers come from your logs
conversion_A = rng.binomial(1, 0.10, len(A))  # control converts at ~10%
conversion_B = rng.binomial(1, 0.12, len(B))  # treatment converts at ~12%
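
Before reaching for any statistics, eyeball the raw rates per group (a sanity check, not a verdict):

print(f"Observed rates: A = {conversion_A.mean():.3f}, B = {conversion_B.mean():.3f}")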

Step 4: Statistical Significance (a.k.a. “Is It Actually Better?”)#

You can’t just feel the difference; you have to prove it. For conversion rates, the standard tool is a two-proportion z-test (a t-test is the analogue for continuous metrics like revenue per user).

from statsmodels.stats.proportion import proportions_ztest

# successes and sample sizes for each variant
count = [conversion_B.sum(), conversion_A.sum()]
nobs = [len(conversion_B), len(conversion_A)]

stat, pval = proportions_ztest(count, nobs)  # two-sided two-proportion z-test
print(f"p-value: {pval:.4f}")

If p < 0.05: Congratulations! 🎉 You’ve reached statistical significance (a.k.a. “It’s probably not luck”). If not — sorry, it’s back to PowerPoint excuses.
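
A p-value says “probably not luck,” but not how big the win is. One way to see the size (a sketch using statsmodels’ proportion_confint with the Wilson method) is a confidence interval per group:

from statsmodels.stats.proportion import proportion_confint

for name, conv in [("A", conversion_A), ("B", conversion_B)]:
    lo, hi = proportion_confint(conv.sum(), len(conv), alpha=0.05, method="wilson")
    print(f"{name}: {conv.mean():.3f} (95% CI {lo:.3f} to {hi:.3f})")

Non-overlapping intervals are a reassuring (if conservative) sign.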


Step 5: Run Time Matters!#

  • Too short? → Results are random noise.

  • Too long? → You’re basically time-traveling through user behavior (seasonality, product changes, and novelty effects start polluting the comparison).

  • Rule of thumb: Decide your sample size up front with a power analysis, then run until you reach it. (Stopping the moment p dips below 0.05 is the “peeking” crime covered below.)

Use tools like:

pip install statsmodels

and calculate power with:

from statsmodels.stats.power import NormalIndPower
# minimum users per group for 80% power at alpha=0.05, standardized effect size 0.1
print(NormalIndPower().solve_power(effect_size=0.1, power=0.8, alpha=0.05))
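
Note that effect_size there is a standardized value (Cohen’s h), not a raw percentage-point lift. To derive it from the Step 1 hypothesis (10% to 11%, per our assumed numbers), statsmodels provides proportion_effectsize:

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

h = proportion_effectsize(0.11, 0.10)  # Cohen's h for a 10% -> 11% lift
n_per_group = NormalIndPower().solve_power(effect_size=h, power=0.8, alpha=0.05)
print(f"Cohen's h = {h:.3f}, need ~{n_per_group:,.0f} users per group")

Small lifts on small baselines need surprisingly large samples, which is why so many button-color tests are doomed from the start.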

📈 Multi-KPI Madness (Welcome to Real Business)#

In real life, your test affects multiple things:

  • Conversions 🛒

  • Time on site ⏱️

  • Support tickets 😭

  • Brand reputation 💅

So align A/B test design with business KPIs, not just what’s easy to measure.

“You can’t optimize revenue by only measuring clicks.”
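
There’s also a statistical catch: every extra KPI you test is another chance for a false positive. A sketch of one standard fix, the Holm correction via statsmodels (the p-values are made up for illustration):

from statsmodels.stats.multitest import multipletests

kpis = ["purchase rate", "time on site", "support tickets"]
pvals = [0.03, 0.20, 0.04]  # made-up p-values, one per KPI

reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for kpi, raw, adj, sig in zip(kpis, pvals, adjusted, reject):
    print(f"{kpi}: raw p = {raw:.2f}, adjusted p = {adj:.2f}, significant: {sig}")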


⚠️ Common A/B Testing Crimes#

Crime → Sentence:

  • Peeking at results early → Death by p-value inflation

  • Not randomizing groups → Eternal bias in reports

  • Ignoring seasonality → Monthly executive whiplash

  • Using too many variants → “C” wins, but you forgot why

  • Declaring success at p = 0.09 → Data jail (no parole)


🧠 Bonus: Bayesian A/B Testing#

If you’re feeling fancy, go Bayesian. It tells you probabilities instead of “p-values,” so you can say things like:

“There’s an 85% chance version B is better.”

Use libraries like pymc, bayespy, or arviz.
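
For the simplest case you don’t even need those libraries. With a uniform Beta(1, 1) prior on each conversion rate, the posterior is another Beta distribution, and P(B > A) falls out of a quick Monte Carlo. A sketch reusing the simulated conversion_A and conversion_B from Step 3:

import numpy as np

rng = np.random.default_rng(0)
draws = 100_000

# Posterior for each rate: Beta(1 + successes, 1 + failures)
post_A = rng.beta(1 + conversion_A.sum(), 1 + (conversion_A == 0).sum(), draws)
post_B = rng.beta(1 + conversion_B.sum(), 1 + (conversion_B == 0).sum(), draws)

print(f"P(version B is better) = {(post_B > post_A).mean():.1%}")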


🧰 Tools You Should Know#

Tool → What It Does:

  • Optimizely → Drag-and-drop web A/B testing

  • Google Optimize (RIP) → Gone but not forgotten 😢

  • Statsmodels → Frequentist testing

  • PyMC / ArviZ → Bayesian inference

  • Evidently AI → Monitors post-deployment metrics


💬 Business Takeaway#

A/B testing is not about “winning versions.” It’s about making data-driven trade-offs that align with company goals.

So the next time your boss says,

“Let’s test changing the font size,”

Ask:

“Sure. But what’s the business metric we’re optimizing?”

That’s how you go from “data nerd” to “strategic data leader.” 😎
