
“Because Sometimes the ‘Better’ Version Isn’t Actually Better”


💬 “We changed the button color to green and sales went up 2%! We’re geniuses!”

Every product manager, moments before realizing it was just Tuesday payday traffic


🧪 What Is A/B Testing?

A/B testing is basically the scientific method for marketers — you take two (or more) versions of something, show them to different groups, and see which one actually performs better.

It’s like comparing:

  • A: “Old boring landing page”

  • B: “New fancy landing page with dancing llama GIFs” 🦙✨

Then you measure which gets more clicks, conversions, or complaints.


🎨 Why You Should Care About KPI Alignment

Because if your A/B test improves clicks but kills revenue, you’ve just built a statistically significant failure. 🎉

Your Key Performance Indicators (KPIs) should match business value, not vanity metrics.

| Bad KPI | Good KPI |
| --- | --- |
| Page Views | Purchase Rate |
| Clicks on "Learn More" | Completed Transactions |
| App Opens | Retention After 30 Days |
| Emails Sent | Conversion to Paid Plan |

🧮 Basic Setup: The Scientific (and Sassy) Way

Step 1: Define the Hypothesis

“Changing X will improve Y.”

Example:

“If we make the ‘Buy Now’ button red instead of green, conversions will increase by 10%.”

(Note: If your designer says “Let’s just try it,” make them write the hypothesis in blood. 🩸)


Step 2: Random Assignment

Use random sampling to split users into groups:

  • Group A: Control (the original)

  • Group B: Treatment (the new shiny version)

```python
import numpy as np

n = 10_000
users = np.arange(n)
np.random.shuffle(users)  # shuffle user IDs in place
A, B = users[:n // 2], users[n // 2:]  # first half = control, second half = treatment
```

No cherry-picking. No “VIP users go to A.” Randomness is your shield against corporate bias.
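Shuffling works when you have a fixed list of users up front. In a live product you usually also want assignment to be *sticky*: the same user must land in the same group on every visit. A common way to do that (sketched here with a made-up experiment name) is to hash the user ID:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "checkout_button") -> str:
    """Deterministically bucket a user: the same ID always gets the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100      # roughly uniform bucket in [0, 100)
    return "B" if bucket < 50 else "A"  # 50/50 split

print(assign_variant("user_42"))  # same answer every time for this user
```

Salting the hash with the experiment name keeps a user's bucket in one test independent of their bucket in another.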


Step 3: Measure the KPI

Let’s pretend we’re testing purchase rate:

```python
# Simulate purchase outcomes: 10% baseline rate for A, 12% for B
conversion_A = np.random.binomial(1, 0.10, len(A))
conversion_B = np.random.binomial(1, 0.12, len(B))
```

Step 4: Statistical Significance (a.k.a. “Is It Actually Better?”)

You can’t just feel the difference — you have to prove it with a t-test or z-test.

```python
from statsmodels.stats.proportion import proportions_ztest

count = [conversion_B.sum(), conversion_A.sum()]  # successes per group
nobs = [len(conversion_B), len(conversion_A)]     # sample sizes per group

stat, pval = proportions_ztest(count, nobs)
print(f"p-value: {pval:.4f}")
```

If p < 0.05: Congratulations! 🎉 You’ve reached statistical significance (a.k.a. “It’s probably not luck”). If not — sorry, it’s back to PowerPoint excuses.


Step 5: Run Time Matters!

  • Too short? → Results are random noise.

  • Too long? → You’re basically time-traveling through user behavior.

  • Rule of thumb: Run until you have statistical power to detect your target effect size.

Use tools like:

```bash
pip install statsmodels
```

and solve for the required sample size per group with:

```python
from statsmodels.stats.power import NormalIndPower

# With effect_size, power, and alpha fixed, solve_power returns
# the required number of observations per group (nobs1)
n_per_group = NormalIndPower().solve_power(effect_size=0.1, power=0.8, alpha=0.05)
print(f"Need ~{n_per_group:.0f} users per group")
```

📈 Multi-KPI Madness (Welcome to Real Business)

In real life, your test affects multiple things:

  • Conversions 🛒

  • Time on site ⏱️

  • Support tickets 😭

  • Brand reputation 💅

So align A/B test design with business KPIs, not just what’s easy to measure.

“You can’t optimize revenue by only measuring clicks.”
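When you do test several KPIs at once, every extra metric is another chance to get "lucky" at p < 0.05. A minimal sketch of handling that (the KPI names and rates below are made up for illustration) is to run one test per KPI and then apply a multiple-comparison correction:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n = 5000

# Hypothetical per-KPI rates for control (A) and treatment (B)
kpis = {
    "purchase":       (0.10, 0.12),
    "support_ticket": (0.05, 0.05),
    "newsletter_sub": (0.20, 0.21),
}

pvals = []
for name, (rate_a, rate_b) in kpis.items():
    a = rng.binomial(1, rate_a, n)
    b = rng.binomial(1, rate_b, n)
    _, p = proportions_ztest([b.sum(), a.sum()], [n, n])
    pvals.append(p)

# Testing several KPIs inflates false positives; Bonferroni is the bluntest fix.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
for (name, _), p, p_c, sig in zip(kpis.items(), pvals, p_adj, reject):
    print(f"{name}: raw p={p:.4f}, corrected p={p_c:.4f}, significant={sig}")
```

Bonferroni is conservative; in practice teams often pick one *primary* KPI up front and treat the rest as guardrails.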


⚠️ Common A/B Testing Crimes

| Crime | Sentence |
| --- | --- |
| Peeking at results early | Death by p-value inflation |
| Not randomizing groups | Eternal bias in reports |
| Ignoring seasonality | Monthly executive whiplash |
| Using too many variants | "C" wins, but you forgot why |
| Declaring success at p = 0.09 | Data jail (no parole) |
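The first crime deserves a demonstration. Here's a small simulation (all parameters are made up) where both variants share the *same* conversion rate, so every "win" is a false positive. An impatient experimenter peeks after every batch of users and stops at the first p < 0.05:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Both variants convert at exactly 10%, so any "significant" result is noise.
rng = np.random.default_rng(42)
n_experiments, batch, n_looks = 500, 100, 20
false_positives = 0

for _ in range(n_experiments):
    a = rng.binomial(1, 0.10, batch * n_looks)
    b = rng.binomial(1, 0.10, batch * n_looks)
    for k in range(1, n_looks + 1):  # peek after every batch of users
        n = k * batch
        _, p = proportions_ztest([a[:n].sum(), b[:n].sum()], [n, n])
        if p < 0.05:
            false_positives += 1     # shipped a change that does nothing
            break

# A single pre-planned test would be wrong ~5% of the time; peeking inflates that.
print(f"False-positive rate with peeking: {false_positives / n_experiments:.1%}")
```

The fix: commit to a sample size (see Step 5) and analyze once, or use a sequential-testing method designed for repeated looks.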

🧠 Bonus: Bayesian A/B Testing

If you’re feeling fancy, go Bayesian. It tells you probabilities instead of “p-values,” so you can say things like:

“There’s an 85% chance version B is better.”

Use libraries like pymc, bayespy, or arviz.
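Before reaching for a full pymc model, you can get the same flavor with a closed-form conjugate sketch: for binary conversions, a Beta(1, 1) prior plus binomial data gives a Beta posterior you can sample directly with NumPy (the counts below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical observed results: successes / trials per variant
conv_a, n_a = 480, 5000  # ~9.6% conversion
conv_b, n_b = 560, 5000  # ~11.2% conversion

# With a Beta(1, 1) prior, the posterior for each rate is
# Beta(successes + 1, failures + 1)
samples_a = rng.beta(conv_a + 1, n_a - conv_a + 1, size=100_000)
samples_b = rng.beta(conv_b + 1, n_b - conv_b + 1, size=100_000)

# Probability that B's true conversion rate beats A's
p_b_better = (samples_b > samples_a).mean()
print(f"P(B > A) = {p_b_better:.1%}")
```

That single probability is the sentence you hand to stakeholders, no p-value translation required.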


🧰 Tools You Should Know

| Tool | What It Does |
| --- | --- |
| Optimizely | Drag-and-drop web A/B testing |
| Google Optimize (RIP) | Gone but not forgotten 😢 |
| Statsmodels | Frequentist testing |
| PyMC / ArviZ | Bayesian inference |
| Evidently AI | Monitors post-deployment metrics |

💬 Business Takeaway

A/B testing is not about “winning versions.” It’s about making data-driven trade-offs that align with company goals.

So the next time your boss says,

“Let’s test changing the font size,”

Ask:

“Sure. But what’s the business metric we’re optimizing?”

That’s how you go from “data nerd” to “strategic data leader.” 😎
