
“Because the only thing worse than a bad forecast is not knowing it’s bad.” 😬


🎯 Why Backtesting?

Imagine your forecasting model as a fortune teller. 🔮 You wouldn’t just trust their prediction that your Q4 sales will skyrocket without proof, right? Backtesting is how we test our fortune teller — by asking:

“Okay, smarty-pants, what would you have predicted last year?”

If their “forecast” doesn’t match what actually happened — we politely say, “You’re fired,” and try again.


🧠 The Big Idea

Backtesting = pretend the past is the future, make a prediction, and see how wrong you were.

In code terms:

  1. Split your time series into train and test sets.

  2. Train your model on the earlier part.

  3. Predict the later part.

  4. Compare the predictions vs. reality.

  5. Cry a little. Adjust parameters. Repeat. 🌀
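Before calling in a real forecasting library, the five steps above can be sketched end to end with a dependency-free “repeat the last value” baseline. The synthetic `df` below is illustrative stand-in data, not the article’s dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series -- substitute your own DataFrame
# with 'ds' (dates) and 'y' (values) columns.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=36, freq="MS"),
    "y": 100 + np.arange(36) * 2.0 + rng.normal(0, 5, 36),
})

# 1. Split: hold out the last 12 months as the test set.
train, test = df.iloc[:-12], df.iloc[-12:]

# 2-3. "Train" and predict: the naive baseline just repeats the
#      last observed training value for every future month.
y_pred = np.full(len(test), train["y"].iloc[-1])

# 4. Compare predictions vs. reality.
mae = np.mean(np.abs(test["y"].to_numpy() - y_pred))
print(f"Naive baseline MAE: {mae:.2f}")
```

If your fancy model can’t beat this baseline, step 5 applies with extra feeling.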


📊 Basic Backtesting in Python

Here’s how the data therapist session goes:

```python
from sklearn.metrics import mean_absolute_error
from prophet import Prophet
import pandas as pd

# df holds Prophet's expected columns: 'ds' (dates) and 'y' (values)

# Split data: hold out the last 12 months as the test set
train = df.iloc[:-12]
test = df.iloc[-12:]

# Train the model on the earlier part only
model = Prophet()
model.fit(train)

# Forecast into the test period (12 monthly steps)
future = model.make_future_dataframe(periods=12, freq='M')
forecast = model.predict(future)

# Compare: align predictions to the test dates, then score
preds = forecast.set_index('ds').loc[test['ds'], 'yhat']
mae = mean_absolute_error(test['y'], preds)

print(f"Mean Absolute Error: {mae:.2f}")
```

📉 Output:

Mean Absolute Error: 57.23

Translation: your model missed the target by about 57 units per month. Not catastrophic… but the CFO might still send you “that” email.


🧪 Rolling Backtest (Walk-Forward Validation)

One test isn’t enough — let’s simulate multiple points in time.

```python
errors = []
for i in range(6, 13):
    # Train on everything except the last i months...
    train = df.iloc[:-i]
    # ...and test on the single month right after the training window
    test = df.iloc[-i:-i+1]

    model = Prophet().fit(train)
    # Forecast exactly one step ahead, so the last forecast row
    # lines up with the held-out test month
    future = model.make_future_dataframe(periods=1, freq='M')
    forecast = model.predict(future)
    y_pred = forecast.iloc[-1]['yhat']
    errors.append(abs(test['y'].values[0] - y_pred))

print(f"Average Error: {sum(errors)/len(errors):.2f}")
```

🧮 It’s like asking Prophet,

“What if you had been alive in 2019, 2020, 2021… how would you have done?”
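The same expanding-window loop works with any model, not just Prophet. A sketch using scikit-learn’s `LinearRegression` on a synthetic monthly series (all data here is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic monthly series: trend plus noise (illustrative only).
n = 36
t = np.arange(n)
y = 100 + 2.0 * t + np.random.default_rng(0).normal(0, 5, n)

errors = []
# Walk forward: at each step, fit on everything up to month n-i,
# then predict exactly one step ahead and score that prediction.
for i in range(6, 13):
    X_train, y_train = t[:-i].reshape(-1, 1), y[:-i]
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict([[t[n - i]]])[0]  # one step ahead
    errors.append(abs(y[n - i] - y_pred))

print(f"Average one-step-ahead error: {np.mean(errors):.2f}")
```

The expanding window means later iterations see less history, which is exactly the point: you learn how the model would have performed at each moment in the past.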


📈 KPI Metrics You Should Know

| Metric | Formula | Business Translation |
|---|---|---|
| MAE | Mean Absolute Error | Average “ouch” per prediction |
| RMSE | Root Mean Squared Error | Like MAE, but penalizes bigger mistakes |
| MAPE | Mean Absolute % Error | “On average, how far off was I in percentage terms?” |
| R² | Coefficient of Determination | “How much of reality did I actually explain?” |

💬 Tip: MAPE is great for business decks — it turns abstract errors into something managers understand:

“Our forecast is 7% off, not 700 widgets.”
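A quick sketch of computing all four metrics with scikit-learn and NumPy. The toy arrays are illustrative; `np.sqrt` is used for RMSE to stay compatible across scikit-learn versions:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_squared_error, r2_score)

# Toy actuals vs. forecasts -- swap in your own arrays.
y_true = np.array([100.0, 110.0, 120.0, 130.0])
y_pred = np.array([ 98.0, 115.0, 118.0, 125.0])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
r2 = r2_score(y_true, y_pred)

print(f"MAE:  {mae:.2f}")    # average "ouch" per prediction
print(f"RMSE: {rmse:.2f}")   # punishes big misses harder
print(f"MAPE: {mape:.2f}%")  # manager-friendly percentage
print(f"R^2:  {r2:.2f}")     # share of variance explained
```

One caveat worth remembering: MAPE divides by the actual values, so it blows up when `y_true` is at or near zero.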


🧮 KPI Interpretation – “How Wrong Is Acceptably Wrong?”

| Accuracy | Description | Manager’s Reaction |
|---|---|---|
| < 5% | 🔥 Excellent | “You’re getting a promotion!” |
| 5–10% | 👍 Good | “Let’s put this in the report.” |
| 10–20% | 😐 Acceptable | “Hmm, close enough for planning.” |
| > 20% | 🚨 Bad | “We’ll blame marketing again.” |
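The bands above could be wrapped in a tiny helper for reports. `grade_forecast` is a hypothetical name, and the thresholds are simply the table’s:

```python
def grade_forecast(mape_pct: float) -> str:
    """Map a MAPE percentage to the manager-reaction bands above.

    Illustrative helper: thresholds come from the table, the
    function name is made up.
    """
    if mape_pct < 5:
        return "Excellent"
    elif mape_pct <= 10:
        return "Good"
    elif mape_pct <= 20:
        return "Acceptable"
    return "Bad"

print(grade_forecast(7.0))  # falls in the 5-10% band -> "Good"
```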

🪞 Backtesting in Business Terms

Forecasting sales? Backtesting tells you how much inventory you should have ordered vs. what you actually needed.

Forecasting website traffic? It tells marketing how many ads they wasted money on. 💸

Forecasting stock levels? It saves your warehouse from drowning in 10,000 unsold “Summer 2022” mugs.


💼 KPI Alignment with Business Goals

| Business Goal | Forecast Metric | KPI Alignment |
|---|---|---|
| Profit Planning | RMSE | Minimizing overall uncertainty |
| Inventory | MAE or MAPE | Fewer overstock/understock events |
| Marketing ROI | R² | Model explains real demand shifts |
| Finance Budgeting | MAPE | Predictability over perfection |

🧠 Practice Challenge

Try backtesting Prophet or ARIMA on:

  • Monthly sales or revenue

  • Customer support tickets

  • Website visits

Compute MAE, RMSE, and MAPE — then answer:

“Would I trust this forecast in a board meeting?”


🧾 TL;DR

| Concept | TL;DR |
|---|---|
| Backtesting | Testing your forecast on old data |
| KPIs | Quantify how wrong (or right) you were |
| Rolling test | Multiple points of validation |
| MAPE | Business-friendly accuracy score |
| RMSE | Punishes big errors |
| Business takeaway | Forecasts are only useful when you measure their reliability |

“Backtesting is like checking your ex’s old messages — you might not like what you find, but it teaches you what to avoid next time.” 💔📉
