Backtesting & KPIs
“Because the only thing worse than a bad forecast is not knowing it’s bad.” 😬
🎯 Why Backtesting?
Imagine your forecasting model as a fortune teller. 🔮 You wouldn’t just trust their prediction that your Q4 sales will skyrocket without proof, right? Backtesting is how we test our fortune teller — by asking:
“Okay, smartypants, what would you have predicted last year?”
If their “forecast” doesn’t match what actually happened — we politely say, “You’re fired,” and try again.
🧠 The Big Idea
Backtesting = pretend the past is the future, make a prediction, and see how wrong you were.
In code terms:
1. Split your time series into train and test sets.
2. Train your model on the earlier part.
3. Predict the later part.
4. Compare the predictions vs. reality.
5. Cry a little. Adjust parameters. Repeat. 🌀
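Before bringing in a real model, the steps above can be sketched with nothing but plain Python. This is a minimal illustration under stated assumptions: the `series` values are made up, and `last_value_forecast` is a hypothetical stand-in "model" that simply repeats the last observed value. Any forecaster with the same shape could be swapped in.

```python
# Toy monthly series (made-up numbers, for illustration only).
series = [100, 104, 110, 108, 115, 120, 118, 125, 130, 128, 135, 140]


def last_value_forecast(train, horizon):
    """Hypothetical stand-in model: repeat the last observed value."""
    return [train[-1]] * horizon


def holdout_backtest(values, horizon, forecaster):
    """Split chronologically, forecast the held-out tail, and score it with MAE."""
    # Split: the earlier part is for training, the last `horizon` points are the test set.
    train, test = values[:-horizon], values[-horizon:]
    # Predict the later part using only the earlier part.
    preds = forecaster(train, horizon)
    # Compare predictions vs. reality.
    mae = sum(abs(a - p) for a, p in zip(test, preds)) / horizon
    return preds, test, mae


preds, actuals, mae = holdout_backtest(series, horizon=3, forecaster=last_value_forecast)
print(f"Predicted: {preds}")
print(f"Actual:    {actuals}")
print(f"MAE:       {mae:.2f}")
```

The important design choice is that the split is chronological, never shuffled: the model only sees data from before the period it is asked to predict.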
📊 Basic Backtesting in Python
Here’s how the data therapist session goes:
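import pandas as pd
from prophet import Prophet
from sklearn.metrics import mean_absolute_error

# df is assumed to be a monthly DataFrame with Prophet's expected columns:
# 'ds' (datetime) and 'y' (value), sorted by date.

# Split data: hold out the last 12 months
train = df.iloc[:-12]
test = df.iloc[-12:]

# Train model on the earlier part only
model = Prophet()
model.fit(train)

# Forecast the exact dates in the test period
# (avoids date mismatches: freq='M' generates month-end dates,
# which would not match month-start dates in df)
forecast = model.predict(test[['ds']])

# Compare predictions with what actually happened
preds = forecast['yhat'].values
mae = mean_absolute_error(test['y'], preds)
print(f"Mean Absolute Error: {mae:.2f}")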
📉 Output:
Mean Absolute Error: 57.23
Translation: on average, your model missed the target by about 57 units per month. Not catastrophic… but the CFO might still send you “that” email.
🧪 Rolling Backtest (Walk-Forward Validation)
One test isn’t enough — let’s simulate multiple points in time.
errors = []
for i in range(6, 13):
    # Train on everything except the last i months; test on the single month that follows
    train = df.iloc[:-i]
    test = df.iloc[-i:-i+1]
    model = Prophet().fit(train)
    # Predict the held-out month itself (one step ahead), not the end of the series
    forecast = model.predict(test[['ds']])
    y_pred = forecast['yhat'].iloc[0]
    errors.append(abs(test['y'].values[0] - y_pred))
print(f"Average Error: {sum(errors)/len(errors):.2f}")
🧮 It’s like asking Prophet,
“What if you had been alive in 2019, 2020, 2021… how would you have done?”
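The same walk-forward idea can be sketched without Prophet, which makes the mechanics easier to see. This is an illustrative sketch, not the only way to do it: the `walk_forward` helper, the `last_value` forecaster, and the toy `series` are all hypothetical names and made-up data. It uses an expanding training window and scores one step ahead at each cutoff.

```python
def walk_forward(values, n_folds, forecaster):
    """One-step-ahead walk-forward validation with an expanding training window."""
    errors = []
    for cutoff in range(len(values) - n_folds, len(values)):
        train = values[:cutoff]      # everything observed before the cutoff
        actual = values[cutoff]      # the single next observation
        pred = forecaster(train)     # forecast exactly one step ahead
        errors.append(abs(actual - pred))
    return errors


def last_value(train):
    """Hypothetical stand-in model: tomorrow looks like today."""
    return train[-1]


# Toy monthly series (made-up numbers, for illustration only).
series = [100, 104, 110, 108, 115, 120, 118, 125, 130, 128, 135, 140]

errors = walk_forward(series, n_folds=4, forecaster=last_value)
average_error = sum(errors) / len(errors)
print(f"Errors per fold: {errors}")
print(f"Average Error: {average_error:.2f}")
```

Each fold moves the cutoff one step later, so the model is always judged on a point it has never seen, at several different moments in history.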
📈 KPI Metrics You Should Know
| Metric | Full Name | Business Translation |
|---|---|---|
| MAE | Mean Absolute Error | Average “ouch” per prediction |
| RMSE | Root Mean Squared Error | Like MAE, but penalizes bigger mistakes |
| MAPE | Mean Absolute % Error | “On average, how far off was I in percentage terms?” |
| R² | Coefficient of Determination | “How much of reality did I actually explain?” |
💬 Tip: MAPE is great for business decks — it turns abstract errors into something managers understand:
“Our forecast is 7% off, not 700 widgets.”
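For readers who want to see what is under the hood, here is a plain-Python sketch of the four metrics using their standard definitions. The function names and the sample numbers are illustrative assumptions. One caveat worth knowing before putting MAPE on a slide: it divides by the actual value, so it is undefined when any actual is zero.

```python
import math


def mae(actual, pred):
    """Mean Absolute Error: average size of the miss."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def rmse(actual, pred):
    """Root Mean Squared Error: squares the misses first, so big ones count more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mape(actual, pred):
    """Mean Absolute Percentage Error, in percent. Undefined if any actual is zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


def r2(actual, pred):
    """Coefficient of determination: 1 minus (residual variation / total variation)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up numbers, for illustration only.
actual = [100, 200, 300, 400]
pred = [110, 190, 310, 380]

print(f"MAE:  {mae(actual, pred):.2f}")
print(f"RMSE: {rmse(actual, pred):.2f}")
print(f"MAPE: {mape(actual, pred):.2f}%")
print(f"R²:   {r2(actual, pred):.3f}")
```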
🧮 KPI Interpretation – “How Wrong Is Acceptably Wrong?”
| Error (MAPE) | Description | Manager’s Reaction |
|---|---|---|
| < 5% | 🔥 Excellent | “You’re getting a promotion!” |
| 5–10% | 👍 Good | “Let’s put this in the report.” |
| 10–20% | 😐 Acceptable | “Hmm, close enough for planning.” |
| > 20% | 🚨 Bad | “We’ll blame marketing again.” |
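If you want these bands in a report pipeline, a small lookup is enough. This sketch rests on two assumptions: the percentages are read as MAPE values, and the shared edges (exactly 10% and 20%) go to the friendlier band, since the table does not say which side they belong to. The `rate_forecast` name is hypothetical, and the thresholds themselves are rules of thumb, not universal standards.

```python
def rate_forecast(mape_percent):
    """Map a MAPE value (in percent) to the rule-of-thumb bands in the table above."""
    if mape_percent < 0:
        raise ValueError("MAPE cannot be negative")
    if mape_percent < 5:
        return "Excellent"
    if mape_percent <= 10:   # assumption: exactly 10% still counts as Good
        return "Good"
    if mape_percent <= 20:   # assumption: exactly 20% still counts as Acceptable
        return "Acceptable"
    return "Bad"


for value in (3.0, 7.0, 15.0, 25.0):
    print(f"MAPE {value:>5.1f}% -> {rate_forecast(value)}")
```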
🪞 Backtesting in Business Terms
- Forecasting sales? Backtesting tells you how much inventory the forecast would have had you order vs. what you actually needed.
- Forecasting website traffic? It tells marketing how many ads they wasted money on. 💸
- Forecasting stock levels? It saves your warehouse from drowning in 10,000 unsold “Summer 2022” mugs.
💼 KPI Alignment with Business Goals
| Business Goal | Forecast Metric | KPI Alignment |
|---|---|---|
| Profit Planning | RMSE | Minimizing overall uncertainty |
| Inventory | MAE or MAPE | Fewer overstock/understock events |
| Marketing ROI | R² | Model explains real demand shifts |
| Finance Budgeting | MAPE | Predictability over perfection |
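Why the metric choice matters is easiest to see with two forecasts that have the same MAE but very different RMSE. The error lists below are made up for illustration, and the helper names are hypothetical: one forecast is a little off every month, the other is perfect three months out of four and then misses badly once.

```python
import math


def mae_of(errors):
    """Mean absolute error from a list of forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse_of(errors):
    """Root mean squared error from a list of forecast errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


# Made-up forecast errors (actual minus predicted), for illustration only.
steady_errors = [10, 10, 10, 10]   # slightly off every month
spiky_errors = [0, 0, 0, 40]       # perfect, then one big miss

for name, errs in (("steady", steady_errors), ("spiky", spiky_errors)):
    print(f"{name:>6}: MAE = {mae_of(errs):.1f}, RMSE = {rmse_of(errs):.1f}")
```

Both forecasts have an MAE of 10, but the RMSE of the spiky one is twice as large. If one large miss hurts more than several small ones, RMSE is the metric that will notice.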
🧠 Practice Challenge
Try backtesting Prophet or ARIMA on:
- Monthly sales or revenue
- Customer support tickets
- Website visits
Compute MAE, RMSE, and MAPE — then answer:
“Would I trust this forecast in a board meeting?”
🧾 TL;DR
| Concept | TL;DR |
|---|---|
| Backtesting | Testing your forecast on old data |
| KPIs | Quantify how wrong (or right) you were |
| Rolling test | Multiple points of validation |
| MAPE | Business-friendly accuracy score |
| RMSE | Punishes big errors |
| Business takeaway | Forecasts are only useful when you measure their reliability |
“Backtesting is like checking your ex’s old messages — you might not like what you find, but it teaches you what to avoid next time.” 💔📉