Backtesting & KPIs#

“Because the only thing worse than a bad forecast is not knowing it’s bad.” 😬


🎯 Why Backtesting?#

Imagine your forecasting model as a fortune teller. 🔮 You wouldn’t just trust their prediction that your Q4 sales will skyrocket without proof, right? Backtesting is how we test our fortune teller — by asking:

“Okay, smartypants, what would you have predicted last year?”

If their “forecast” doesn’t match what actually happened — we politely say, “You’re fired,” and try again.


🧠 The Big Idea#

Backtesting = pretend the past is the future, make a prediction, and see how wrong you were.

In code terms:

  1. Split your time series into train and test sets.

  2. Train your model on the earlier part.

  3. Predict the later part.

  4. Compare the predictions vs. reality.

  5. Cry a little. Adjust parameters. Repeat. 🌀


📊 Basic Backtesting in Python#

Here’s how the data therapist session goes:

```python
from sklearn.metrics import mean_absolute_error
from prophet import Prophet
import pandas as pd

# Split data: hold out the last 12 months as the "future"
train = df.iloc[:-12]
test = df.iloc[-12:]

# Train model on the earlier part only
model = Prophet()
model.fit(train)

# Forecast into the test period (freq='M' assumes monthly, month-end data —
# match it to the frequency of your series, or the dates won't line up)
future = model.make_future_dataframe(periods=12, freq='M')
forecast = model.predict(future)

# Compare predictions vs. reality on the test dates
preds = forecast.set_index('ds').loc[test['ds'], 'yhat']
mae = mean_absolute_error(test['y'], preds)

print(f"Mean Absolute Error: {mae:.2f}")
```

📉 Output:

Mean Absolute Error: 57.23

Translation: your model missed the target by about 57 units per month. Not catastrophic… but the CFO might still send you “that” email.
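But is 57 units per month actually good? Raw errors only mean something next to a baseline. A quick sanity check (a sketch on a synthetic monthly series, since your real `df` will differ) is to score a seasonal-naive forecast — “next year looks exactly like last year” — and demand that your fancy model beat it:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Synthetic monthly series standing in for df (trend + yearly seasonality + noise)
rng = np.random.default_rng(42)
dates = pd.date_range("2019-01-01", periods=48, freq="MS")
y = 500 + 10 * np.arange(48) + 50 * np.sin(np.arange(48) * 2 * np.pi / 12) + rng.normal(0, 20, 48)
df = pd.DataFrame({"ds": dates, "y": y})

train, test = df.iloc[:-12], df.iloc[-12:]

# Seasonal-naive baseline: repeat the last 12 observed months
naive_preds = train["y"].iloc[-12:].to_numpy()
naive_mae = mean_absolute_error(test["y"], naive_preds)

print(f"Seasonal-naive MAE: {naive_mae:.2f}")
```

If your model’s MAE isn’t clearly below the naive baseline’s, the fortune teller is adding no value.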


🧪 Rolling Backtest (Walk-Forward Validation)#

One test isn’t enough — let’s simulate multiple points in time.

```python
errors = []
for i in range(6, 13):
    train = df.iloc[:-i]          # hold out the last i months...
    test = df.iloc[-i:-i + 1]     # ...and score on the first held-out month

    model = Prophet().fit(train)
    # One step ahead: the first forecast row after train is exactly test's month
    future = model.make_future_dataframe(periods=1, freq='M')
    forecast = model.predict(future)
    y_pred = forecast.iloc[-1]['yhat']
    errors.append(abs(test['y'].values[0] - y_pred))

print(f"Average Error: {sum(errors) / len(errors):.2f}")
```

🧮 It’s like asking Prophet,

“What if you had been alive in 2019, 2020, 2021… how would you have done?”
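You don’t have to hand-roll the loop: scikit-learn ships a generic walk-forward splitter, `TimeSeriesSplit`. A sketch on a toy series, with Prophet swapped out for a seasonal-naive “model” so the example runs anywhere:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Toy monthly series (stand-in for df["y"]): trend + yearly seasonality
y = pd.Series(100 + 5 * np.arange(36) + 20 * np.sin(np.arange(36) * 2 * np.pi / 12))

tscv = TimeSeriesSplit(n_splits=7, test_size=1)  # 7 one-step-ahead folds
errors = []
for train_idx, test_idx in tscv.split(y):
    train, test = y.iloc[train_idx], y.iloc[test_idx]
    # "Model": seasonal-naive — predict the value from 12 months ago
    y_pred = train.iloc[-12]
    errors.append(abs(test.iloc[0] - y_pred))

print(f"Average Error: {sum(errors) / len(errors):.2f}")  # → Average Error: 60.00
```

Each fold trains strictly on the past and scores one future point — the same idea as the Prophet loop above, just with the bookkeeping handled for you.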


📈 KPI Metrics You Should Know#

| Metric | Formula | Business Translation |
|--------|---------|----------------------|
| MAE | Mean Absolute Error | Average “ouch” per prediction |
| RMSE | Root Mean Squared Error | Like MAE, but penalizes bigger mistakes |
| MAPE | Mean Absolute % Error | “On average, how far off was I in percentage terms?” |
| R² | Coefficient of Determination | “How much of reality did I actually explain?” |

💬 Tip: MAPE is great for business decks — it turns abstract errors into something managers understand:

“Our forecast is 7% off, not 700 widgets.”
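All four metrics are one-liners with scikit-learn (MAPE needs sklearn ≥ 0.24). A sketch on toy numbers:

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,
    r2_score,
)

y_true = np.array([100, 120, 130, 110, 150])
y_pred = np.array([110, 115, 125, 120, 140])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))       # RMSE = sqrt(MSE)
mape = mean_absolute_percentage_error(y_true, y_pred)    # returns a fraction, not a %
r2 = r2_score(y_true, y_pred)

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape:.1%}")
print(f"R²:   {r2:.2f}")
```

Watch the units: sklearn’s MAPE comes back as a fraction (0.07, not 7%), so format it with `:.1%` before it goes anywhere near a slide deck.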


🧮 KPI Interpretation – “How Wrong Is Acceptably Wrong?”#

| Error (MAPE) | Description | Manager’s Reaction |
|--------------|-------------|--------------------|
| < 5% | 🔥 Excellent | “You’re getting a promotion!” |
| 5–10% | 👍 Good | “Let’s put this in the report.” |
| 10–20% | 😐 Acceptable | “Hmm, close enough for planning.” |
| > 20% | 🚨 Bad | “We’ll blame marketing again.” |
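If you want that verdict programmatically, a toy helper (the thresholds are this chapter’s rules of thumb, not an industry standard):

```python
def forecast_verdict(mape: float) -> str:
    """Map a MAPE (as a fraction, e.g. 0.07 = 7%) to the manager's reaction."""
    if mape < 0.05:
        return "🔥 Excellent: you're getting a promotion!"
    if mape < 0.10:
        return "👍 Good: let's put this in the report."
    if mape < 0.20:
        return "😐 Acceptable: close enough for planning."
    return "🚨 Bad: we'll blame marketing again."

print(forecast_verdict(0.07))  # → 👍 Good: let's put this in the report.
```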


🪞 Backtesting in Business Terms#

Forecasting sales? Backtesting tells you how much inventory you should have ordered vs. what you actually needed.

Forecasting website traffic? It tells marketing how many ads they wasted money on. 💸

Forecasting stock levels? It saves your warehouse from drowning in 10,000 unsold “Summer 2022” mugs.


💼 KPI Alignment with Business Goals#

| Business Goal | Forecast Metric | KPI Alignment |
|---------------|-----------------|---------------|
| Profit Planning | RMSE | Minimizing overall uncertainty |
| Inventory | MAE or MAPE | Fewer overstock/understock events |
| Marketing ROI | R² | Model explains real demand shifts |
| Finance Budgeting | MAPE | Predictability over perfection |


🧠 Practice Challenge#

Try backtesting Prophet or ARIMA on:

  • Monthly sales or revenue

  • Customer support tickets

  • Website visits

Compute MAE, RMSE, and MAPE — then answer:

“Would I trust this forecast in a board meeting?”


🧾 TL;DR#

| Concept | TL;DR |
|---------|-------|
| Backtesting | Testing your forecast on old data |
| KPIs | Quantify how wrong (or right) you were |
| Rolling test | Multiple points of validation |
| MAPE | Business-friendly accuracy score |
| RMSE | Punishes big errors |
| Business takeaway | Forecasts are only useful when you measure their reliability |


“Backtesting is like checking your ex’s old messages — you might not like what you find, but it teaches you what to avoid next time.” 💔📉
