
“Because the only thing worse than a bad forecast is not knowing it’s bad.” 😬


🎯 Why Backtesting?

Imagine your forecasting model as a fortune teller. 🔮 You wouldn’t just trust their prediction that your Q4 sales will skyrocket without proof, right? Backtesting is how we test our fortune teller — by asking:

“Okay, smarty-pants, what would you have predicted last year?”

If their “forecast” doesn’t match what actually happened — we politely say, “You’re fired,” and try again.


🧠 The Big Idea

Backtesting = pretend the past is the future, make a prediction, and see how wrong you were.

In code terms:

  1. Split your time series into train and test sets.

  2. Train your model on the earlier part.

  3. Predict the later part.

  4. Compare the predictions vs. reality.

  5. Cry a little. Adjust parameters. Repeat. 🌀
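Before calling in a real forecasting library, the five steps above can be sketched end to end with a dependency-free “repeat the last value” baseline. The synthetic `df` below is illustrative stand-in data, not the article’s dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series -- substitute your own DataFrame
# with 'ds' (dates) and 'y' (values) columns.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=36, freq="MS"),
    "y": 100 + np.arange(36) * 2.0 + rng.normal(0, 5, 36),
})

# 1. Split: hold out the last 12 months as the test set.
train, test = df.iloc[:-12], df.iloc[-12:]

# 2-3. "Train" and predict: the naive baseline just repeats the
#      last observed training value for every future month.
y_pred = np.full(len(test), train["y"].iloc[-1])

# 4. Compare predictions vs. reality.
mae = np.mean(np.abs(test["y"].to_numpy() - y_pred))
print(f"Naive baseline MAE: {mae:.2f}")
```

If your fancy model can’t beat this baseline, step 5 applies with extra feeling.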


📊 Basic Backtesting in Python

Here’s how the data therapist session goes:

```python
from sklearn.metrics import mean_absolute_error
from prophet import Prophet
import pandas as pd

# df holds Prophet's expected columns: 'ds' (dates) and 'y' (values)

# Split data: hold out the last 12 months as the test set
train = df.iloc[:-12]
test = df.iloc[-12:]

# Train the model on the earlier part only
model = Prophet()
model.fit(train)

# Forecast into the test period (12 monthly steps)
future = model.make_future_dataframe(periods=12, freq='M')
forecast = model.predict(future)

# Compare: align predictions to the test dates, then score
preds = forecast.set_index('ds').loc[test['ds'], 'yhat']
mae = mean_absolute_error(test['y'], preds)

print(f"Mean Absolute Error: {mae:.2f}")
```

📉 Output:

Mean Absolute Error: 57.23

Translation: your model missed the target by about 57 units per month. Not catastrophic… but the CFO might still send you “that” email.


🧪 Rolling Backtest (Walk-Forward Validation)

One test isn’t enough — let’s simulate multiple points in time.

```python
errors = []
for i in range(6, 13):
    # Train on everything except the last i months...
    train = df.iloc[:-i]
    # ...and test on the single month right after the training window
    test = df.iloc[-i:-i+1]

    model = Prophet().fit(train)
    # Forecast exactly one step ahead, so the last forecast row
    # lines up with the held-out test month
    future = model.make_future_dataframe(periods=1, freq='M')
    forecast = model.predict(future)
    y_pred = forecast.iloc[-1]['yhat']
    errors.append(abs(test['y'].values[0] - y_pred))

print(f"Average Error: {sum(errors)/len(errors):.2f}")
```

🧮 It’s like asking Prophet,

“What if you had been alive in 2019, 2020, 2021… how would you have done?”
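The same expanding-window loop works with any model, not just Prophet. A sketch using scikit-learn’s `LinearRegression` on a synthetic monthly series (all data here is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic monthly series: trend plus noise (illustrative only).
n = 36
t = np.arange(n)
y = 100 + 2.0 * t + np.random.default_rng(0).normal(0, 5, n)

errors = []
# Walk forward: at each step, fit on everything up to month n-i,
# then predict exactly one step ahead and score that prediction.
for i in range(6, 13):
    X_train, y_train = t[:-i].reshape(-1, 1), y[:-i]
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict([[t[n - i]]])[0]  # one step ahead
    errors.append(abs(y[n - i] - y_pred))

print(f"Average one-step-ahead error: {np.mean(errors):.2f}")
```

The expanding window means later iterations see less history, which is exactly the point: you learn how the model would have performed at each moment in the past.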


📈 KPI Metrics You Should Know

| Metric | Formula | Business Translation |
|---|---|---|
| MAE | Mean Absolute Error | Average “ouch” per prediction |
| RMSE | Root Mean Squared Error | Like MAE, but penalizes bigger mistakes |
| MAPE | Mean Absolute % Error | “On average, how far off was I in percentage terms?” |
| R² | Coefficient of Determination | “How much of reality did I actually explain?” |

💬 Tip: MAPE is great for business decks — it turns abstract errors into something managers understand:

“Our forecast is 7% off, not 700 widgets.”
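A quick sketch of computing all four metrics with scikit-learn and NumPy. The toy arrays are illustrative; `np.sqrt` is used for RMSE to stay compatible across scikit-learn versions:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_squared_error, r2_score)

# Toy actuals vs. forecasts -- swap in your own arrays.
y_true = np.array([100.0, 110.0, 120.0, 130.0])
y_pred = np.array([ 98.0, 115.0, 118.0, 125.0])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
r2 = r2_score(y_true, y_pred)

print(f"MAE:  {mae:.2f}")    # average "ouch" per prediction
print(f"RMSE: {rmse:.2f}")   # punishes big misses harder
print(f"MAPE: {mape:.2f}%")  # manager-friendly percentage
print(f"R^2:  {r2:.2f}")     # share of variance explained
```

One caveat worth remembering: MAPE divides by the actual values, so it blows up when `y_true` is at or near zero.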


🧮 KPI Interpretation – “How Wrong Is Acceptably Wrong?”

| Accuracy | Description | Manager’s Reaction |
|---|---|---|
| < 5% | 🔥 Excellent | “You’re getting a promotion!” |
| 5–10% | 👍 Good | “Let’s put this in the report.” |
| 10–20% | 😐 Acceptable | “Hmm, close enough for planning.” |
| > 20% | 🚨 Bad | “We’ll blame marketing again.” |
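The bands above could be wrapped in a tiny helper for reports. `grade_forecast` is a hypothetical name, and the thresholds are simply the table’s:

```python
def grade_forecast(mape_pct: float) -> str:
    """Map a MAPE percentage to the manager-reaction bands above.

    Illustrative helper: thresholds come from the table, the
    function name is made up.
    """
    if mape_pct < 5:
        return "Excellent"
    elif mape_pct <= 10:
        return "Good"
    elif mape_pct <= 20:
        return "Acceptable"
    return "Bad"

print(grade_forecast(7.0))  # falls in the 5-10% band -> "Good"
```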

🪞 Backtesting in Business Terms

Forecasting sales? Backtesting tells you how much inventory you should have ordered vs. what you actually needed.

Forecasting website traffic? It tells marketing how many ads they wasted money on. 💸

Forecasting stock levels? It saves your warehouse from drowning in 10,000 unsold “Summer 2022” mugs.


💼 KPI Alignment with Business Goals

| Business Goal | Forecast Metric | KPI Alignment |
|---|---|---|
| Profit Planning | RMSE | Minimizing overall uncertainty |
| Inventory | MAE or MAPE | Fewer overstock/understock events |
| Marketing ROI | R² | Model explains real demand shifts |
| Finance Budgeting | MAPE | Predictability over perfection |

🧠 Practice Challenge

Try backtesting Prophet or ARIMA on:

  • Monthly sales or revenue

  • Customer support tickets

  • Website visits

Compute MAE, RMSE, and MAPE — then answer:

“Would I trust this forecast in a board meeting?”


🧾 TL;DR

| Concept | TL;DR |
|---|---|
| Backtesting | Testing your forecast on old data |
| KPIs | Quantify how wrong (or right) you were |
| Rolling test | Multiple points of validation |
| MAPE | Business-friendly accuracy score |
| RMSE | Punishes big errors |
| Business takeaway | Forecasts are only useful when you measure their reliability |

“Backtesting is like checking your ex’s old messages — you might not like what you find, but it teaches you what to avoid next time.” 💔📉
