# Regression Metrics

How Wrong Are You, Exactly?

“Regression metrics: because saying ‘my model feels accurate’ doesn’t cut it in a business meeting.” 💼

Welcome to the part of machine learning where numbers meet accountability. When you’re predicting continuous values (like revenue, price, or demand), regression metrics are how you check if your model is just optimistic… or actually useful.


## 🧠 Business Hook: “The Weather App Problem” ☔

Imagine your app predicts 26°C, but it’s actually 31°C outside. You’d say, “Close enough.”

Now imagine your model predicts **$2,600 in sales**, but actual sales are **$3,100**. Your manager says, “Close enough? You just lost $500 in forecasts!” 😬

That’s why regression metrics exist — to measure how close enough really is.


## 🎯 Key Metrics Overview

| Metric | Formula | Business Meaning |
|--------|---------|------------------|
| MAE – Mean Absolute Error | Average of absolute errors | “How far off, on average?” |
| MSE – Mean Squared Error | Average of squared errors | “How bad are large errors?” |
| RMSE – Root Mean Squared Error | Square root of MSE | “Like MSE, but in real units (e.g. dollars)” |
| R² (R-Squared) | 1 − (variance of errors / variance of true values) | “How much variance does my model explain?” |
| MAPE – Mean Absolute Percentage Error | Average absolute error as a % of actuals | “How wrong, percentage-wise?” |
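To make those formulas concrete, here is a minimal from-scratch sketch in NumPy (toy numbers, the same ones the scikit-learn example uses later in this chapter):

```python
import numpy as np

# Toy data: actual vs. predicted sales
y_true = np.array([300, 450, 500, 600, 700], dtype=float)
y_pred = np.array([280, 470, 490, 610, 680], dtype=float)

errors = y_true - y_pred

mae = np.mean(np.abs(errors))                   # average absolute miss
mse = np.mean(errors ** 2)                      # squaring punishes outliers harder
rmse = np.sqrt(mse)                             # back to the original units (dollars)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
mape = np.mean(np.abs(errors / y_true)) * 100   # percentage error (actuals must be nonzero)

print(f"MAE={mae:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}, R²={r2:.3f}, MAPE={mape:.2f}%")
```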


## 🧩 The Business Translation Table

| Situation | Metric to Watch | Why |
|-----------|-----------------|-----|
| Forecasting demand or sales | MAE / MAPE | Managers understand “average error” |
| Comparing model improvements | RMSE | Penalizes big mistakes more |
| Checking model explanation power | R² | Tells how much of the business variation is explained |
| Reporting to executives | MAPE | Easy to say “we’re off by 8%” |

💬 “Executives love MAPE because it fits in one sentence — even on a PowerPoint.”


## ⚙️ Let’s Code It

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Example data: actual vs. predicted sales
data = {
    'actual_sales': [300, 450, 500, 600, 700],
    'predicted_sales': [280, 470, 490, 610, 680]
}
df = pd.DataFrame(data)

mae = mean_absolute_error(df['actual_sales'], df['predicted_sales'])
mse = mean_squared_error(df['actual_sales'], df['predicted_sales'])
rmse = mse ** 0.5  # square root of MSE (the squared=False shortcut was removed in newer scikit-learn releases)
r2 = r2_score(df['actual_sales'], df['predicted_sales'])

print(f"MAE: {mae:.2f}")
print(f"MSE: {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R²: {r2:.3f}")
```

Output:

```
MAE: 16.00
MSE: 280.00
RMSE: 16.73
R²: 0.985
```

💬 “Translation: Your model is off by about $16 on average. CFO approved.”
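MAPE from the overview table isn’t computed above. scikit-learn ships it as mean_absolute_percentage_error, which returns a fraction, so multiply by 100 before putting it on a slide. Reusing the df from the previous block:

```python
from sklearn.metrics import mean_absolute_percentage_error

# Returns a fraction (0.0353 here), so scale it to a percentage for reporting
mape = mean_absolute_percentage_error(df['actual_sales'], df['predicted_sales']) * 100
print(f"MAPE: {mape:.2f}%")  # prints "MAPE: 3.53%" for the toy data
```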


## 🧮 Metric Intuition — With Humor

| Metric | Analogy | Meaning |
|--------|---------|---------|
| MAE | Average distance between goal and ball in football ⚽ | Lower = more accurate kicks |
| MSE | Like punishing bad shots harder 🥅 | Squared penalty for big misses |
| RMSE | Same as MSE, but in meters, not meters² | Easier to interpret |
| R² | How much of the weather you can explain by looking outside ☀️ | 1.0 = perfect prediction, 0 = chaos |
| MAPE | “On average, how many % off was I?” | Great for CFO-friendly dashboards |


## 🧠 Pro Tip: Never Trust One Metric Alone

Each metric tells a different story:

- RMSE hates large errors 🧨
- MAE forgives gently 🤗
- MAPE gets angry if actual values are near zero 😡 (see the sketch below)
- R² will flatter you until it doesn’t 😅

Always look at multiple metrics together — and align them with your business goal.

💬 “If you optimize RMSE but the business cares about MAPE, you’re fixing the wrong problem perfectly.”
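To see why near-zero actuals break MAPE, here is a tiny illustrative sketch (the demand numbers are made up):

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical demand for a slow-moving product: week 3's actual is nearly zero
y_true = np.array([100.0, 120.0, 0.5])
y_pred = np.array([ 95.0, 125.0, 5.0])  # a miss of only 4.5 units in week 3...

# ...but dividing by 0.5 turns it into a 900% error that dominates the average
print(mean_absolute_percentage_error(y_true, y_pred))  # ≈ 3.03, i.e. ~303%
```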


## 📈 Visualizing Error Distribution

Because numbers are nice, but plots reveal secrets.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Signed error: positive = model under-predicted, negative = over-predicted
df['error'] = df['actual_sales'] - df['predicted_sales']

# A roughly symmetric, zero-centered histogram suggests unbiased predictions
sns.histplot(df['error'], bins=10, kde=True)
plt.title("Error Distribution")
plt.xlabel("Prediction Error (Actual - Predicted)")
plt.show()
```

💡 “If your error plot looks like Mount Everest, congratulations — your model climbs but doesn’t conquer.”


## 🧪 Practice Lab — “Forecast Fail or Fortune?”

Dataset: `sales_forecast.csv`

  1. Calculate MAE, RMSE, R², and MAPE for your model.

  2. Visualize prediction errors with a histogram.

  3. Compare two models: Linear Regression vs Random Forest.

  4. Write a short paragraph explaining which one you’d choose and why (use business logic, not just numbers!).

🎯 Bonus Challenge: Create a Plotly dashboard comparing RMSE and R² for multiple regions.
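If you want a head start on the bonus challenge, here is a minimal Plotly sketch. The per-region results table is hypothetical; swap in your own RMSE and R² values:

```python
import pandas as pd
import plotly.express as px

# Hypothetical per-region evaluation results (made-up numbers)
results = pd.DataFrame({
    'region': ['North', 'South', 'East', 'West'],
    'rmse':   [14.7, 22.1, 9.8, 17.3],
    'r2':     [0.94, 0.88, 0.97, 0.91],
})

# Reshape to long form so each metric gets its own facet
long_form = results.melt(id_vars='region', var_name='metric', value_name='value')
fig = px.bar(long_form, x='region', y='value', facet_col='metric', color='region')
fig.update_yaxes(matches=None)  # RMSE and R² live on very different scales
fig.show()
```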


## 💼 Real-World Example

Scenario: Predicting weekly demand for a grocery chain.

- Model A: MAE = 120, RMSE = 150
- Model B: MAE = 100, RMSE = 300

Which is better? It depends! Model B is closer on a typical week (lower MAE), but its much higher RMSE means it occasionally misses badly. If big surprises (stockouts) are expensive, Model A’s consistency is safer; if you can absorb the occasional blow-up, Model B wins on average.
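A quick sketch of the pattern behind those numbers: the error vectors below are invented to show the shape of it, not to reproduce the exact figures above.

```python
import numpy as np

# Invented weekly forecast errors: A is steady, B is usually close
# but has one blow-up week
errors_a = np.array([120.0, -120.0, 120.0, -120.0, 120.0])
errors_b = np.array([ 10.0,  -10.0,  10.0,  -10.0, 460.0])

for name, e in [("Model A", errors_a), ("Model B", errors_b)]:
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    # When RMSE is much larger than MAE, outlier misses are hiding in the average
    print(f"{name}: MAE={mae:.0f}, RMSE={rmse:.0f}")
```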

💬 “Choosing metrics is like choosing pizza toppings — it depends on who you’re feeding.”


## 🧭 Recap

| Metric | When to Use | Business Insight |
|--------|-------------|------------------|
| MAE | Simple, easy to explain | Average prediction miss |
| RMSE | Penalize big errors | Risk sensitivity |
| R² | Assessing model strength | Variance explained |
| MAPE | Communicate results | Percent-based accuracy |


## 🔜 Next Up

👉 Head to Classification Metrics where we’ll move from predicting “How much?” to “Which one?” — and deal with confusion matrices, precision, recall, and the eternal question:

“Why did my model think a loyal customer was a fraudster?” 😅

