Lab – Sales Forecasting
Where data meets drama, and revenue meets regression! 😎
🎬 The Business Scenario
Congratulations — you’ve just been promoted to Data Science Intern of the Year at Acme Retail Co. 🎉
Your boss (who once said “AI is just fancy Excel”) wants you to predict monthly sales using marketing spend, pricing, and seasonal factors.
Your mission:
Build a simple but effective regression model to forecast sales, explain it clearly, and make it look fancy on a dashboard so everyone thinks it’s magic. ✨
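🧾 Step 1. Load the Data
You’ve been handed a “beautifully messy” sales file by the finance team (of course 🙃). It’s a CSV, so `pd.read_csv` can load it straight from the URL:
```python
import pandas as pd

# Monthly retail sales data, hosted as a CSV on GitHub
url = "https://raw.githubusercontent.com/chandraveshchaudhari/datasets/main/retail_sales.csv"
data = pd.read_csv(url)
data.head()  # preview the first five rows
```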
📊 Expected columns: `Month`, `TV_Spend`, `Social_Media_Spend`, `Discount_Percent`, `Season`, `Sales`
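Before going further, it can help to confirm the file really has these columns. A minimal sketch: `missing_columns` is a hypothetical helper (not part of the lab), and the two-row DataFrame is made up purely for illustration.

```python
import pandas as pd

EXPECTED = ["Month", "TV_Spend", "Social_Media_Spend", "Discount_Percent", "Season", "Sales"]

def missing_columns(df: pd.DataFrame, expected=EXPECTED) -> list:
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]

# Made-up frame that is missing the Season column
demo = pd.DataFrame({
    "Month": ["2023-01", "2023-02"],
    "TV_Spend": [3000, 2500],
    "Social_Media_Spend": [1000, 1200],
    "Discount_Percent": [10, 5],
    "Sales": [52000, 48000],
})
print(missing_columns(demo))  # ['Season']
```

On the real dataset, `missing_columns(data)` should come back as an empty list.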
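🧹 Step 2. Clean the Data
Finance swore the data was “clean.” You’ll find out soon enough.
```python
data.info()                 # column dtypes and non-null counts (prints directly)
print(data.describe())      # summary stats: watch for negative or absurd values
print(data.isnull().sum())  # number of missing values per column
```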
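🧽 Drop rows with missing values and impossible sales figures:
```python
data = data.dropna()            # drop any row with a missing value
data = data[data["Sales"] > 0]  # zero or negative monthly sales are treated as data errors
```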
✅ Tip: Don’t delete too much — remember, data is like gossip; even the noisy parts tell a story. 😏
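The cleaning code above removes missing and non-positive rows but never looks for outliers. If you want to find “weird” values without deleting them, one common option (swapped in here, not part of the original lab) is the 1.5 × IQR rule. A sketch: `flag_iqr_outliers` is a hypothetical helper and the sales numbers are made up.

```python
import pandas as pd

def flag_iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Made-up monthly sales with one suspicious spike
sales = pd.Series([100, 105, 98, 110, 102, 990])
mask = flag_iqr_outliers(sales)
print(sales[mask])  # inspect flagged rows before deciding what to do with them
```

On the lab data, `data[flag_iqr_outliers(data["Sales"])]` lists the candidates; whether a spike is a typo or a genuinely great month is a business call, not a statistical one.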
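🔍 Step 3. Feature Engineering
Convert the categorical `Season` column into numeric dummy (0/1) columns:
```python
# drop_first=True drops one season to act as the baseline (avoids redundant columns);
# dtype=int gives 0/1 columns instead of True/False
data = pd.get_dummies(data, columns=["Season"], drop_first=True, dtype=int)
data.head()
```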
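To see what `drop_first=True` does, here is a toy example with made-up rows (the real dataset’s season labels may differ). pandas sorts the categories, so `Fall` becomes the dropped baseline and a row of all zeros means “Fall”:

```python
import pandas as pd

# Made-up Season column covering all four seasons
toy = pd.DataFrame({"Season": ["Winter", "Spring", "Summer", "Fall"]})
dummies = pd.get_dummies(toy, columns=["Season"], drop_first=True, dtype=int)
print(dummies)
# Columns: Season_Spring, Season_Summer, Season_Winter; the Fall row is all zeros
```

Each remaining dummy’s coefficient is then read as “sales difference versus the baseline season.”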
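Separate the features from the target. `Month` only labels each row (and may be stored as text), so it stays out of the feature set:
```python
X = data.drop(columns=["Month", "Sales"])  # predictors: spend, discount, season dummies
y = data["Sales"]                          # target
```
⚙️ Step 4. Split Data
Time to create our train-test split, because life is all about testing your assumptions. Standardizing the features is optional: for plain linear regression it doesn’t change the predictions, but it puts every coefficient on a comparable “per standard deviation” scale. Fit the scaler on the training rows only, so nothing from the test set leaks into training.
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hold out 20% of rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Learn mean/std from the training set, then apply the same transform to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```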
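A random split is fine for learning the mechanics, but for a *forecast* it lets the model train on months that come after the ones it is tested on. A sketch of a chronological hold-out instead, assuming the rows can be sorted by `Month` (the toy frame below is made up): `train_test_split` with `shuffle=False` keeps the last 20% of rows as the test set.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up monthly data
toy = pd.DataFrame({
    "Month": pd.date_range("2023-01-01", periods=10, freq="MS"),
    "TV_Spend": range(1000, 2000, 100),
    "Sales": range(50, 60),
})
toy = toy.sort_values("Month")  # ensure rows are in chronological order

X_toy = toy[["TV_Spend"]]
y_toy = toy["Sales"]

# shuffle=False: the earliest 80% of months train the model, the latest 20% test it
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, shuffle=False)
print(toy.loc[X_te.index, "Month"])  # the two most recent months
```

This mimics how the model will actually be used: trained on the past, judged on the future.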
🧠 Step 5. Train the Model
Let’s start simple: a plain Linear Regression.
```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)  # learns one coefficient per feature, plus an intercept
```
Check the coefficients (aka “how much each feature drives sales”):
```python
# Pair each feature name with its fitted coefficient, rounded for readability
coeff_df = pd.DataFrame({
    "Feature": X.columns,
    "Coefficient": model.coef_.round(2),
})
coeff_df
```
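📈 Translation: because the features were standardized, each coefficient is the change in predicted sales for a one-standard-deviation increase in that feature, holding the others fixed. So if TV_Spend’s coefficient is X, one extra standard deviation of TV spend adds about X units of sales; it is not a per-dollar effect (and if your CFO is reading this, round the number dramatically). 😅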
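To get back to a “per $1” reading, divide each standardized coefficient by that feature’s standard deviation (`scaler.scale_`). A sketch on synthetic data generated from a known formula (sales = 500 + 2·TV + 3·Social, made up for the demo), so you can check that the division recovers 2 and 3:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up data with known effects: $2 of sales per TV dollar, $3 per social dollar
rng = np.random.default_rng(0)
X_raw = rng.uniform(1000, 5000, size=(50, 2))  # columns: TV spend, social spend
y_raw = 500 + 2 * X_raw[:, 0] + 3 * X_raw[:, 1]

scaler_demo = StandardScaler()
X_std = scaler_demo.fit_transform(X_raw)
lr = LinearRegression().fit(X_std, y_raw)

# Standardized coef = sales per 1 std dev; dividing by the std dev gives sales per 1 unit
per_dollar = lr.coef_ / scaler_demo.scale_
print(per_dollar.round(2))  # close to 2 and 3
```

In the lab the same idea is `model.coef_ / scaler.scale_`, which lines up with `X.columns` because the scaler was fit on those columns. For a season dummy, the result is the sales difference versus the baseline season rather than a per-dollar effect.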
📉 Step 6. Evaluate Performance
```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
# RMSE = square root of MSE (the old squared=False argument was removed in scikit-learn 1.6)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R² Score: {r2:.2f}")
```
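✅ Interpretation:
- MAE = average absolute prediction error, in the same units as Sales
- RMSE = also in Sales units, but it squares errors first, so it punishes big mistakes more
- R² = the model’s bragging rights: 1.0 = perfect fit, 0.0 = no better than always predicting the average (and it can even go negative 😬)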
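All three metrics are short formulas. A worked example on four made-up months, with the arithmetic in the comments, shows why RMSE lands above MAE here: the single 20-unit miss gets squared.

```python
import numpy as np

# Made-up actual vs predicted sales for four months
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_hat = np.array([110.0, 140.0, 210.0, 230.0])

errors = y_true - y_hat                     # [-10, 10, -10, 20]
mae = np.mean(np.abs(errors))               # (10 + 10 + 10 + 20) / 4 = 12.5
rmse = np.sqrt(np.mean(errors ** 2))        # sqrt((100 + 100 + 100 + 400) / 4) = sqrt(175)
# R² = 1 - SSE / SST = 1 - 700 / 12500
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R²={r2:.2f}")  # MAE=12.50  RMSE=13.23  R²=0.94
```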
📊 Step 7. Visualize Predictions
Let’s make the boss happy with a plot that looks “AI-powered.” 🤖
```python
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 5))
plt.scatter(y_test, y_pred, alpha=0.7, color="royalblue")
# Dashed diagonal y = x: points on it are perfect predictions
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "--", color="orange")
plt.title("Actual vs Predicted Sales")
plt.xlabel("Actual Sales")
plt.ylabel("Predicted Sales")
plt.show()
```
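🎨 If the points hug the dashed orange line (predicted = actual), your model is doing great! Points above it are over-forecasts; points below it are under-forecasts.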
💬 Step 8. Business Insights

| Feature | Effect | Business Advice |
|---|---|---|
| TV_Spend | +ve | Ads still work! Old-school isn’t dead 📺 |
| Social_Media_Spend | +ve | Keep feeding the algorithm 💰 |
| Discount_Percent | -ve (sometimes) | Too many discounts hurt profit margin 💸 |
| Season_Summer | +ve | Ice creams sell better in heat 😎 |
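The Effect column should come from your fitted model, not from intuition. A sketch that labels each feature by the sign of its coefficient and sorts by magnitude; the coefficients below are invented placeholders standing in for your own `coeff_df`. Comparing magnitudes like this only makes sense because the features were standardized.

```python
import pandas as pd

# Made-up coefficients standing in for the coeff_df from Step 5
coeff_demo = pd.DataFrame({
    "Feature": ["TV_Spend", "Social_Media_Spend", "Discount_Percent", "Season_Summer"],
    "Coefficient": [1250.0, 830.5, -140.2, 610.0],
})

# Label each feature by the sign of its coefficient
coeff_demo["Effect"] = coeff_demo["Coefficient"].apply(
    lambda c: "+ve" if c > 0 else ("-ve" if c < 0 else "none")
)
# Strongest effect (largest absolute coefficient) first
coeff_demo = coeff_demo.reindex(
    coeff_demo["Coefficient"].abs().sort_values(ascending=False).index
)
print(coeff_demo)
```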
💡 “A regression model is just a data-driven way to say ‘I told you so’ in a meeting.”
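📈 Step 9. Bonus: Predict Future Sales
You can predict next month’s sales like a forecasting wizard. The new row needs the same columns, in the same order, as `X`, and it must be scaled exactly like the training data. Which season the `1` switches on depends on the order of the dummy columns in `X.columns`.
```python
# Made-up input; assumes X.columns = TV_Spend, Social_Media_Spend, Discount_Percent + 3 season dummies
next_month = pd.DataFrame([[3000, 1000, 10, 0, 1, 0]], columns=X.columns)
# The model was trained on standardized features, so transform the new row the same way
predicted_sales = model.predict(scaler.transform(next_month))
print(f"Predicted Sales: ${predicted_sales[0]:.2f}")
```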
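A positional list only works if you remember the exact column order. Below is a hypothetical helper, `predict_month` (not part of the original lab), that takes named values, fills any omitted season dummies with 0, rejects misspelled names, and reuses the fitted scaler. The training data is synthetic, generated from a known formula so the answer can be checked.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

def predict_month(model, scaler, feature_columns, **values):
    """Build a one-row frame in training column order (omitted features default to 0),
    scale it with the already-fitted scaler, and return the predicted sales."""
    unknown = set(values) - set(feature_columns)
    if unknown:
        raise ValueError(f"Unknown feature(s): {sorted(unknown)}")
    row = pd.DataFrame([values]).reindex(columns=feature_columns, fill_value=0)
    return float(model.predict(scaler.transform(row))[0])

# Made-up training data: sales = 200 + 0.5*TV + 0.8*Social - 20*Discount + 300*Summer
cols = ["TV_Spend", "Social_Media_Spend", "Discount_Percent", "Season_Summer"]
rng = np.random.default_rng(1)
X_demo = pd.DataFrame(rng.uniform(0, 5000, size=(40, 4)), columns=cols)
X_demo["Discount_Percent"] = rng.uniform(0, 30, size=40)
X_demo["Season_Summer"] = rng.integers(0, 2, size=40)
y_demo = (200 + 0.5 * X_demo["TV_Spend"] + 0.8 * X_demo["Social_Media_Spend"]
          - 20 * X_demo["Discount_Percent"] + 300 * X_demo["Season_Summer"])

sc = StandardScaler().fit(X_demo)
lr = LinearRegression().fit(sc.transform(X_demo), y_demo)

pred = predict_month(lr, sc, cols, TV_Spend=3000, Social_Media_Spend=1000,
                     Discount_Percent=10, Season_Summer=1)
print(f"Predicted Sales: ${pred:.2f}")  # about 2600 for this made-up formula
```

In the lab you would call it as `predict_month(model, scaler, X.columns, TV_Spend=3000, ...)`, using whatever dummy names `get_dummies` actually produced for your data.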
Now go make that dashboard and act mysterious when someone asks,
“So… how does it work?” 😏
💡 Optional Extension
Try the same task using:
- Ridge or Lasso Regression
- Polynomial features (`PolynomialFeatures`) feeding a linear model
- or even a Random Forest (coming soon in later chapters 🌲)

Then compare R²: competition keeps models humble.
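A sketch of that comparison. The alpha values are picked arbitrarily (not tuned), and the data comes from scikit-learn’s `make_regression` so the block runs on its own; swap in your own `X` and `y`. Wrapping the scaler and model in a pipeline also guarantees the scaler only ever sees training rows.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sales features
X_syn, y_syn = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

models = {
    "Linear": LinearRegression(),
    "Ridge (alpha=1.0)": Ridge(alpha=1.0),
    "Lasso (alpha=0.1)": Lasso(alpha=0.1),
}

scores = {}
for name, reg in models.items():
    # The pipeline fits the scaler on the training split only, then the regressor
    pipe = make_pipeline(StandardScaler(), reg)
    pipe.fit(X_tr, y_tr)
    scores[name] = pipe.score(X_te, y_te)  # .score returns R² for regressors
    print(f"{name:>18}: R² = {scores[name]:.3f}")
```

To try polynomial terms, add `PolynomialFeatures(degree=2)` (from `sklearn.preprocessing`) as the first pipeline step.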
🧩 Practice Exercises

| Challenge | Hint |
|---|---|
| 1️⃣ Train a Ridge Regression model and compare its R² | |
| 2️⃣ Add a new feature | Think creatively! |
| 3️⃣ Create a seaborn plot of coefficients | Use |
| 4️⃣ Try predicting sales when | Does the model behave logically? |
| 5️⃣ Deploy your model to Colab and make it interactive | Add widgets or sliders! |
🧠 Recap
- You loaded, cleaned, and prepped real data
- Built a regression model
- Evaluated it with MAE, RMSE, and R²
- Made it explainable and visually appealing

You’re now a certified Sales Forecast Whisperer 📊🧙‍♂️
🐍 Python Help
If you’re unsure about any pandas or sklearn syntax, 🪄 explore Programming for Business: it’s your friendly guide to all the Python basics behind the magic!
🚀 Next Up
➡️ Chapter 6: Classification Models. Because predicting “how much” is cool, but predicting “who buys” is where real marketing power begins. 🎯
```python
# Your code here
```