Lab – Sales Forecasting

Where data meets drama, and revenue meets regression! 😎


🎬 The Business Scenario

Congratulations — you’ve just been promoted to Data Science Intern of the Year at Acme Retail Co. 🎉

Your boss (who once said “AI is just fancy Excel”) wants you to predict monthly sales using marketing spend, pricing, and seasonal factors.

Your mission:

Build a simple but effective regression model to forecast sales, explain it clearly, and make it look fancy on a dashboard so everyone thinks it’s magic. ✨


🧾 Step 1. Load the Data

You’ve been handed a “beautifully messy” spreadsheet export (CSV) by the finance team (of course 🙃).

import pandas as pd

url = "https://raw.githubusercontent.com/chandraveshchaudhari/datasets/main/retail_sales.csv"
data = pd.read_csv(url)

data.head()

📊 Expected columns:

  • Month

  • TV_Spend

  • Social_Media_Spend

  • Discount_Percent

  • Season

  • Sales
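Before going further, it's worth confirming that the file really has those columns. Here is a minimal sketch. It assumes the names above are spelled exactly as listed. missing_columns is a hypothetical helper, and the tiny sample frame uses made-up values, not the real dataset.

```python
import pandas as pd

# Column names taken from the "Expected columns" list above
EXPECTED_COLUMNS = ["Month", "TV_Spend", "Social_Media_Spend",
                    "Discount_Percent", "Season", "Sales"]

def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df, in expected order."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

# Tiny made-up frame standing in for the real file (Season left out on purpose)
sample = pd.DataFrame({
    "Month": ["2023-01"],
    "TV_Spend": [2500.0],
    "Social_Media_Spend": [800.0],
    "Discount_Percent": [5.0],
    "Sales": [12000.0],
})
print(missing_columns(sample))
```

In the lab you would run missing_columns(data). An empty list means you are good to go.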


🧹 Step 2. Clean the Data

Finance swore the data was “clean.” You’ll find out soon enough.

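data.info()                   # column types and non-null counts (prints by itself)
print(data.describe())        # summary statistics for numeric columns
print(data.isnull().sum())    # number of missing values per column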

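🧽 Drop rows with missing values and rows with impossible (zero or negative) sales:

data = data.dropna()              # remove rows with any missing value
data = data[data["Sales"] > 0]    # keep only positive sales figures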

Tip: Don’t delete too much — remember, data is like gossip; even the noisy parts tell a story. 😏
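In the spirit of that tip, it helps to see how much each rule actually throws away. Here is a minimal sketch. clean_with_report is a hypothetical helper that applies the same two rules as above and reports the row count after each one. It runs on a small made-up frame.

```python
import numpy as np
import pandas as pd

def clean_with_report(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the lab's two cleaning rules and print how many rows each removes."""
    start = len(df)
    df = df.dropna()                 # rule 1: drop rows with missing values
    after_na = len(df)
    df = df[df["Sales"] > 0]         # rule 2: drop zero or negative sales
    print(f"dropna removed {start - after_na} rows; "
          f"Sales > 0 removed {after_na - len(df)} rows; "
          f"{len(df)} of {start} rows kept")
    return df

# Made-up example: one row with a missing value, one with negative sales
raw = pd.DataFrame({
    "TV_Spend": [1000.0, np.nan, 1500.0, 1200.0],
    "Sales": [9000.0, 8500.0, -50.0, 9700.0],
})
cleaned = clean_with_report(raw)
```

If one rule removes a large share of the rows, look at those rows before deleting them.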


🔍 Step 3. Feature Engineering

Convert the categorical Season column into numeric dummy (one-hot) columns:

data = pd.get_dummies(data, columns=["Season"], drop_first=True)
data.head()
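The toy example below shows what drop_first=True does. The values are made up, and it assumes the four seasons are spelled as shown. Each season becomes a 0/1 column. The alphabetically first category is dropped and acts as the baseline. If you kept all four columns, they would always sum to 1, which makes them perfectly collinear with the regression intercept.

```python
import pandas as pd

# Four made-up rows, one per season
toy = pd.DataFrame({
    "Season": ["Winter", "Spring", "Summer", "Fall"],
    "Sales": [100, 120, 150, 110],
})
encoded = pd.get_dummies(toy, columns=["Season"], drop_first=True)
print(encoded.columns.tolist())

# Categories are sorted, so "Fall" is dropped and becomes the baseline:
# a row of all zeros in the Season_* columns means Fall.
print(encoded.astype(int))
```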

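Standardize the features. This step is optional for plain linear regression, but it puts the coefficients on a comparable scale and matters once you try Ridge or Lasso:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = data.drop(columns=["Sales", "Month"])  # drop the target and Month, leaving the six features used in Step 9
X_scaled = scaler.fit_transform(X)         # each column rescaled to mean 0, standard deviation 1
y = data["Sales"]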
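StandardScaler subtracts each column's mean and divides by the column's standard deviation: z = (x − mean) / std. Here is a quick check on made-up numbers. The scaler is named demo_scaler so it does not overwrite the lab's fitted scaler if you run this in the same notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up TV spend for four months (a single feature column)
tv_spend = np.array([[1000.0], [2000.0], [3000.0], [4000.0]])

demo_scaler = StandardScaler()
scaled = demo_scaler.fit_transform(tv_spend)

# StandardScaler uses the population standard deviation (ddof=0)
manual = (tv_spend - tv_spend.mean(axis=0)) / tv_spend.std(axis=0)

print(demo_scaler.mean_, demo_scaler.scale_)      # learned mean and std per column
print(np.allclose(scaled, manual))                # the two computations agree
print(scaled.mean(axis=0), scaled.std(axis=0))    # roughly 0 and exactly 1
```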

⚙️ Step 4. Split Data

Time to create our train-test split — because life is all about testing your assumptions.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
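One caveat: each row here is a month. A shuffled split lets the model train on months that come after the ones it is tested on. Also, in Step 3 the scaler was fit before the split, so the scaler saw the test rows too.

Below is a hedged alternative sketch that uses made-up synthetic data, not the lab's file. It keeps time order with shuffle=False. It wraps scaling and regression in a scikit-learn pipeline so the scaler only ever sees training months. It assumes the rows are already sorted by Month.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic monthly data (made up), already in time order
rng = np.random.default_rng(0)
n_months = 36
demo = pd.DataFrame({
    "TV_Spend": rng.uniform(1000, 5000, n_months),
    "Social_Media_Spend": rng.uniform(500, 2000, n_months),
})
demo["Sales"] = (3 * demo["TV_Spend"] + 5 * demo["Social_Media_Spend"]
                 + rng.normal(0, 500, n_months))

features = demo[["TV_Spend", "Social_Media_Spend"]]
target = demo["Sales"]

# shuffle=False keeps the order: the last 20% of months become the test set
f_train, f_test, t_train, t_test = train_test_split(
    features, target, test_size=0.2, shuffle=False
)

# The pipeline fits the scaler on training months only, then reuses it on test months
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(f_train, t_train)
print(f"Test R²: {pipe.score(f_test, t_test):.2f}")
print("Test months:", f_test.index.min(), "to", f_test.index.max())
```

For repeated time-ordered evaluation, sklearn.model_selection.TimeSeriesSplit follows the same idea.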

🧠 Step 5. Train the Model

Let’s start simple — just a Linear Regression:

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

Check the coefficients (aka “how much each feature drives sales”):

coeff_df = pd.DataFrame({
    "Feature": X.columns,
    "Coefficient": model.coef_.round(2)
})
coeff_df

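📈 Translation: because the features were standardized, each coefficient is the change in predicted sales for a one-standard-deviation increase in that feature, holding the others fixed. To get “sales per extra $1 of TV spend,” divide the TV_Spend coefficient by that column's entry in scaler.scale_ (if your CFO is reading this, round the number dramatically). 😅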
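Here is a small check of that conversion on made-up, noise-free data where the true slopes (2 and 5) are known. Dividing each standardized coefficient by the feature's scale_ should recover them. The demo_ names avoid clobbering the lab's model and scaler.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up, noise-free data: sales = 2 * tv + 5 * social + 100
tv = np.array([1000.0, 2000.0, 3000.0, 4000.0, 5000.0])
social = np.array([300.0, 800.0, 500.0, 900.0, 400.0])
features = np.column_stack([tv, social])
sales = 2 * tv + 5 * social + 100

demo_scaler = StandardScaler()
demo_model = LinearRegression().fit(demo_scaler.fit_transform(features), sales)

per_std = demo_model.coef_                        # effect of +1 standard deviation
per_unit = demo_model.coef_ / demo_scaler.scale_  # effect of +1 raw unit (e.g. +$1)
print(per_std.round(2))
print(per_unit.round(2))  # recovers the true slopes, about 2 and 5
```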


📉 Step 6. Evaluate Performance

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # squared=False was deprecated in scikit-learn 1.4 and removed in 1.6
r2 = r2_score(y_test, y_pred)

print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R² Score: {r2:.2f}")

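Interpretation:

  • MAE = average absolute error, in the same units as Sales

  • RMSE = square root of the average squared error; also in Sales units, but punishes big mistakes more

  • R² = the model’s bragging rights: the share of the variation in sales it explains (1.0 = perfection, 0.0 = no better than always guessing average sales; on test data it can even go negative)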
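To make those definitions concrete, here are all three metrics computed by hand for three made-up months and compared against scikit-learn. The hand_ prefixes keep the lab's mae, rmse, and r2 variables intact.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Three made-up months: actual vs predicted sales
actual = np.array([100.0, 150.0, 200.0])
predicted = np.array([110.0, 140.0, 230.0])
errors = predicted - actual                        # [10, -10, 30]

hand_mae = np.mean(np.abs(errors))                 # (10 + 10 + 30) / 3 ≈ 16.67
hand_rmse = np.sqrt(np.mean(errors ** 2))          # sqrt(1100 / 3) ≈ 19.15
ss_res = np.sum(errors ** 2)                       # 1100
ss_tot = np.sum((actual - actual.mean()) ** 2)     # 2500 + 0 + 2500 = 5000
hand_r2 = 1 - ss_res / ss_tot                      # 1 - 1100 / 5000 = 0.78

# The hand-rolled numbers match scikit-learn's
assert np.isclose(hand_mae, mean_absolute_error(actual, predicted))
assert np.isclose(hand_rmse, np.sqrt(mean_squared_error(actual, predicted)))
assert np.isclose(hand_r2, r2_score(actual, predicted))
print(f"MAE={hand_mae:.2f}  RMSE={hand_rmse:.2f}  R²={hand_r2:.2f}")
```

RMSE (19.15) sits above MAE (16.67) because squaring lets the single 30-unit miss dominate.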


📊 Step 7. Visualize Predictions

Let’s make the boss happy with a plot that looks “AI-powered.” 🤖

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 5))
plt.scatter(y_test, y_pred, alpha=0.7, color='royalblue')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], '--', color='orange')
plt.title("Actual vs Predicted Sales")
plt.xlabel("Actual Sales")
plt.ylabel("Predicted Sales")
plt.show()

🎨 The dashed orange line marks perfect predictions (predicted = actual). If the points hug it, your model is doing great!


💬 Step 8. Business Insights

| Feature | Effect | Business Advice |
|---|---|---|
| TV_Spend | +ve | Ads still work! Old-school isn’t dead 📺 |
| Social_Media_Spend | +ve | Keep feeding the algorithm 💰 |
| Discount_Percent | -ve (sometimes) | Too many discounts hurt profit margin 💸 |
| Season_Summer | +ve | Ice cream sells better in the heat 😎 |
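Rather than filling in the Effect column by hand, you can read the signs straight off coeff_df. Here is a minimal sketch. add_effect_column is a hypothetical helper. The coefficients in the example are made up, not results from the lab's data. Because the features were standardized, sorting by absolute value gives a rough importance ranking.

```python
import numpy as np
import pandas as pd

def add_effect_column(coeff_table: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of coeff_table with an Effect column based on each coefficient's sign."""
    labels = np.select(
        [coeff_table["Coefficient"] > 0, coeff_table["Coefficient"] < 0],
        ["+ve", "-ve"],
        default="none",
    )
    # Largest absolute effects first
    return coeff_table.assign(Effect=labels).sort_values(
        "Coefficient", key=abs, ascending=False
    )

example = pd.DataFrame({
    "Feature": ["TV_Spend", "Social_Media_Spend", "Discount_Percent", "Season_Summer"],
    "Coefficient": [850.0, 420.0, -130.0, 610.0],   # made-up values
})
print(add_effect_column(example))
```

In the lab you would call add_effect_column(coeff_df).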

💡 “A regression model is just a data-driven way to say ‘I told you so’ in a meeting.”


📈 Step 9. Bonus: Predict Future Sales

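You can predict next month’s sales like a forecasting wizard. Just remember that the model was trained on standardized features, so the new input has to go through the same scaler first:

# Example input in raw units, in the same column order as X.columns:
# TV_Spend, Social_Media_Spend, Discount_Percent, then the Season dummies
next_month = pd.DataFrame([[3000, 1000, 10, 0, 1, 0]], columns=X.columns)
next_month_scaled = scaler.transform(next_month)  # same scaling as the training data
predicted_sales = model.predict(next_month_scaled)
print(f"Predicted Sales: ${predicted_sales[0]:.2f}")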
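A positional list like [[3000, 1000, 10, 0, 1, 0]] is easy to get wrong. A hedged alternative is to build the input by column name and let every unspecified dummy default to 0. make_input_row is a hypothetical helper. The column list below is an assumption about what X.columns contains after dropping Month and encoding four seasons.

```python
import pandas as pd

def make_input_row(columns, **values) -> pd.DataFrame:
    """Build a one-row DataFrame with the given columns; unspecified columns default to 0."""
    unknown = set(values) - set(columns)
    if unknown:
        raise ValueError(f"Unknown feature(s): {sorted(unknown)}")
    return pd.DataFrame([[values.get(col, 0) for col in columns]], columns=list(columns))

# Assumed column order; in the lab, pass X.columns instead
feature_columns = ["TV_Spend", "Social_Media_Spend", "Discount_Percent",
                   "Season_Spring", "Season_Summer", "Season_Winter"]
row = make_input_row(feature_columns, TV_Spend=3000, Social_Media_Spend=1000,
                     Discount_Percent=10, Season_Summer=1)
print(row)
# Then, in the lab: model.predict(scaler.transform(row))
```

Misspelled feature names raise an error instead of silently becoming zeros.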

Now go make that dashboard and act mysterious when someone asks,

“So… how does it work?” 😏


💡 Optional Extension

Try the same task using:

  • Ridge or Lasso Regression

  • Polynomial regression (PolynomialFeatures feeding a LinearRegression)

  • or even a Random Forest (coming soon in later chapters 🌲)

Then compare R² — because competition keeps models humble.
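Here is a hedged sketch of that comparison. It uses synthetic data with a deliberately curved feature rather than the lab's file. It scores each model with 5-fold cross-validated R² instead of a single split, so the ranking depends less on which rows land in the test set. The alpha values and the polynomial degree are arbitrary starting points, not tuned choices.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (made up): 3 features, one with a squared effect
rng = np.random.default_rng(42)
features = rng.uniform(0, 10, size=(120, 3))
target = (4 * features[:, 0] + 2 * features[:, 1]
          + 0.5 * features[:, 2] ** 2 + rng.normal(0, 2, 120))

candidates = {
    "Linear": make_pipeline(StandardScaler(), LinearRegression()),
    "Ridge (alpha=1)": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "Lasso (alpha=0.1)": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "Polynomial (deg 2)": make_pipeline(PolynomialFeatures(degree=2),
                                        StandardScaler(), LinearRegression()),
}

scores = {}
for name, estimator in candidates.items():
    # Mean R² over 5 folds: each fold is scored on rows the model was not fit on
    scores[name] = cross_val_score(estimator, features, target, cv=5, scoring="r2").mean()
    print(f"{name:<20} mean R² = {scores[name]:.3f}")
```

For the lab's monthly data, passing cv=TimeSeriesSplit(n_splits=5) (from sklearn.model_selection) would respect time order.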


🧩 Practice Exercises

| Challenge | Hint |
|---|---|
| 1️⃣ Train a Ridge Regression model and compare its R² | from sklearn.linear_model import Ridge |
| 2️⃣ Add a new feature Price_Per_Unit = Sales / Discount_Percent | Think creatively! Then ask: is it fair to build a feature from Sales, the very thing you are predicting? And what happens when Discount_Percent = 0? |
| 3️⃣ Create a seaborn plot of coefficients | Use sns.barplot() |
| 4️⃣ Try predicting sales when Discount_Percent = 0 | Does the model behave logically? |
| 5️⃣ Put your model in a Colab notebook and make it interactive | Add widgets or sliders (e.g., ipywidgets)! |


🧠 Recap

  • You loaded, cleaned, and prepped real data

  • Built a regression model

  • Evaluated the model with business-friendly metrics (MAE, RMSE, R²)

  • Made it explainable and visually appealing

You’re now a certified Sales Forecast Whisperer 📊🧙‍♂️


🐍 Python Help

If you’re unsure about any pandas or sklearn syntax, 🪄 explore Programming for Business — it’s your friendly guide to all the Python basics behind the magic!


🚀 Next Up

➡️ Chapter 6: Classification Models

Because predicting “how much” is cool, but predicting “who buys” is where real marketing power begins. 🎯
