“Because sometimes, predicting who clicks an ad is harder than convincing your boss why you need more GPUs.” 😅
🎯 Objective
You’re the new data scientist at AdAstra Marketing Co. 🚀 The marketing team wants to predict which customers are most likely to respond to a campaign, so they can stop wasting money on people who think “unsubscribe” means “tell me more.”
Your job:
Build, tune, and evaluate multiple ML models — then choose the one that makes the most business sense, not just the highest accuracy.
🧩 What You’ll Practice
✅ Model comparison with Cross-Validation and Nested CV
✅ Hyperparameter tuning using GridSearchCV / RandomizedSearchCV
✅ Translating model performance → ROI, lift, and profit curves
✅ Choosing the best campaign target group based on predicted response probability
🗂️ Dataset Overview
Use the dataset: marketing_campaign.csv
(available in the /data folder or downloadable from the Colab link below)
| Feature | Description |
|---|---|
| age | Customer age |
| income | Annual income ($) |
| spending_score | Prior engagement with campaigns |
| channel | Campaign channel (email, social, etc.) |
| response | 1 if responded positively, else 0 |
🧰 Setup & Notebook Access
You can run this lab directly in:
| Option | Link |
|---|---|
| 🧮 JupyterLite (in-browser) | Run in JupyterLite ▶️ |
| ☁️ Google Colab | Open in Colab 🚀 |
| 💾 Download Notebook | Download .ipynb |
🧠 Step-by-Step Instructions
1. Load & Explore Data

    import pandas as pd

    df = pd.read_csv("marketing_campaign.csv")
    df.head()

Check data types, missing values, and basic stats. Don’t forget to ask the existential question:
“Why is income missing for half our audience?” 💸
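A minimal sketch of those checks. Since the real file is not bundled here, it builds a small synthetic stand-in with the columns from the dataset overview; `make_demo_frame` and all of its values are assumptions for illustration, so swap in the `df` loaded from `marketing_campaign.csv` when you run the lab.

```python
import numpy as np
import pandas as pd


def make_demo_frame(n=200, seed=0):
    """Hypothetical stand-in for marketing_campaign.csv (synthetic values)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(18, 70, size=n),
        "income": rng.normal(60_000, 15_000, size=n).round(2),
        "spending_score": rng.integers(1, 101, size=n),
        "channel": rng.choice(["email", "social", "sms"], size=n),
        "response": rng.binomial(1, 0.2, size=n),
    })
    # Blank out some incomes so the missing-value check has something to find.
    df.loc[rng.random(n) < 0.3, "income"] = np.nan
    return df


df = make_demo_frame()

print(df.dtypes)                        # column types
missing_share = df.isna().mean()        # fraction of missing values per column
print(missing_share)
print(df.describe())                    # summary stats for numeric columns
response_rate = df["response"].mean()   # class balance of the target
print(f"Response rate: {response_rate:.1%}")
```

The response rate is worth noting early: if responders are rare, accuracy alone will flatter any model that predicts "no response" for everyone.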
2. Preprocess
Encode categorical features (channel)
Handle missing income with median imputation
Scale continuous variables
Bonus Challenge: Try both StandardScaler and MinMaxScaler, and see if your model’s mood improves. 😆
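One way to wire those three steps together is a scikit-learn `ColumnTransformer`. This is a sketch, not the lab's reference solution: the four-row frame below is made-up data, and the column grouping is an assumption based on the dataset overview.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic frame (assumed values) with the lab's feature columns.
X = pd.DataFrame({
    "age": [25, 40, 31, 58],
    "income": [42_000.0, np.nan, 58_000.0, 91_000.0],
    "spending_score": [30, 75, 50, 12],
    "channel": ["email", "social", "email", "sms"],
})

numeric_cols = ["age", "income", "spending_score"]
categorical_cols = ["channel"]

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fills the missing income
    ("scale", StandardScaler()),                   # swap in MinMaxScaler to compare
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array for easy inspection
)

X_ready = preprocess.fit_transform(X)
print(X_ready.shape)  # 3 scaled numeric columns + 3 one-hot channel columns
```

Putting `preprocess` in front of a model inside a single `Pipeline` means the imputer and scaler are refit on each training fold during cross-validation, which keeps test-fold statistics from leaking into training.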
3. Split Data & Define Models
Use a Stratified Train-Test Split to preserve response balance.
Try multiple models:
Logistic Regression
Random Forest
XGBoost (optional but fun 💥)
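A sketch of the split and the model lineup, using `make_classification` as a synthetic, imbalanced stand-in for the campaign data (an assumption; use your preprocessed features instead). XGBoost is left out here only to keep the example free of extra dependencies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 20% responders (assumed class balance).
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)

# stratify=y keeps the response rate the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42,
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

print(f"Train response rate: {y_train.mean():.3f}")
print(f"Test response rate:  {y_test.mean():.3f}")
```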
4. Evaluate with Cross-Validation

    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestClassifier

    # X and y are assumed to be the preprocessed features and the response column
    rf = RandomForestClassifier(random_state=42)
    scores = cross_val_score(rf, X, y, cv=5, scoring="f1")
    print("Average F1:", scores.mean())

💡 Tip: Keep a leaderboard of all models and treat it like a data science version of The Bachelor 💔🌹
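The leaderboard idea could look something like the sketch below. It scores each candidate with `cross_validate` on the same folds and sorts by mean F1. The data is synthetic and the choice of F1 plus ROC AUC as columns is an assumption; add whatever metrics your business case needs.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the campaign data (assumed ~20% responders).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Same stratified folds for every model, so the comparison is fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

rows = []
for name, model in models.items():
    result = cross_validate(model, X, y, cv=cv, scoring=["f1", "roc_auc"])
    rows.append({
        "model": name,
        "f1_mean": result["test_f1"].mean(),
        "f1_std": result["test_f1"].std(),
        "roc_auc_mean": result["test_roc_auc"].mean(),
    })

leaderboard = (
    pd.DataFrame(rows)
    .sort_values("f1_mean", ascending=False)
    .reset_index(drop=True)
)
print(leaderboard)
```

Keeping the standard deviation next to the mean helps you avoid crowning a winner whose lead is smaller than the fold-to-fold noise.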
5. Hyperparameter Tuning
Use GridSearchCV or RandomizedSearchCV to find the sweet spot.
| Model | Key Hyperparameters |
|---|---|
| Logistic Regression | C, penalty |
| Random Forest | n_estimators, max_depth, min_samples_split |
| XGBoost | learning_rate (eta), max_depth, subsample |
Don’t overfit — the goal is ROI, not ego points. 😎
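A sketch of tuning plus nested CV for the Random Forest row of the table. The grid values, fold counts, and synthetic data are all assumptions picked to run quickly, not recommended settings. The inner loop (`GridSearchCV`) picks hyperparameters; the outer loop (`cross_val_score` wrapped around the search) estimates how well that whole tuning procedure generalizes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the campaign data (assumed ~20% responders).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# Deliberately small grid (assumed values) so the example runs fast.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
    "min_samples_split": [2, 10],
}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=inner_cv,
    scoring="f1",
)

# Plain tuning: best_score_ is optimistic because the same folds chose the params.
search.fit(X, y)
print("Best params:", search.best_params_)
print("Inner-CV F1:", round(search.best_score_, 3))

# Nested CV: the search is refit inside each outer training fold.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="f1")
print("Nested-CV F1:", round(nested_scores.mean(), 3))
```

For larger search spaces, `RandomizedSearchCV` takes a similar set of arguments, with `param_distributions` and `n_iter` in place of `param_grid`.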
6. Business-Aware Evaluation
Compute ROI, lift, and expected profit for each model.
Example profit function:
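
    import numpy as np

    def campaign_profit(y_true, y_pred_proba, threshold=0.5):
        # Payoffs in dollars: gain per true positive, cost per false positive / false negative
        TP_gain, FP_cost, FN_cost = 100, 10, 50
        y_true = np.asarray(y_true)
        preds = (np.asarray(y_pred_proba) >= threshold).astype(int)
        TP = ((preds == 1) & (y_true == 1)).sum()  # responders we contacted
        FP = ((preds == 1) & (y_true == 0)).sum()  # non-responders we contacted
        FN = ((preds == 0) & (y_true == 1)).sum()  # responders we missed
        return TP*TP_gain - FP*FP_cost - FN*FN_cost

Try different thresholds to find the profit-maximizing one, not just where F1 is max. 💰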
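This step also asks for lift, which the profit function does not cover. A minimal sketch, assuming the common definition: the response rate among the top-k fraction of customers ranked by predicted probability, divided by the overall response rate. `lift_at_k` is a hypothetical helper and the ten customers below are made-up numbers.

```python
import numpy as np


def lift_at_k(y_true, y_score, k=0.1):
    """Response rate in the top-k fraction of scores divided by the overall rate."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n_top = max(1, int(round(k * len(y_true))))           # how many customers to contact
    top_idx = np.argsort(-y_score, kind="stable")[:n_top]  # highest scores first
    return y_true[top_idx].mean() / y_true.mean()


# Ten hypothetical customers: 3 responders overall (30% base rate).
y_true = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1, 0.05])

# Top 20% = 2 customers, 1 of whom responded: 0.5 / 0.3
print(f"Lift at top 20%: {lift_at_k(y_true, y_score, k=0.2):.2f}")
```

A lift above 1 means targeting by model score beats contacting customers at random; contacting everyone (k=1.0) always gives a lift of exactly 1.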
7. Visualize Profit Curve

    import numpy as np
    import matplotlib.pyplot as plt

    # y_test and y_pred_proba are assumed to be the held-out labels and predicted probabilities
    thresholds = np.linspace(0, 1, 100)
    profits = [campaign_profit(y_test, y_pred_proba, t) for t in thresholds]

    plt.plot(thresholds, profits)
    plt.xlabel("Threshold")
    plt.ylabel("Expected Profit ($)")
    plt.title("Optimize for 💵, not just F1")
    plt.show()

8. Interpret & Present
Explain your results to your “boss” (or an imaginary one 🧑‍💼):
What’s the best model and why?
What threshold gives the best ROI?
Which features drive campaign response?
Pro tip: Use visuals — executives fear math but love graphs. 📊❤️
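For the "which features drive response" question, permutation importance is one model-agnostic option (an assumption on our part; the lab does not prescribe a method). The sketch below uses synthetic data in which `spending_score` is deliberately made the main driver, so you can see what a clear signal looks like.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "income": rng.normal(60_000, 15_000, size=n),
    "spending_score": rng.integers(1, 101, size=n),
})
# Assumed ground truth for the demo: response is driven mainly by spending_score.
y = (X["spending_score"] + rng.normal(0, 10, size=n) > 70).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0,
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one column at a time on held-out data and measure the drop in ROC AUC.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0, scoring="roc_auc",
)
importances = pd.Series(result.importances_mean, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

A horizontal bar chart of `importances` is an easy graph to put in front of executives.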
🧪 Stretch Goal
Deploy a threshold-based segmenter:
Top 10% predicted responders = Premium segment
Next 30% = Target later
Rest = Send memes instead of ads 😜
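The segmenter can be sketched by ranking customers on predicted probability and cutting at the 90th and 60th percentiles. `assign_segments` and the segment labels are hypothetical names, and the scores are random numbers standing in for your model's `predict_proba` output.

```python
import numpy as np
import pandas as pd


def assign_segments(proba):
    """Rank customers by predicted probability: top 10%, next 30%, rest."""
    proba = pd.Series(proba)
    # Percentile rank in (0, 1]; 1.0 is the highest predicted probability.
    pct = proba.rank(method="first", pct=True).to_numpy()
    labels = np.select(
        [pct > 0.90, pct > 0.60],       # checked in order, first match wins
        ["premium", "target_later"],
        default="rest",
    )
    return pd.Series(labels, index=proba.index, name="segment")


rng = np.random.default_rng(0)
scores = rng.random(100)                # stand-in for predicted probabilities
segments = assign_segments(scores)
print(segments.value_counts())
```

Ranking rather than using fixed probability cutoffs keeps the segment sizes stable even when the model's scores shift from one campaign to the next.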
🎓 Deliverables
Notebook with:
Model training + tuning
ROI / lift plots
Final model comparison table
Short business summary:
“This model improves campaign ROI by 18% with 40% fewer promotions sent.”
💬 Final Thought
“In marketing, knowing who not to target is often worth more than finding who to target.” 💡
# Your code here