Lab – Campaign Targeting
“Because sometimes, predicting who clicks an ad is harder than convincing your boss why you need more GPUs.” 😅
🎯 Objective
You’re the new data scientist at AdAstra Marketing Co. 🚀 The marketing team wants to predict which customers are most likely to respond to a campaign, so they can stop wasting money on people who think “unsubscribe” means “tell me more.”
Your job:
Build, tune, and evaluate multiple ML models — then choose the one that makes the most business sense, not just the highest accuracy.
🧩 What You’ll Practice
✅ Model comparison with Cross-Validation and Nested CV
✅ Hyperparameter tuning using GridSearchCV / RandomizedSearchCV
✅ Translating model performance → ROI, lift, and profit curves
✅ Choosing the best campaign target group based on predicted response probability
🗂️ Dataset Overview
Use the dataset: marketing_campaign.csv
(available in /data folder or downloadable from Colab link below)
| Feature | Description |
|---|---|
| | Customer age |
| | Annual income ($) |
| | Prior engagement with campaigns |
| | Campaign channel (email, social, etc.) |
| | 1 if responded positively, else 0 |
🧰 Setup & Notebook Access
You can run this lab directly in:
| Option | Link |
|---|---|
| 🧮 JupyterLite (in-browser) | Run in JupyterLite ▶️ |
| ☁️ Google Colab | |
| 💾 Download Notebook | Download |
🧠 Step-by-Step Instructions
1. Load & Explore Data
import pandas as pd
df = pd.read_csv("marketing_campaign.csv")
df.head()
Check data types, missing values, and basic stats. Don’t forget to ask the existential question:
“Why is income missing for half our audience?” 💸
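One way to run these checks is sketched below. So that it runs without the CSV, it builds a tiny stand-in frame; the column names (`age`, `income`, `channel`, `response`) are assumptions for illustration, not the dataset's confirmed schema.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for marketing_campaign.csv; the column names are assumed.
df = pd.DataFrame({
    "age": [25, 41, 37, 58, 33, 46],
    "income": [42000, np.nan, 61000, np.nan, 38000, 75000],
    "channel": ["email", "social", "email", "sms", "social", "email"],
    "response": [0, 1, 0, 1, 0, 0],
})

print(df.dtypes)                       # data type of each column
missing_share = df.isna().mean()       # fraction of missing values per column
print(missing_share)
print(df.describe())                   # basic stats for the numeric columns
print(df["response"].value_counts(normalize=True))  # class balance of the target
```

On the real data, replace the stand-in frame with the `pd.read_csv` call from above and keep the rest.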
2. Preprocess
Encode categorical features (channel)
Handle missing income with median imputation
Scale continuous variables
Bonus Challenge: Try both StandardScaler and MinMaxScaler, and see if your model’s mood improves. 😆
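The three preprocessing steps can be bundled into one scikit-learn transformer, roughly as below. This is a minimal sketch on a toy frame: the column names and the choice of `OneHotEncoder` for the channel are assumptions, and the lab does not prescribe this exact layout.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame standing in for the real features; column names are assumed.
X = pd.DataFrame({
    "age": [25, 41, 37, 58, 33, 46],
    "income": [42000, np.nan, 61000, np.nan, 38000, 75000],
    "channel": ["email", "social", "email", "sms", "social", "email"],
})

numeric_cols = ["age", "income"]
categorical_cols = ["channel"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),  # swap in MinMaxScaler() for the bonus challenge
])

preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X_prepared = preprocess.fit_transform(X)
print(X_prepared.shape)
```

Putting `preprocess` in front of a classifier inside a `Pipeline` means the imputer and scaler are refit on each training fold, which keeps cross-validation scores honest.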
3. Split Data & Define Models
Use a Stratified Train-Test Split to preserve response balance.
Try multiple models:
Logistic Regression
Random Forest
XGBoost (optional but fun 💥)
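A stratified split and a small model roster might look like the sketch below. The data comes from `make_classification` as a synthetic stand-in for the preprocessed campaign features, and the 80/20 split ratio is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the campaign data (roughly 20% responders).
X, y = make_classification(
    n_samples=1000, n_features=6, weights=[0.8, 0.2], random_state=42
)

# stratify=y keeps the responder share (almost) identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    # An XGBoost classifier could be added here if the xgboost package is installed.
}

print("Train response rate:", y_train.mean())
print("Test response rate:", y_test.mean())
```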
4. Evaluate with Cross-Validation
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Cross-validate on the training split only, so the test set stays held out.
rf = RandomForestClassifier(random_state=42)
scores = cross_val_score(rf, X_train, y_train, cv=5, scoring="f1")
print("Average F1:", scores.mean())
💡 Tip: Keep a leaderboard of all models — treat it like a data science version of The Bachelor 💔🌹
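A leaderboard can be as simple as a sorted DataFrame of cross-validation scores. The sketch below uses synthetic data and two of the suggested models; the table layout (`model`, `mean_f1`, `std_f1`) is just one possible choice.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the training data.
X, y = make_classification(
    n_samples=600, n_features=6, weights=[0.8, 0.2], random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Use the same folds for every model so the comparison is like for like.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

rows = []
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    rows.append({"model": name, "mean_f1": scores.mean(), "std_f1": scores.std()})

leaderboard = (
    pd.DataFrame(rows)
    .sort_values("mean_f1", ascending=False)
    .reset_index(drop=True)
)
print(leaderboard)
```

Keeping the standard deviation next to the mean helps spot "winners" whose lead is smaller than the fold-to-fold noise.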
5. Hyperparameter Tuning
Use GridSearchCV or RandomizedSearchCV to find the sweet spot.
| Model | Key Hyperparameters |
|---|---|
| Logistic Regression | |
| Random Forest | |
| XGBoost | |
Don’t overfit — the goal is ROI, not ego points. 😎
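Here is a minimal sketch of tuning with `GridSearchCV`, followed by nested cross-validation to check that the tuned score is not just luck. The grid values are illustrative assumptions, not the lab's prescribed settings, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the training data.
X, y = make_classification(
    n_samples=400, n_features=6, weights=[0.8, 0.2], random_state=42
)

# Illustrative grid only; these values are assumptions.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=inner_cv,
    scoring="f1",
)

# Plain tuning: fit the search and read off the best setting.
search.fit(X, y)
print("Best params:", search.best_params_)

# Nested CV: the outer loop scores the whole tuning procedure on unseen folds.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="f1")
print("Nested CV F1:", nested_scores.mean())
```

If the nested score is clearly lower than `search.best_score_`, the inner search was probably flattering itself. `RandomizedSearchCV` drops in the same way when the grid gets large.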
6. Business-Aware Evaluation
Compute ROI, lift, and expected profit for each model.
Example profit function:
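def campaign_profit(y_true, y_pred_proba, threshold=0.5):
    # Payoffs per outcome ($): gain per true positive, cost per false positive / false negative
    TP_gain, FP_cost, FN_cost = 100, 10, 50
    preds = (y_pred_proba >= threshold).astype(int)
    TP = ((preds == 1) & (y_true == 1)).sum()  # responders we contacted
    FP = ((preds == 1) & (y_true == 0)).sum()  # non-responders we contacted
    FN = ((preds == 0) & (y_true == 1)).sum()  # responders we missed
    return TP*TP_gain - FP*FP_cost - FN*FN_cost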
Try different thresholds to find the profit-maximizing one, not just where F1 is max. 💰
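Lift and ROI can be computed alongside profit. The sketch below is one possible definition of each, under stated assumptions: lift is the response rate among the top-k fraction of scores divided by the overall rate, and ROI is (revenue - spend) / spend with a hypothetical $100 revenue per response and $10 cost per contact. Unlike the profit function above, this ROI ignores the cost of missed responders.

```python
import numpy as np

def lift_at_k(y_true, y_pred_proba, k=0.1):
    """Response rate in the top-k fraction of scores divided by the overall rate."""
    y_true = np.asarray(y_true)
    y_pred_proba = np.asarray(y_pred_proba)
    n_top = max(1, int(round(k * len(y_true))))
    top_idx = np.argsort(-y_pred_proba)[:n_top]  # highest scores first
    return y_true[top_idx].mean() / y_true.mean()

def campaign_roi(y_true, y_pred_proba, threshold=0.5,
                 revenue_per_response=100, cost_per_contact=10):
    """(revenue - spend) / spend for everyone contacted at this threshold."""
    y_true = np.asarray(y_true)
    contacted = np.asarray(y_pred_proba) >= threshold
    spend = contacted.sum() * cost_per_contact
    if spend == 0:
        return 0.0  # nobody contacted, nothing spent
    revenue = (contacted & (y_true == 1)).sum() * revenue_per_response
    return (revenue - spend) / spend

# Hand-made toy labels and scores, for illustration only.
y_true = np.array([1, 0, 1, 0, 0, 0, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.05, 0.35])

print("Lift in top 20%:", lift_at_k(y_true, scores, k=0.2))
print("ROI at 0.5:", campaign_roi(y_true, scores, threshold=0.5))
```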
7. Visualize Profit Curve
import numpy as np
import matplotlib.pyplot as plt
thresholds = np.linspace(0, 1, 100)
profits = [campaign_profit(y_test, y_pred_proba, t) for t in thresholds]
plt.plot(thresholds, profits)
plt.xlabel("Threshold")
plt.ylabel("Expected Profit ($)")
plt.title("Optimize for 💵, not just F1")
plt.show()
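To read the best threshold off the curve rather than eyeballing it, take the argmax of the swept profits. The sketch below repeats the lab's profit function so it runs on its own and uses toy labels and scores in place of `y_test` and `y_pred_proba`.

```python
import numpy as np

def campaign_profit(y_true, y_pred_proba, threshold=0.5):
    # Same payoffs as the lab's example profit function.
    TP_gain, FP_cost, FN_cost = 100, 10, 50
    preds = (y_pred_proba >= threshold).astype(int)
    TP = ((preds == 1) & (y_true == 1)).sum()
    FP = ((preds == 1) & (y_true == 0)).sum()
    FN = ((preds == 0) & (y_true == 1)).sum()
    return TP * TP_gain - FP * FP_cost - FN * FN_cost

# Toy labels and scores standing in for y_test and y_pred_proba.
y_test = np.array([1, 0, 1, 0, 0, 0, 1, 0, 0, 0])
y_pred_proba = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.05, 0.35])

thresholds = np.linspace(0, 1, 101)
profits = np.array([campaign_profit(y_test, y_pred_proba, t) for t in thresholds])

best_idx = profits.argmax()  # index of the first threshold that reaches the maximum
best_threshold = thresholds[best_idx]
best_profit = profits[best_idx]
print(f"Best threshold: {best_threshold:.2f}, profit: {best_profit}")
```

Profit curves are often flat near the top, so several thresholds may tie; `argmax` returns the lowest of them, which means contacting the most people for the same profit.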
8. Interpret & Present
Explain results to your “boss” (or a pretend one 🧑‍💼):
What’s the best model and why?
What threshold gives the best ROI?
Which features drive campaign response?
Pro tip: Use visuals — executives fear math but love graphs. 📊❤️
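For the "which features drive response" question, permutation importance is one model-agnostic option. The lab does not name a method, so treat this as a sketch of one reasonable choice; the data is synthetic and the feature names are placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in; the feature names are placeholders, not the real columns.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure the drop in score.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)

importances = pd.Series(
    result.importances_mean, index=feature_names
).sort_values(ascending=False)
print(importances)
```

Calling `importances.plot.barh()` turns this into the kind of chart that survives a slide deck.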
🧪 Stretch Goal
Deploy a threshold-based segmenter:
Top 10% predicted responders = Premium segment
Next 30% = Target later
Rest = Send memes instead of ads 😜
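One way to sketch the segmenter is to rank customers by predicted probability and cut at the 10% and 40% marks. The function name `assign_segments` and the random scores are hypothetical, used only to make the example run.

```python
import numpy as np
import pandas as pd

def assign_segments(y_pred_proba, premium_share=0.10, later_share=0.30):
    """Label customers by score rank: top 10%, next 30%, then everyone else."""
    scores = pd.Series(y_pred_proba)
    n = len(scores)
    # Rank 1 is the highest score; ties are broken by order of appearance.
    rank = scores.rank(method="first", ascending=False)
    n_premium = int(round(premium_share * n))
    n_later = int(round(later_share * n))
    segments = np.select(
        [rank <= n_premium, rank <= n_premium + n_later],
        ["Premium", "Target later"],
        default="Rest",
    )
    return pd.Series(segments, index=scores.index, name="segment")

# Random scores standing in for a model's predicted probabilities.
rng = np.random.default_rng(42)
y_pred_proba = rng.random(100)

segments = assign_segments(y_pred_proba)
print(segments.value_counts())
```

Ranking instead of fixed probability cut-offs keeps the segment sizes stable even when the model's scores drift between campaigns.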
🎓 Deliverables
Notebook with:
Model training + tuning
ROI / lift plots
Final model comparison table
Short business summary, for example:
“This model improves campaign ROI by 18% with 40% fewer promotions sent.”
💬 Final Thought
“In marketing, knowing who not to target is often worth more than finding who to target.” 💡
# Your code here