Lab – Campaign Targeting - Machine Learning for Business

“Because sometimes, predicting who clicks an ad is harder than convincing your boss why you need more GPUs.” 😅

🎯 Objective

You’re the new data scientist at AdAstra Marketing Co. 🚀 The marketing team wants to predict which customers are most likely to respond to a campaign, so they can stop wasting money on people who think “unsubscribe” means “tell me more.”

Your job:

Build, tune, and evaluate multiple ML models — then choose the one that makes the most business sense, not just the highest accuracy.

🧩 What You’ll Practice

✅ Model comparison with Cross-Validation and Nested CV
✅ Hyperparameter tuning using GridSearchCV / RandomizedSearchCV
✅ Translating model performance → ROI, lift, and profit curves
✅ Choosing the best campaign target group based on predicted response probability

🗂️ Dataset Overview

Use the dataset marketing_campaign.csv (available in the /data folder or downloadable from the Colab link below).

Feature	Description
`age`	Customer age
`income`	Annual income ($)
`spending_score`	Prior engagement with campaigns
`channel`	Campaign channel (email, social, etc.)
`response`	1 if responded positively, else 0

🧰 Setup & Notebook Access

You can run this lab directly in:

Option	Link
🧮 JupyterLite (in-browser)	Run in JupyterLite ▶️
☁️ Google Colab	Open in Colab 🚀
💾 Download Notebook	Download `.ipynb`

🧠 Step-by-Step Instructions

1. Load & Explore Data

import pandas as pd

df = pd.read_csv("marketing_campaign.csv")
df.head()

Check data types, missing values, and basic stats. Don’t forget to ask the existential question:

“Why is income missing for half our audience?” 💸
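A minimal sketch of those checks, assuming the column names from the Dataset Overview table. The `explore` helper and the four-row `demo` frame are hypothetical stand-ins so the snippet runs without the CSV; swap in `df` from `pd.read_csv` in the real notebook.

```python
import numpy as np
import pandas as pd

def explore(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing count, and missing share."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean().round(3),
    })

# Tiny made-up stand-in for marketing_campaign.csv (values are invented).
demo = pd.DataFrame({
    "age": [25, 41, 37, 58],
    "income": [52000.0, np.nan, 61000.0, np.nan],
    "spending_score": [0.3, 0.8, 0.5, 0.1],
    "channel": ["email", "social", "email", "sms"],
    "response": [0, 1, 0, 0],
})

summary = explore(demo)
print(summary)                                        # types and missing values
print(demo.describe())                                # basic stats, numeric columns
print(demo["response"].value_counts(normalize=True))  # class balance
```

The class-balance line matters later: a heavily imbalanced `response` column is the reason step 3 asks for a stratified split.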

2. Preprocess

Encode categorical features (channel)
Handle missing income with median imputation
Scale continuous variables

Bonus Challenge: Try both StandardScaler and MinMaxScaler — and see if your model’s mood improves. 😆
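One possible way to wire these three steps together is a scikit-learn `ColumnTransformer`. This is a sketch, not the lab's reference solution: the `make_preprocessor` helper, the `NUMERIC` / `CATEGORICAL` lists, and the `demo` frame are assumptions based on the Dataset Overview table.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

NUMERIC = ["age", "income", "spending_score"]  # assumed continuous columns
CATEGORICAL = ["channel"]                      # assumed categorical column

def make_preprocessor(scaler=None) -> ColumnTransformer:
    """Median-impute and scale numeric columns; one-hot encode channel."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", scaler if scaler is not None else StandardScaler()),
    ])
    return ColumnTransformer(
        [
            ("num", numeric, NUMERIC),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )

# Made-up rows standing in for the real dataset.
demo = pd.DataFrame({
    "age": [25, 41, 37, 58],
    "income": [52000.0, np.nan, 61000.0, np.nan],
    "spending_score": [0.3, 0.8, 0.5, 0.1],
    "channel": ["email", "social", "email", "sms"],
})

X_std = make_preprocessor().fit_transform(demo)               # StandardScaler
X_mm = make_preprocessor(MinMaxScaler()).fit_transform(demo)  # bonus challenge
print(X_std.shape, X_mm.shape)
```

Keeping the imputer and scaler inside a pipeline means they can be fitted on training folds only during cross-validation, which avoids leaking test-set statistics into the model.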

3. Split Data & Define Models

Use a Stratified Train-Test Split to preserve response balance.

Try multiple models:

Logistic Regression
Random Forest
XGBoost (optional but fun 💥)
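A sketch of the stratified split and a model dictionary to loop over later. The random `X` and `y` below are synthetic placeholders for the preprocessed features and the `response` column, and the `models` dict is just one convenient layout. XGBoost is left out so the snippet only needs scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 500 customers, 3 features, roughly 20% responders.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = (rng.random(500) < 0.2).astype(int)

# stratify=y keeps the responder share nearly identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

print("train positive rate:", y_train.mean())
print("test positive rate: ", y_test.mean())
```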

4. Evaluate with Cross-Validation

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

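# X, y: preprocessed features and `response` labels from steps 2-3
rf = RandomForestClassifier(random_state=42)
# With an integer cv and a classifier, scikit-learn uses stratified folds
scores = cross_val_score(rf, X, y, cv=5, scoring="f1")
print("Average F1:", scores.mean())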

💡 Tip: Keep a leaderboard of all models — treat it like a data science version of The Bachelor 💔🌹
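The leaderboard could be as simple as a sorted DataFrame. The sketch below uses `make_classification` as a synthetic substitute for the campaign data; the column names `f1_mean` and `f1_std` are arbitrary choices.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced data in place of the real features and labels.
X, y = make_classification(
    n_samples=400, n_features=6, weights=[0.8, 0.2], random_state=42
)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# The same folds for every model keep the comparison fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

rows = []
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    rows.append({"model": name, "f1_mean": scores.mean(), "f1_std": scores.std()})

leaderboard = (
    pd.DataFrame(rows)
    .sort_values("f1_mean", ascending=False)
    .reset_index(drop=True)
)
print(leaderboard)
```

Reporting the standard deviation next to the mean helps show whether a gap between two models is larger than the fold-to-fold noise.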

5. Hyperparameter Tuning

Use GridSearchCV or RandomizedSearchCV to find the sweet spot.

Model	Key Hyperparameters
Logistic Regression	`C`, `penalty`
Random Forest	`n_estimators`, `max_depth`, `min_samples_split`
XGBoost	`learning_rate` (`eta`), `max_depth`, `subsample`

Don’t overfit — the goal is ROI, not ego points. 😎
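A sketch of grid search plus nested cross-validation for the Random Forest row of the table. The grid values and the synthetic data are illustrative assumptions, not recommended settings. The inner loop picks hyperparameters; the outer loop estimates how well the whole tune-then-fit procedure generalizes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the campaign data.
X, y = make_classification(
    n_samples=300, n_features=6, weights=[0.8, 0.2], random_state=0
)

# Small illustrative grid; widen it for the real lab.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "min_samples_split": [2, 10],
}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=inner_cv,
    scoring="f1",
)

# Nested CV: every outer fold reruns the full grid search on its training part.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="f1")
print("Nested CV F1:", nested_scores.mean())

# Final fit on all data to read off the chosen hyperparameters.
search.fit(X, y)
print("Best params:", search.best_params_)
```

`search.best_score_` on its own tends to be optimistic because the same folds chose the hyperparameters; the nested score is the more honest number for the leaderboard. `RandomizedSearchCV` accepts the same estimator, `cv`, and `scoring` arguments, with `param_distributions` and `n_iter` in place of `param_grid`.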

6. Business-Aware Evaluation

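Compute ROI, lift, and expected profit for each model. ROI is net gain divided by campaign cost; lift is the response rate in the targeted group divided by the overall response rate.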

Example profit function:

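def campaign_profit(y_true, y_pred_proba, threshold=0.5):
    """Net profit ($) from contacting everyone scored at or above `threshold`."""
    # Example unit economics: gain per responder reached, cost per wasted
    # contact, opportunity cost per missed responder.
    TP_gain, FP_cost, FN_cost = 100, 10, 50
    preds = (y_pred_proba >= threshold).astype(int)  # expects a NumPy array or Series
    TP = ((preds == 1) & (y_true == 1)).sum()  # responders we contacted
    FP = ((preds == 1) & (y_true == 0)).sum()  # non-responders we contacted
    FN = ((preds == 0) & (y_true == 1)).sum()  # responders we skipped
    return TP*TP_gain - FP*FP_cost - FN*FN_cost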

Try different thresholds to find the profit-maximizing one, not just where F1 is max. 💰
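A small worked example makes the arithmetic concrete. The five-customer arrays below are hand-made toy data, and the function is a copy of the example above with `np.asarray` added so it also accepts plain lists.

```python
import numpy as np

def campaign_profit(y_true, y_pred_proba, threshold=0.5):
    # Same gains and costs as the example profit function above.
    TP_gain, FP_cost, FN_cost = 100, 10, 50
    y_true = np.asarray(y_true)
    preds = (np.asarray(y_pred_proba) >= threshold).astype(int)
    TP = ((preds == 1) & (y_true == 1)).sum()
    FP = ((preds == 1) & (y_true == 0)).sum()
    FN = ((preds == 0) & (y_true == 1)).sum()
    return TP * TP_gain - FP * FP_cost - FN * FN_cost

# Hand-made toy data (not from marketing_campaign.csv).
y_true = np.array([1, 0, 1, 0, 1])
y_proba = np.array([0.9, 0.6, 0.4, 0.2, 0.8])

# At threshold 0.5: TP=2, FP=1, FN=1 -> 2*100 - 1*10 - 1*50 = 140
print(campaign_profit(y_true, y_proba, 0.5))

# Sweep thresholds and keep the most profitable one.
thresholds = np.linspace(0, 1, 101)
profits = np.array([campaign_profit(y_true, y_proba, t) for t in thresholds])
best = profits.argmax()
print(f"best threshold ~ {thresholds[best]:.2f}, profit = {profits[best]}")
```

On this toy data the best profit is 290, reached by lowering the threshold far enough to catch the responder scored 0.4 while still skipping the non-responder scored 0.2. Because a missed responder costs five times as much as a wasted contact here, the profit-maximizing threshold sits below 0.5.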

7. Visualize Profit Curve

import numpy as np
import matplotlib.pyplot as plt

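# y_pred_proba: positive-class probabilities on the test set,
# e.g. model.predict_proba(X_test)[:, 1]
thresholds = np.linspace(0, 1, 100)
profits = [campaign_profit(y_test, y_pred_proba, t) for t in thresholds]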

plt.plot(thresholds, profits)
plt.xlabel("Threshold")
plt.ylabel("Expected Profit ($)")
plt.title("Optimize for profit, not just F1")  # emoji are missing from matplotlib's default font
plt.show()
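Lift, the other metric named in step 6, can be tabulated by ranking customers by score and comparing each bin's response rate with the overall rate. The `lift_table` helper and its column names are hypothetical, and the ten-customer example is made up.

```python
import numpy as np
import pandas as pd

def lift_table(y_true, y_score, n_bins=10):
    """Response rate, lift, and cumulative capture per score-ranked bin."""
    df = pd.DataFrame({"y": np.asarray(y_true), "score": np.asarray(y_score)})
    df = df.sort_values("score", ascending=False).reset_index(drop=True)
    # Bin 1 holds the highest-scored customers, bin n_bins the lowest.
    df["bin"] = np.arange(len(df)) * n_bins // len(df) + 1
    base_rate = df["y"].mean()
    out = df.groupby("bin")["y"].agg(
        n="size", responders="sum", response_rate="mean"
    )
    out["lift"] = out["response_rate"] / base_rate
    out["cum_capture"] = out["responders"].cumsum() / df["y"].sum()
    return out

# Toy example: 10 customers, 3 responders, scores already sorted.
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
y_score = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

table = lift_table(y_true, y_score, n_bins=5)
print(table)
```

In this toy table the top bin has a 100% response rate against a 30% base rate, so its lift is about 3.3. The `cum_capture` column answers the budget question directly: what share of all responders is reached by contacting only the top bins.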

8. Interpret & Present

Explain results to your “boss” (or a pretend one 🧑‍💼):

What’s the best model and why?
What threshold gives the best ROI?
Which features drive campaign response?

Pro tip: Use visuals — executives fear math but love graphs. 📊❤️
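For the "which features drive response" question, permutation importance is one model-agnostic option (tree-based `feature_importances_` or logistic regression coefficients are alternatives). The data below is synthetic and built so that only `spending_score` carries signal; that is an assumption for the demo, not a finding about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "income": rng.normal(60000, 15000, n),
    "spending_score": rng.random(n),
})
# Synthetic rule: response depends only on spending_score plus noise.
y = (X["spending_score"] + rng.normal(0, 0.1, n) > 0.7).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one column at a time on held-out data and measure the AUC drop.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)
importances = pd.Series(
    result.importances_mean, index=X.columns
).sort_values(ascending=False)
print(importances)
```

A horizontal bar chart of `importances` (for example `importances.plot.barh()`) is usually enough for the executive slide.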

🧪 Stretch Goal

Deploy a threshold-based segmenter:

Top 10% predicted responders = Premium segment
Next 30% = Target later
Rest = Send memes instead of ads 😜
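A sketch of such a segmenter, ranking customers by predicted probability and cutting at the 10% and 40% marks. The `segment_customers` name and the labels `premium`, `target_later`, and `rest` are placeholders; ties are broken by input order.

```python
import numpy as np
import pandas as pd

def segment_customers(proba, top=0.10, mid=0.30) -> pd.Series:
    """Label each customer by rank of predicted response probability."""
    proba = pd.Series(np.asarray(proba, dtype=float))
    n = len(proba)
    # Rank 1 = highest probability; method="first" breaks ties by position.
    rank = proba.rank(method="first", ascending=False)
    n_top = int(round(top * n))          # size of the premium segment
    n_mid = int(round((top + mid) * n))  # premium + target-later cutoff
    labels = np.select(
        [rank <= n_top, rank <= n_mid],
        ["premium", "target_later"],
        default="rest",
    )
    return pd.Series(labels, index=proba.index, name="segment")

# 20 made-up scores, evenly spaced from 0 to 1.
scores = np.linspace(0, 1, 20)
segments = segment_customers(scores)
print(segments.value_counts())
```

Cutting on rank rather than on a fixed probability keeps segment sizes stable even when the model's scores are poorly calibrated, which suits a fixed campaign budget.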

🎓 Deliverables

Notebook with:
- Model training + tuning
- ROI / lift plots
- Final model comparison table
Short business summary, along the lines of:
“This model improves campaign ROI by 18% with 40% fewer promotions sent.”

💬 Final Thought

“In marketing, knowing who not to target is often worth more than finding who to target.” 💡

# Your code here