Lab – Campaign Targeting#

“Because sometimes, predicting who clicks an ad is harder than convincing your boss why you need more GPUs.” 😅


🎯 Objective#

You’re the new data scientist at AdAstra Marketing Co. 🚀 The marketing team wants to predict which customers are most likely to respond to a campaign, so they can stop wasting money on people who think “unsubscribe” means “tell me more.”

Your job:

Build, tune, and evaluate multiple ML models — then choose the one that makes the most business sense, not just the highest accuracy.


🧩 What You’ll Practice#

✅ Model comparison with Cross-Validation and Nested CV
✅ Hyperparameter tuning using GridSearchCV / RandomizedSearchCV
✅ Translating model performance → ROI, lift, and profit curves
✅ Choosing the best campaign target group based on predicted response probability


🗂️ Dataset Overview#

Use the dataset marketing_campaign.csv (available in the /data folder or via the Colab link below).

| Feature        | Description                             |
|----------------|-----------------------------------------|
| age            | Customer age                            |
| income         | Annual income ($)                       |
| spending_score | Prior engagement with campaigns         |
| channel        | Campaign channel (email, social, etc.)  |
| response       | 1 if responded positively, else 0       |


🧰 Setup & Notebook Access#

You can run this lab directly in:

| Option                      | Link                  |
|-----------------------------|-----------------------|
| 🧮 JupyterLite (in-browser) | Run in JupyterLite ▶️ |
| ☁️ Google Colab             | Open in Colab 🚀      |
| 💾 Download Notebook        | Download .ipynb       |


🧠 Step-by-Step Instructions#

1. Load & Explore Data#

import pandas as pd

# Load the campaign data and peek at the first few rows
df = pd.read_csv("marketing_campaign.csv")
df.head()

Check data types, missing values, and basic stats. Don’t forget to ask the existential question:

“Why is income missing for half our audience?” 💸
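A quick way to answer that, plus the less existential checks, using the df loaded above:

# Data types, missing values, and summary statistics
df.info()
print(df.isna().sum())   # how much of income is actually missing?
df.describe()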


2. Preprocess#

  • Encode categorical features (channel)

  • Handle missing income with median imputation

  • Scale continuous variables

Bonus Challenge: Try both StandardScaler and MinMaxScaler — and see if your model’s mood improves. 😆
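A minimal preprocessing sketch with scikit-learn, assuming the column names from the dataset table above; a ColumnTransformer keeps the steps reusable inside a Pipeline later:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["age", "income", "spending_score"]
categorical_features = ["channel"]

# Median-impute and scale numeric columns; one-hot encode the channel
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),   # swap in MinMaxScaler() for the bonus challenge
    ]), numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

X = df.drop(columns="response")
y = df["response"]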


3. Split Data & Define Models#

Use a Stratified Train-Test Split to preserve response balance.

Try multiple models:

  • Logistic Regression

  • Random Forest

  • XGBoost (optional but fun 💥)
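A sketch of the split and the candidate models, assuming the preprocess transformer and X, y from step 2 (XGBClassifier needs the optional xgboost package):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Stratify on y so train and test keep the same response rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "logreg": Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))]),
    "rf": Pipeline([("prep", preprocess), ("clf", RandomForestClassifier(random_state=42))]),
    # "xgb": Pipeline([("prep", preprocess), ("clf", XGBClassifier(eval_metric="logloss"))]),  # optional
}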


4. Evaluate with Cross-Validation#

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# X, y: features and target; the bare estimator needs numeric features,
# so either use the preprocessed matrix or one of the pipelines from step 3
rf = RandomForestClassifier(random_state=42)
scores = cross_val_score(rf, X, y, cv=5, scoring="f1")
print("Average F1:", scores.mean())

💡 Tip: Keep a leaderboard of all models — treat it like a data science version of The Bachelor 💔🌹
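One way to keep that leaderboard, assuming the models dict and the train split from the step 3 sketch:

import pandas as pd
from sklearn.model_selection import cross_val_score

# Rank all candidates by mean cross-validated F1
leaderboard = pd.DataFrame([
    {"model": name,
     "mean_f1": cross_val_score(est, X_train, y_train, cv=5, scoring="f1").mean()}
    for name, est in models.items()
]).sort_values("mean_f1", ascending=False)
print(leaderboard)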


5. Hyperparameter Tuning#

Use GridSearchCV or RandomizedSearchCV to find the sweet spot.

| Model               | Key Hyperparameters                        |
|---------------------|--------------------------------------------|
| Logistic Regression | C, penalty                                 |
| Random Forest       | n_estimators, max_depth, min_samples_split |
| XGBoost             | eta, max_depth, subsample                  |

Don’t overfit — the goal is ROI, not ego points. 😎
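A sketch for the Random Forest row of the table, assuming the models dict and split from the step 3 sketch; the grid values are illustrative, not tuned. Wrapping the search in cross_val_score gives the nested CV estimate mentioned in the practice list:

from sklearn.model_selection import GridSearchCV, cross_val_score

param_grid = {
    "clf__n_estimators": [100, 300],
    "clf__max_depth": [None, 5, 10],
    "clf__min_samples_split": [2, 10],
}

search = GridSearchCV(models["rf"], param_grid, cv=5, scoring="f1", n_jobs=-1)
search.fit(X_train, y_train)
print("Best params:", search.best_params_)

# Nested CV: an outer loop scores the whole tuning procedure, not one lucky split
nested_scores = cross_val_score(search, X_train, y_train, cv=5, scoring="f1")
print("Nested CV F1:", nested_scores.mean())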


6. Business-Aware Evaluation#

Compute ROI, lift, and expected profit for each model.

Example profit function:

def campaign_profit(y_true, y_pred_proba, threshold=0.5):
    # Assumed economics: $100 gain per converted customer, $10 cost per wasted
    # contact, $50 opportunity cost per missed responder
    TP_gain, FP_cost, FN_cost = 100, 10, 50
    preds = (y_pred_proba >= threshold).astype(int)
    TP = ((preds == 1) & (y_true == 1)).sum()
    FP = ((preds == 1) & (y_true == 0)).sum()
    FN = ((preds == 0) & (y_true == 1)).sum()
    return TP*TP_gain - FP*FP_cost - FN*FN_cost

Try different thresholds to find the profit-maximizing one, not just where F1 is max. 💰
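Lift can be read straight off the predicted probabilities. A sketch assuming a fitted search object from the tuning sketch (any fitted classifier's predict_proba works) and the test split from step 3:

import numpy as np

# Positive-class probabilities on the held-out test set
y_pred_proba = search.best_estimator_.predict_proba(X_test)[:, 1]

# Lift in the top decile: response rate among the top-scored 10% vs. the base rate
order = np.argsort(-y_pred_proba)
top_decile = order[: max(1, len(order) // 10)]
lift_top10 = y_test.iloc[top_decile].mean() / y_test.mean()
print("Top-10% lift:", round(lift_top10, 2))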


7. Visualize Profit Curve#

import numpy as np
import matplotlib.pyplot as plt

# Sweep candidate thresholds and compute the expected profit at each one
thresholds = np.linspace(0, 1, 100)
profits = [campaign_profit(y_test, y_pred_proba, t) for t in thresholds]

plt.plot(thresholds, profits)
plt.xlabel("Threshold")
plt.ylabel("Expected Profit ($)")
plt.title("Optimize for 💵, not just F1")
plt.show()
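The profit-maximizing threshold is then just the argmax over the same arrays:

# Read the best operating point off the curve
best_t = thresholds[int(np.argmax(profits))]
print(f"Best threshold: {best_t:.2f}, expected profit: ${max(profits):,}")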

8. Interpret & Present#

Explain the results to your “boss” (or a pretend one 🧑‍💼):

  • What’s the best model and why?

  • What threshold gives the best ROI?

  • Which features drive campaign response?

Pro tip: Use visuals — executives fear math but love graphs. 📊❤️
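For the “which features drive response” question, permutation importance on the held-out test set is one model-agnostic sketch (again assuming the fitted search and split from earlier steps):

from sklearn.inspection import permutation_importance

best_model = search.best_estimator_
result = permutation_importance(best_model, X_test, y_test, scoring="f1",
                                n_repeats=10, random_state=42)

# Print features from most to least important
for name, score in sorted(zip(X_test.columns, result.importances_mean),
                          key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")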


🧪 Stretch Goal#

Deploy a threshold-based segmenter:

  • Top 10% predicted responders = Premium segment

  • Next 30% = Target later

  • Rest = Send memes instead of ads 😜
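A sketch of the segmenter using score percentiles, assuming y_pred_proba and X_test from the earlier steps (cut points follow the 10% / 30% / rest split above):

import pandas as pd

scores = pd.Series(y_pred_proba, index=X_test.index, name="score")

# Top 10% -> Premium, next 30% -> Target later, bottom 60% -> Memes
segment = pd.qcut(scores, q=[0, 0.6, 0.9, 1.0],
                  labels=["Memes", "Target later", "Premium"])
print(segment.value_counts())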


🎓 Deliverables#

  • Notebook with:

    • Model training + tuning

    • ROI / lift plots

    • Final model comparison table

  • Short business summary:

    “This model improves campaign ROI by 18% with 40% fewer promotions sent.”


💬 Final Thought#

“In marketing, knowing who not to target is often worth more than finding who to target.” 💡

