
Welcome to your final showdown in this classification chapter:

Predicting which customers are about to ghost your business 👻

You’ll be the data-driven fortune teller of customer loyalty — except instead of a crystal ball, you’ve got pandas, sklearn, and a strong cup of coffee ☕.


🎯 Objective

You’re working for SuperTel, a telecom company. Management suspects customers are silently slipping away to competitors. Your mission:

  • Predict who’s likely to churn

  • Identify key drivers

  • Save the company a fortune in retention costs 🍾


📦 Step 1: Load the Data

Let’s start by importing a mock dataset — or you can load your own!

import pandas as pd

# Sample dataset (feel free to use your own)
url = "https://raw.githubusercontent.com/chandraveshchaudhari/datasets/main/telecom_churn_sample.csv"
data = pd.read_csv(url)

data.head()

Typical columns might look like:

  • tenure → months as a customer

  • monthly_charges → how much they pay

  • contract_type → “Month-to-Month”, “Yearly”, etc.

  • churn → 1 if they left, 0 if they stayed
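Before modeling, it's worth checking how balanced the target is — a quick sanity check on a tiny mock frame following the schema above (values are hypothetical; run the same check on your own `data`):

```python
import pandas as pd

# Tiny mock frame matching the schema described above (hypothetical values)
data = pd.DataFrame({
    'tenure': [2, 30, 5, 48, 1, 24],
    'monthly_charges': [70.0, 45.5, 89.9, 30.0, 99.0, 55.0],
    'contract_type': ['Month-to-Month', 'Yearly', 'Month-to-Month',
                      'Yearly', 'Month-to-Month', 'Month-to-Month'],
    'churn': [1, 0, 1, 0, 1, 0],
})

# Share of churners vs. stayers -- if this is very skewed,
# plain accuracy will be misleading later on
churn_rate = data['churn'].value_counts(normalize=True)
print(churn_rate)
```

If churners are rare (say, under 10%), keep that in mind when you reach the evaluation step.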


🧹 Step 2: Clean & Prepare Data

Let’s tidy things up a bit — because no model deserves messy data. 😅

data = data.dropna()  # drop rows with missing values
# Encode contract_type as integer category codes so the models can use it
data['contract_type'] = data['contract_type'].astype('category').cat.codes

Split your data:

from sklearn.model_selection import train_test_split

X = data.drop('churn', axis=1)
y = data['churn']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42,
    stratify=y,  # keep the churn ratio consistent across both splits
)
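One optional extra before training: logistic regression converges faster and its coefficients are easier to compare when features are on similar scales. A sketch on synthetic stand-in features (your real `X_train`/`X_test` slot in the same way):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features standing in for tenure and monthly_charges (hypothetical)
rng = np.random.default_rng(42)
X = rng.normal(loc=[24, 65], scale=[12, 20], size=(200, 2))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply to both --
# fitting on the full data would leak test information
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(2))  # roughly 0 per column
```

Note the design choice: the scaler is fitted on the training data only, so the test set stays untouched until evaluation.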

⚙️ Step 3: Train Your Classifiers

Let’s train two contenders:

  1. Logistic Regression – the confident statistician

  2. Naive Bayes – the probabilistic wizard 🧙‍♂️

from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

log_model = LogisticRegression(max_iter=1000)
nb_model = GaussianNB()

log_model.fit(X_train, y_train)
nb_model.fit(X_train, y_train)

📈 Step 4: Evaluate the Models

We’ll use precision, recall, F1, and accuracy — because “95% accuracy” alone means nothing when all your churners are ignored. 😅

from sklearn.metrics import classification_report

print("=== Logistic Regression ===")
print(classification_report(y_test, log_model.predict(X_test)))

print("=== Naive Bayes ===")
print(classification_report(y_test, nb_model.predict(X_test)))

Which one wins? Logistic Regression is usually better calibrated, while Naive Bayes tends to overpredict churn. In this business, though, a false alarm is cheaper than a missed churner.
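To see concretely why accuracy alone can hide churners, count the four outcomes in a confusion matrix. A sketch with made-up labels (swap in your model's predictions):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = churned, 0 = stayed
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]  # a model that misses half the churners

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"Caught churners (TP): {tp}")
print(f"Missed churners (FN): {fn}")
print(f"False alarms    (FP): {fp}")

# Accuracy looks decent even though half the churners slipped through
accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(f"Accuracy: {accuracy:.0%}, Recall on churners: {recall:.0%}")
```

Here the model scores 70% accuracy while catching only half the churners — exactly the trap the recall metric exposes.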


📊 Step 5: Predict Probabilities (and Get Business Insights)

Let’s peek under the hood and interpret the probabilities.

probs = pd.DataFrame({
    'Actual': y_test,
    'Predicted_Prob': log_model.predict_proba(X_test)[:, 1]
})

probs.head()
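Once you have per-customer probabilities, the business-friendly move is to flag everyone above a risk threshold. A minimal sketch with hypothetical customer IDs and scores (your `probs` frame from above works the same way):

```python
import pandas as pd

# Hypothetical predicted churn probabilities for five customers
probs = pd.DataFrame({
    'customer_id': ['C001', 'C002', 'C003', 'C004', 'C005'],
    'Predicted_Prob': [0.92, 0.15, 0.67, 0.88, 0.40],
})

# Flag anyone above a chosen risk threshold -- 0.5 is the default
# classification cutoff, but the business may prefer a stricter one
threshold = 0.5
at_risk = probs[probs['Predicted_Prob'] >= threshold]
print(at_risk.sort_values('Predicted_Prob', ascending=False))
```

Tuning `threshold` trades precision against recall: lower it and you catch more churners at the cost of more false alarms.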

Want to feel fancy? Plot a calibration curve:

from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

true_prob, pred_prob = calibration_curve(y_test, probs['Predicted_Prob'], n_bins=10)
plt.plot(pred_prob, true_prob, marker='o')
plt.plot([0,1],[0,1],'--')
plt.title("Model Calibration – Confidence vs Reality")
plt.xlabel("Predicted Probability")
plt.ylabel("True Probability")
plt.show()

A well-calibrated model = a humble model that knows what it doesn’t know. 🤓


💼 Step 6: Business Interpretation

Once you’ve got a model you trust, share insights like:

| Feature | Effect | Action |
| --- | --- | --- |
| Low tenure | Higher churn | Create loyalty discounts 💳 |
| High monthly charges | Higher churn | Offer flexible plans 📉 |
| Annual contract | Lower churn | Promote long-term deals 📅 |
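Effect directions like those in the table can be read off the logistic model's coefficients: a positive coefficient pushes churn probability up, a negative one pulls it down. A sketch on synthetic data where low tenure drives churn by construction (signs on your real dataset will depend on the data):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic data: churn probability falls with tenure (by construction)
rng = np.random.default_rng(0)
tenure = rng.uniform(1, 60, size=500)
monthly = rng.uniform(20, 100, size=500)
p_churn = 1 / (1 + np.exp(0.15 * tenure - 0.03 * monthly))
y = rng.random(500) < p_churn

X = pd.DataFrame({'tenure': tenure, 'monthly_charges': monthly})
model = LogisticRegression(max_iter=1000).fit(X, y)

# Positive coefficient -> feature pushes churn probability up
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```

Note that coefficient magnitudes are only comparable across features if the features are on similar scales, so standardize first if you want a feature-importance ranking.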

Remember: ML ≠ Magic. The real power comes from turning these insights into business strategy.


🧩 Challenge Extensions

Try one (or all):

  1. Add new features — like “number of complaints” or “payment method.”

  2. Compare models with and without class_weight='balanced'.

  3. Use SMOTE to handle class imbalance.

  4. Build a simple dashboard with churn probabilities per customer.

Feeling ambitious? Deploy your model as a churn alert system that pings the sales team when a customer looks more than 80% likely to quit. 🚨
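That alert system can start as a few lines of Python. A minimal sketch — in production you'd call your messaging or CRM API instead of building strings, and the IDs, scores, and `alert_sales` helper here are all hypothetical:

```python
# Minimal churn-alert loop (all names and values are hypothetical)
ALERT_THRESHOLD = 0.80

def alert_sales(customer_id, prob):
    """Stand-in for a real notification hook (e.g. email or Slack)."""
    return f"ALERT: {customer_id} looks {prob:.0%} likely to churn"

# Hypothetical scored customers from the probability step
scored = {'C001': 0.92, 'C002': 0.35, 'C003': 0.81}

alerts = [alert_sales(cid, p) for cid, p in scored.items() if p >= ALERT_THRESHOLD]
for msg in alerts:
    print(msg)
```

Hooking this up to a scheduler that rescores customers daily is all it takes to turn a notebook into an early-warning system.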


🎓 TL;DR

| Step | What You Did |
| --- | --- |
| Load & clean data | Got the dataset ready |
| Train models | Logistic Regression & Naive Bayes |
| Evaluate | Precision, recall, F1 |
| Calibrate | Checked probability accuracy |
| Interpret | Derived business insights |

💬 “Remember: predicting churn isn’t about math — it’s about keeping humans happy.” ❤️📊


🔗 Next Chapter: Optimization & Training Practicalities. Because your model’s training process could use some optimization too (unlike your caffeine intake). ☕⚙️

# Your code here