Lab – Customer Segmentation#
Welcome to the Marketing Department’s favorite use of machine learning: turning a boring customer spreadsheet into fancy “segments” so everyone can nod wisely in meetings. 😎
🎯 Goal#
Use Unsupervised Learning (PCA + K-Means + t-SNE/UMAP) to segment customers based on their behavior.
By the end of this lab, you’ll:
Identify meaningful customer clusters 🧍♀️🧍♂️🧍♀️
Visualize them beautifully 🎨
Give them cool names like “Budget Shoppers” and “Luxury Lovers” 💸
🧩 Step 1: Load and Explore the Data#
Let’s start with some fictional customer data — think of an online store with spending patterns.
```python
import pandas as pd

df = pd.read_csv("customers.csv")
df.head()
```
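If you don't have a `customers.csv` on hand, here's a minimal sketch that synthesizes a stand-in with the same columns (the values are made up, not the lab's real dataset), so the rest of the lab runs end to end:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same schema as the lab's customers.csv.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "CustomerID": np.arange(1001, 1001 + n),
    "Age": rng.integers(18, 70, n),
    "Income": rng.integers(20_000, 120_000, n),
    "SpendingScore": rng.integers(1, 101, n),
    "LoyaltyYears": rng.integers(0, 10, n),
    "OnlinePurchases": rng.integers(0, 30, n),
})
df.to_csv("customers.csv", index=False)  # now pd.read_csv("customers.csv") works
```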
| CustomerID | Age | Income | SpendingScore | LoyaltyYears | OnlinePurchases |
|---|---|---|---|---|---|
| 1001 | 25 | 45,000 | 72 | 1 | 12 |
| 1002 | 42 | 85,000 | 35 | 5 | 6 |
Now, take a peek at some stats 👀
```python
df.describe()
```
⚙️ Step 2: Prepare and Scale#
Distance-based algorithms hate unscaled data — treat all features equally, or your model will think "Income" is the only thing that matters.
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df.drop(columns=["CustomerID"]))
```
🧠 Step 3: Dimensionality Reduction with PCA#
Even marketers don’t like 6D scatterplots. Let’s compress the data down to two dimensions while keeping most of the variance.
```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
```
Check out how much info we kept:
```python
pca.explained_variance_ratio_.sum()
```
Usually around 70–90% = good enough for storytelling 🎬
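If two components keep too little variance, you can pick the number of components from the cumulative variance curve instead. A minimal sketch on synthetic data (the 80% threshold is a common rule of thumb, not a law):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy data: 6 features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components, then keep the smallest k with >= 80% variance.
pca = PCA().fit(X_scaled)
cumvar = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.argmax(cumvar >= 0.80)) + 1
```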
💡 Step 4: Cluster with K-Means#
Now, the star of the show — K-Means! 💥 (aka “let’s pretend we know how many clusters exist.”)
```python
from sklearn.cluster import KMeans

# n_init is set explicitly because its default changed in newer scikit-learn.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["Cluster"] = kmeans.fit_predict(X_scaled)
```
Boom — customers grouped by mysterious mathematical forces. ✨
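How do we know 4 is the right number of clusters? We don't — but the silhouette score can sanity-check the guess. A quick sketch on synthetic blobs (your real data will be messier, and silhouette is one heuristic among several, e.g. the elbow method):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with a known number of groups, just to demo the method.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Score each candidate k; higher silhouette = tighter, better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```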
🧭 Step 5: Visualize with t-SNE or UMAP#
Because management loves visuals.
```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

tsne = TSNE(n_components=2, random_state=42, perplexity=30)
X_vis = tsne.fit_transform(X_scaled)

plt.figure(figsize=(8, 6))
plt.scatter(X_vis[:, 0], X_vis[:, 1], c=df["Cluster"], cmap="tab10")
plt.title("t-SNE Visualization – Customer Segments 🎨")
plt.show()
```
Each color = a different segment. Try UMAP for faster, calmer results. 🧘♀️
🕵️ Step 6: Interpret the Clusters#
Now for the marketing translation step — a.k.a. turning math into personas 😅
```python
# Drop CustomerID first — averaging an ID column is meaningless.
df.drop(columns=["CustomerID"]).groupby("Cluster").mean()
```
| Cluster | Age | Income | SpendingScore | LoyaltyYears | OnlinePurchases |
|---|---|---|---|---|---|
| 0 | 28 | 40k | 80 | 1 | 10 |
| 1 | 45 | 90k | 30 | 5 | 3 |
| 2 | 35 | 60k | 60 | 2 | 8 |
Possible names:
🧑💻 Young Spenders – low loyalty, high impulse
👨👩👧 Family Budgeters – steady income, average spending
💎 Luxury Loyalists – high income, low churn risk
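Once you've settled on names, you can attach them to the DataFrame with a simple mapping. Note the cluster-id-to-persona assignment below is hypothetical — K-Means labels are arbitrary, so check your own group means before mapping:

```python
import pandas as pd

# Hypothetical mapping; inspect the cluster means to decide which id is which.
persona = {
    0: "Young Spenders",
    1: "Luxury Loyalists",
    2: "Family Budgeters",
    3: "Occasional Browsers",
}

# Stand-in for the lab's df["Cluster"] column.
df = pd.DataFrame({"Cluster": [0, 1, 2, 3, 0]})
df["Segment"] = df["Cluster"].map(persona)
```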
💬 Step 7: Business Insight Time#
Now the fun part — what do we do with these segments?
| Segment | Strategy |
|---|---|
| Young Spenders | Flash sales & online ads |
| Family Budgeters | Loyalty programs |
| Luxury Loyalists | Premium tier or early access offers |
This is where machine learning becomes money learning. 💰📈
🔁 Optional: Automate the Pipeline#
```python
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
    # n_init set explicitly; its default changed in newer scikit-learn.
    ("kmeans", KMeans(n_clusters=4, n_init=10, random_state=42)),
])
pipeline.fit(df.drop(columns=["CustomerID"]))
Now you’ve got a reusable segmentation pipeline — ready to plug into dashboards or marketing campaigns!
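A fitted pipeline can also assign a segment to a brand-new customer with a single `predict` call. A self-contained sketch (the training numbers are made up for illustration):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data with the lab's feature columns.
train = pd.DataFrame({
    "Age": [25, 42, 31, 55, 23, 47],
    "Income": [45_000, 85_000, 60_000, 95_000, 40_000, 88_000],
    "SpendingScore": [72, 35, 60, 30, 80, 33],
    "LoyaltyYears": [1, 5, 2, 6, 1, 5],
    "OnlinePurchases": [12, 6, 8, 4, 10, 5],
})

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("kmeans", KMeans(n_clusters=2, n_init=10, random_state=42)),
])
pipeline.fit(train)

# New customer arrives: scale -> project -> assign nearest centroid.
new_customer = pd.DataFrame([{
    "Age": 29, "Income": 52_000, "SpendingScore": 70,
    "LoyaltyYears": 2, "OnlinePurchases": 11,
}])
segment = pipeline.predict(new_customer)[0]
```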
🧍 Recap#
| Step | What You Did | Why It Matters |
|---|---|---|
| 1 | Load & Scale Data | Prep for ML |
| 2 | PCA | Reduce dimensions |
| 3 | K-Means | Find hidden groups |
| 4 | t-SNE/UMAP | Visualize beautifully |
| 5 | Interpret & Act | Turn insight into strategy |
🏁 Wrap-Up#
You just:
Found hidden customer patterns 🧠
Gave them business meaning 💼
Created visuals that your CMO will love 🎨
Next time someone asks,
“Can we use AI to segment our customers?”
You can say confidently —
“Already done. And it looks gorgeous.” 😎