Welcome to the Marketing Department’s favorite use of machine learning: turning a boring customer spreadsheet into fancy “segments” so everyone can nod wisely in meetings. 😎
🎯 Goal¶
Use Unsupervised Learning (PCA + K-Means + t-SNE/UMAP) to segment customers based on their behavior.
By the end of this lab, you’ll:
Identify meaningful customer clusters 🧍♀️🧍♂️🧍♀️
Visualize them beautifully 🎨
Give them cool names like “Budget Shoppers” and “Luxury Lovers” 💸
🧩 Step 1: Load and Explore the Data¶
Let’s start with some fictional customer data — think of an online store with spending patterns.
import pandas as pd
df = pd.read_csv("customers.csv")
df.head()
| CustomerID | Age | Income | SpendingScore | LoyaltyYears | OnlinePurchases |
|---|---|---|---|---|---|
| 1001 | 25 | 45,000 | 72 | 1 | 12 |
| 1002 | 42 | 85,000 | 35 | 5 | 6 |
Now, take a peek at some stats 👀
df.describe()
⚙️ Step 2: Prepare and Scale¶
Distance-based algorithms hate unscaled data — treat all features equally, or your model will think “Income” is the only thing that matters.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df.drop(columns=["CustomerID"]))
🧠 Step 3: Dimensionality Reduction with PCA¶
Even marketers don’t like 6D scatterplots. Let’s compress it down while keeping the main variance.
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
Check out how much info we kept:
pca.explained_variance_ratio_.sum()
Usually around 70–90% = good enough for storytelling 🎬
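If hard-coding `n_components=2` feels arbitrary, scikit-learn's `PCA` also accepts a float: it then keeps the smallest number of components whose cumulative explained variance reaches that fraction. A minimal sketch on synthetic stand-in data (the real `customers.csv` columns aren't assumed here, so the exact numbers are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Stand-in for the customer features: 5 columns, one correlated pair
X = rng.normal(size=(200, 5))
X[:, 1] = 0.9 * X[:, 0] + rng.normal(scale=0.2, size=200)

X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components whose
# cumulative explained variance reaches that fraction.
pca = PCA(n_components=0.80)
X_reduced = pca.fit_transform(X_scaled)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```

Handy when you want the data, not a magic number, to decide how many dimensions survive.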
💡 Step 4: Cluster with K-Means¶
Now, the star of the show — K-Means! 💥 (aka “let’s pretend we know how many clusters exist.”)
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["Cluster"] = kmeans.fit_predict(X_scaled)
Boom — customers grouped by mysterious mathematical forces. ✨
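The snark about "pretending we know how many clusters exist" has a standard fix: sweep k and score each run, for example with the silhouette score. A sketch on synthetic blobs; for the lab data you'd pass `X_scaled` instead of `X`:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in with well-separated groups; swap in X_scaled for real data.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # closer to 1 = tighter, better-separated

best_k = max(scores, key=scores.get)
print(best_k)
```

On well-separated blobs the silhouette typically peaks at the true number of groups; on real customer data the peak is usually softer, so treat it as a guide, not a verdict.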
🧭 Step 5: Visualize with t-SNE or UMAP¶
Because management loves visuals.
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
tsne = TSNE(n_components=2, random_state=42, perplexity=30)
X_vis = tsne.fit_transform(X_scaled)
plt.figure(figsize=(8,6))
plt.scatter(X_vis[:,0], X_vis[:,1], c=df["Cluster"], cmap='tab10')
plt.title("t-SNE Visualization – Customer Segments 🎨")
plt.show()
Each color = a different segment. Try UMAP for faster, calmer results. 🧘♀️
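UMAP lives in the third-party `umap-learn` package, so it may not be installed; the sketch below falls back to plain PCA when it isn't. `X_scaled` is faked with random numbers here, so only the shapes are meaningful:

```python
import numpy as np

rng = np.random.default_rng(42)
X_scaled = rng.normal(size=(150, 5))  # stand-in for the scaled customer features

try:
    import umap  # provided by the `umap-learn` package
    reducer = umap.UMAP(n_components=2, random_state=42)
    X_vis = reducer.fit_transform(X_scaled)
except ImportError:
    # Fallback if umap-learn isn't available: PCA still gives a 2-D view
    from sklearn.decomposition import PCA
    X_vis = PCA(n_components=2).fit_transform(X_scaled)

print(X_vis.shape)
```

Either way you get an (n_samples, 2) array that drops straight into the `plt.scatter` call above.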
🕵️ Step 6: Interpret the Clusters¶
Now for the marketing translation step — a.k.a. turning math into personas 😅
df.drop(columns=["CustomerID"]).groupby("Cluster").mean()
| Cluster | Age | Income | SpendingScore | LoyaltyYears | OnlinePurchases |
|---|---|---|---|---|---|
| 0 | 28 | 40k | 80 | 1 | 10 |
| 1 | 45 | 90k | 30 | 5 | 3 |
| 2 | 35 | 60k | 60 | 2 | 8 |
Possible names:
🧑💻 Young Spenders – low loyalty, high impulse
👨👩👧 Family Budgeters – steady income, average spending
💎 Luxury Loyalists – high income, low churn risk
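Once you've decided which cluster id is which persona, attaching the labels is a one-liner with `map`. The id→name mapping below is hypothetical: cluster ids are arbitrary and change between runs, so inspect the `groupby` table from your own run before naming anything.

```python
import pandas as pd

# Hypothetical mapping, chosen to match the example table above
persona = {0: "Young Spenders", 1: "Luxury Loyalists", 2: "Family Budgeters"}

df = pd.DataFrame({"CustomerID": [1001, 1002, 1003],
                   "Cluster": [0, 1, 2]})
df["Segment"] = df["Cluster"].map(persona)
print(df[["CustomerID", "Segment"]])
```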
💬 Step 7: Business Insight Time¶
Now the fun part — what do we do with these segments?
| Segment | Strategy |
|---|---|
| Young Spenders | Flash sales & online ads |
| Family Budgeters | Loyalty programs |
| Luxury Loyalists | Premium tier or early access offers |
This is where machine learning becomes money learning. 💰📈
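To push the strategy table into the data itself, one hedged sketch: build a small lookup table and merge it onto the labeled customers. The names and strategies mirror the tables above; the customer rows are made up.

```python
import pandas as pd

# Lookup table mirroring the Segment→Strategy table above
strategies = pd.DataFrame({
    "Segment": ["Young Spenders", "Family Budgeters", "Luxury Loyalists"],
    "Strategy": ["Flash sales & online ads", "Loyalty programs",
                 "Premium tier or early access offers"],
})

customers = pd.DataFrame({"CustomerID": [1001, 1002],
                          "Segment": ["Young Spenders", "Luxury Loyalists"]})

# Left merge keeps every customer and attaches their campaign
targeted = customers.merge(strategies, on="Segment", how="left")
print(targeted)
```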
🔁 Optional: Automate the Pipeline¶
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
("scaler", StandardScaler()),
("pca", PCA(n_components=2)),
("kmeans", KMeans(n_clusters=4, n_init=10, random_state=42))
])
pipeline.fit(df.drop(columns=["CustomerID"]))
Now you’ve got a reusable segmentation pipeline — ready to plug into dashboards or marketing campaigns!
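Because the last step is a KMeans, the fitted pipeline can also assign segments to brand-new customers via `predict`, with scaling and PCA applied automatically on the way in. A sketch on synthetic stand-in data (column names borrowed from the lab, values made up):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
cols = ["Age", "Income", "SpendingScore", "LoyaltyYears", "OnlinePurchases"]
df = pd.DataFrame(rng.normal(size=(100, 5)), columns=cols)  # fake customers

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("kmeans", KMeans(n_clusters=4, n_init=10, random_state=42)),
])
pipeline.fit(df)

# New customers flow through the same scale → PCA → cluster steps
new_customers = pd.DataFrame([[30, 1.2, 0.5, -0.3, 0.8]], columns=cols)
labels = pipeline.predict(new_customers)
print(labels)
```

Note that, unlike the step-by-step version earlier, this pipeline clusters on the 2-D PCA projection rather than on all scaled features, so the cluster assignments can differ slightly.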
🧍 Recap¶
| Step | What You Did | Why It Matters |
|---|---|---|
| 1–2 | Load & Scale Data | Prep for ML |
| 3 | PCA | Reduce dimensions |
| 4 | K-Means | Find hidden groups |
| 5 | t-SNE/UMAP | Visualize beautifully |
| 6–7 | Interpret & Act | Turn insight into strategy |
🏁 Wrap-Up¶
You just:
Found hidden customer patterns 🧠
Gave them business meaning 💼
Created visuals that your CMO will love 🎨
Next time someone asks,
“Can we use AI to segment our customers?”
You can say confidently —
“Already done. And it looks gorgeous.” 😎