Lab – Customer Segmentation#

Welcome to the Marketing Department’s favorite use of machine learning: turning a boring customer spreadsheet into fancy “segments” so everyone can nod wisely in meetings. 😎


🎯 Goal#

Use Unsupervised Learning (PCA + K-Means + t-SNE/UMAP) to segment customers based on their behavior.

By the end of this lab, you’ll:

  • Identify meaningful customer clusters 🧍‍♀️🧍‍♂️🧍‍♀️

  • Visualize them beautifully 🎨

  • Give them cool names like “Budget Shoppers” and “Luxury Lovers” 💸


🧩 Step 1: Load and Explore the Data#

Let’s start with some fictional customer data — think of an online store with spending patterns.

import pandas as pd

df = pd.read_csv("customers.csv")
df.head()

| CustomerID | Age | Income | SpendingScore | LoyaltyYears | OnlinePurchases |
|---|---|---|---|---|---|
| 1001 | 25 | 45,000 | 72 | 1 | 12 |
| 1002 | 42 | 85,000 | 35 | 5 | 6 |

Now, take a peek at some stats 👀

df.describe()
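
No customers.csv handy? Here’s a minimal sketch that fabricates a comparable toy dataset (the column ranges are invented, so treat it purely as a stand-in):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "CustomerID": np.arange(1001, 1001 + n),
    "Age": rng.integers(18, 70, n),
    "Income": rng.integers(20_000, 150_000, n),
    "SpendingScore": rng.integers(1, 100, n),   # 1–100, higher = spends more
    "LoyaltyYears": rng.integers(0, 10, n),
    "OnlinePurchases": rng.integers(0, 30, n),
})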

⚙️ Step 2: Prepare and Scale#

Distance-based algorithms hate unscaled data — scale the features so they’re all treated equally, or your model will think “Income” is the only thing that matters.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df.drop(columns=["CustomerID"]))
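
Quick sanity check: after StandardScaler, every feature should sit at mean ≈ 0 with standard deviation ≈ 1.

# Each column should now be centred near 0 with unit variance
print(X_scaled.mean(axis=0).round(2))
print(X_scaled.std(axis=0).round(2))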

🧠 Step 3: Dimensionality Reduction with PCA#

Even marketers don’t like five-dimensional scatterplots. Let’s compress the data down to two dimensions while keeping most of the variance.

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

Check out how much info we kept:

pca.explained_variance_ratio_.sum()

Usually around 70–90% = good enough for storytelling 🎬
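
If you’d rather let the data pick the number of components than hard-code 2, scikit-learn accepts a fraction as n_components and keeps just enough components to reach that share of variance. A small sketch (the 0.90 target here is arbitrary):

# Keep the smallest number of components explaining at least 90% of the variance
pca_90 = PCA(n_components=0.90)
X_pca_90 = pca_90.fit_transform(X_scaled)
print(pca_90.n_components_, pca_90.explained_variance_ratio_.sum())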


💡 Step 4: Cluster with K-Means#

Now, the star of the show — K-Means! 💥 (aka “let’s pretend we know how many clusters exist.”)

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, random_state=42)
df["Cluster"] = kmeans.fit_predict(X_scaled)

Boom — customers grouped by mysterious mathematical forces. ✨
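
If you’d rather not pretend, a quick sweep over candidate values of k can suggest a reasonable cluster count: look for the “elbow” in inertia and a high silhouette score. A sketch, not a guarantee (the 2–9 range is arbitrary):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

for k in range(2, 10):
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = km.fit_predict(X_scaled)
    # Lower inertia = tighter clusters; higher silhouette = better-separated clusters
    print(k, round(km.inertia_, 1), round(silhouette_score(X_scaled, labels), 3))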


🧭 Step 5: Visualize with t-SNE or UMAP#

Because management loves visuals.

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

tsne = TSNE(n_components=2, random_state=42, perplexity=30)
X_vis = tsne.fit_transform(X_scaled)

plt.figure(figsize=(8,6))
plt.scatter(X_vis[:,0], X_vis[:,1], c=df["Cluster"], cmap='tab10')
plt.title("t-SNE Visualization – Customer Segments 🎨")
plt.show()

Each color = a different segment. Try UMAP for faster, calmer results. 🧘‍♀️
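
UMAP isn’t bundled with scikit-learn; it lives in the separate umap-learn package. A minimal sketch, assuming that package is installed:

# pip install umap-learn
import umap

reducer = umap.UMAP(n_components=2, random_state=42)
X_umap = reducer.fit_transform(X_scaled)

plt.figure(figsize=(8, 6))
plt.scatter(X_umap[:, 0], X_umap[:, 1], c=df["Cluster"], cmap="tab10")
plt.title("UMAP Visualization – Customer Segments 🎨")
plt.show()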


🕵️ Step 6: Interpret the Clusters#

Now for the marketing translation step — a.k.a. turning math into personas 😅

df.drop(columns=["CustomerID"]).groupby("Cluster").mean()
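
It’s also worth checking how many customers landed in each cluster; a segment of three people isn’t much of a segment.

# Number of customers per cluster
df["Cluster"].value_counts().sort_index()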

| Cluster | Age | Income | SpendingScore | LoyaltyYears | OnlinePurchases |
|---|---|---|---|---|---|
| 0 | 28 | 40k | 80 | 1 | 10 |
| 1 | 45 | 90k | 30 | 5 | 3 |
| 2 | 35 | 60k | 60 | 2 | 8 |

Possible names (we’ll attach them in code right after this list):

  • 🧑‍💻 Young Spenders – low loyalty, high impulse

  • 👨‍👩‍👧 Family Budgeters – steady income, average spending

  • 💎 Luxury Loyalists – high income, low churn risk
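
Once you’ve settled on names, you can attach them to the DataFrame. A sketch where the ID-to-name mapping is an assumption; always check it against your own cluster means first:

# Hypothetical cluster-to-persona mapping; adjust to match your actual cluster profiles
persona_names = {
    0: "Young Spenders",
    1: "Luxury Loyalists",
    2: "Family Budgeters",
    3: "Occasional Browsers",  # placeholder label for the remaining cluster
}
df["Segment"] = df["Cluster"].map(persona_names)
df["Segment"].value_counts()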


💬 Step 7: Business Insight Time#

Now the fun part — what do we do with these segments?

| Segment | Strategy |
|---|---|
| Young Spenders | Flash sales & online ads |
| Family Budgeters | Loyalty programs |
| Luxury Loyalists | Premium tier or early access offers |

This is where machine learning becomes money learning. 💰📈


🔁 Optional: Automate the Pipeline#

from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("kmeans", KMeans(n_clusters=4, random_state=42))
])

# Fit on the raw feature columns only (skip the ID and any label columns added along the way)
feature_cols = ["Age", "Income", "SpendingScore", "LoyaltyYears", "OnlinePurchases"]
pipeline.fit(df[feature_cols])

Now you’ve got a reusable segmentation pipeline — ready to plug into dashboards or marketing campaigns!
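
Scoring fresh customers is then a one-liner: pipeline.predict scales, projects, and assigns a cluster in one go. A sketch, where new_customers is a hypothetical DataFrame with the same feature columns as above:

# new_customers: hypothetical DataFrame with Age, Income, SpendingScore, LoyaltyYears, OnlinePurchases
new_segments = pipeline.predict(new_customers[feature_cols])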


🧍 Recap#

| Step | What You Did | Why It Matters |
|---|---|---|
| 1 | Load & Scale Data | Prep for ML |
| 2 | PCA | Reduce dimensions |
| 3 | K-Means | Find hidden groups |
| 4 | t-SNE/UMAP | Visualize beautifully |
| 5 | Interpret & Act | Turn insight into strategy |


🏁 Wrap-Up#

You just:

  • Found hidden customer patterns 🧠

  • Gave them business meaning 💼

  • Created visuals that your CMO will love 🎨

Next time someone asks,

“Can we use AI to segment our customers?”

You can say confidently —

“Already done. And it looks gorgeous.” 😎
