# Lab – Customer Segmentation
Welcome to the Customer Segmentation Lab — where we use KNN to group similar customers faster than a marketing intern sorting spreadsheets on caffeine. ☕💻
In this lab, you’ll:

- Use distance-based similarity to find customer groups
- Compare different K values
- Visualize segment boundaries
- Interpret business insights
Let’s turn data into marketing magic. 🪄📈
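Before diving in, here is the core idea in a few lines of code: KNN measures similarity as plain distance in feature space. A minimal sketch (the customer values below are made up):

```python
import numpy as np

# Two customers as [income_k$, spending_score]; illustrative values only
alice = np.array([70, 80])
bob = np.array([65, 75])

# Euclidean distance, the similarity measure KNN uses by default
distance = np.linalg.norm(alice - bob)
print(round(distance, 2))  # ~7.07; a small distance means similar customers
```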
## 🧰 Setup
You can run this notebook directly in:

- 🧠 JupyterLite (Run above)
- 🧩 Google Colab
## 🪄 Step 1: Load the Data
Let’s create a mock dataset of customers based on:

- Annual Income
- Spending Score
Yes, it’s inspired by the famous “Mall Customers” dataset — because malls and marketing never go out of style. 🛍️
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Mock customer data
np.random.seed(42)
n = 200
income = np.random.normal(60, 20, n)  # Annual income (k$)
spend_score = np.clip(np.random.normal(50, 25, n), 0, 100)  # Spending score, clipped to 0–100
segment = np.where(income > 60, (spend_score > 50).astype(int), 0)  # 1 = high income & high spend

data = pd.DataFrame({
    'Income_k$': income,
    'SpendingScore': spend_score,
    'Segment': segment
})
data.head()
```
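Quick sanity check: the mock rule above can only produce two segments (0 and 1), so let’s confirm the class balance before modeling.

```python
# Count customers per segment; expect only labels 0 and 1
print(data['Segment'].value_counts())
```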
## 🧼 Step 2: Clean & Scale Data
Scaling matters here — otherwise “Income” might bully “SpendingScore” in distance calculations. 💰💪
```python
X = data[['Income_k$', 'SpendingScore']]
y = data['Segment']

# Standardize so both features contribute on a comparable scale.
# (For simplicity we scale before splitting; in practice, fit the scaler
# on the training split only to avoid data leakage.)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42
)
```
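A quick check of what `StandardScaler` actually did: after scaling, each feature has mean ≈ 0 and standard deviation ≈ 1, so neither can dominate distance calculations purely because of its units.

```python
# Each column should now have mean ~0 and std ~1
print("Means:", X_scaled.mean(axis=0).round(3))
print("Stds: ", X_scaled.std(axis=0).round(3))
```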
## 🧮 Step 3: Train KNN
Let’s start simple — K=5. Our KNN model will look at the 5 closest customers to classify each new one.
```python
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print("Training accuracy:", round(knn.score(X_train, y_train), 3))
print("Test accuracy:", round(knn.score(X_test, y_test), 3))
```
## 🎨 Step 4: Visualize the Segmentation
Let’s see how KNN “draws” its decision boundaries — like a business strategist armed with crayons.
```python
# Grid for visualization
x_min, x_max = X_scaled[:, 0].min() - 1, X_scaled[:, 0].max() + 1
y_min, y_max = X_scaled[:, 1].min() - 1, X_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

# Predict the segment for every point on the grid
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, cmap='coolwarm', alpha=0.3)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap='coolwarm', edgecolor='k')
plt.title("Customer Segmentation with KNN")
plt.xlabel("Income (scaled)")
plt.ylabel("Spending Score (scaled)")
plt.show()
```
💡 You should see regions showing different segments of customers — our KNN just made a segmentation strategy based on who spends how much. 💳✨
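If you’re running scikit-learn 1.1 or newer, `DecisionBoundaryDisplay` can draw the same picture without building the meshgrid by hand. A sketch of the shortcut:

```python
from sklearn.inspection import DecisionBoundaryDisplay

# Builds the grid and contour plot for us (requires scikit-learn >= 1.1)
disp = DecisionBoundaryDisplay.from_estimator(
    knn, X_scaled, response_method="predict", cmap="coolwarm", alpha=0.3
)
disp.ax_.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap="coolwarm", edgecolor="k")
disp.ax_.set_title("Customer Segmentation with KNN")
plt.show()
```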
## 🧪 Step 5: Tuning K (aka “How many friends to trust?”)
The number of neighbors K controls how smooth or chaotic your decision boundary becomes.
Let’s test different K values.
```python
scores = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores.append(knn.score(X_test, y_test))

plt.plot(range(1, 21), scores, marker='o')
plt.title("K vs Accuracy")
plt.xlabel("Number of Neighbors (K)")
plt.ylabel("Accuracy")
plt.grid(True)
plt.show()
```
🧠 Try interpreting:

- Low K: Very reactive, may overfit (believes the nearest gossip).
- High K: Too smooth, may underfit (trusts everyone too much).

Choose your K wisely — business strategy meets social dynamics.
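One caveat: accuracy on a single test split is noisy, so the “best” K above may shift with a different random split. A sketch of a more robust choice using cross-validation (an optional refinement, not part of the original lab):

```python
from sklearn.model_selection import cross_val_score

# Average accuracy over 5 folds of the training data for each K
cv_scores = []
for k in range(1, 21):
    model = KNeighborsClassifier(n_neighbors=k)
    cv_scores.append(cross_val_score(model, X_train, y_train, cv=5).mean())

best_k = int(np.argmax(cv_scores)) + 1  # +1 because K starts at 1
print("Best K by cross-validation:", best_k)
```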
## 💼 Step 6: Business Interpretation
Now, how does this matter in real life?
| Segment | Description | Example Business Use |
|---|---|---|
| 0 | Low income / low spenders | Offer discount coupons or loyalty rewards |
| 1 | High income / high spenders | Upsell luxury products or premium memberships |
| 2+ | Other combinations | Personalized cross-sells |

(Note: the mock rule in Step 1 only produces segments 0 and 1; real customer data typically yields more.)
✨ You just did customer segmentation using distance-based reasoning — the heart of recommender systems, marketing analytics, and churn prediction!
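To hand these segments to a marketing team, you might map the numeric labels to campaign-friendly names. A small sketch (the names are made up):

```python
# Hypothetical campaign labels for the two mock segments
segment_names = {0: "Value Seekers", 1: "Premium Spenders"}

data['SegmentName'] = data['Segment'].map(segment_names)
data[['Income_k$', 'SpendingScore', 'SegmentName']].head()
```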
## 🧩 TL;DR Summary
| Step | Concept | Business Angle |
|---|---|---|
| Data Prep | Scaling & splitting | Make sure income ≠ everything |
| Model | KNN | Lazy learner, smart pattern finder |
| K Tuning | Finding the best K | Trust the right number of neighbors |
| Visualization | Boundaries | Understand customer clusters |
| Insight | Segments | Strategy-ready grouping |
“KNN doesn’t predict with equations — it predicts with empathy.” 💬 Similar people, similar outcomes.
⏭️ Next Chapter: Unsupervised Learning – Clustering & Dimensionality Reduction. We’ll stop asking for labels altogether — and let the data find its own tribe. 🧭
```python
# Your code here
```