# Lab – Customer Segmentation
Welcome to the Customer Segmentation Lab — where we use KNN to group similar customers faster than a marketing intern sorting spreadsheets on caffeine. ☕💻
In this lab, you’ll:

- Use distance-based similarity to find customer groups
- Compare different K values
- Visualize segment boundaries
- Interpret business insights
Let’s turn data into marketing magic. 🪄📈
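Before diving in, here is the core idea in a few lines of code: KNN measures similarity as plain distance in feature space. A minimal sketch (the customer values below are made up):

```python
import numpy as np

# Two customers as [income_k$, spending_score]; illustrative values only
alice = np.array([70, 80])
bob = np.array([65, 75])

# Euclidean distance, the similarity measure KNN uses by default
distance = np.linalg.norm(alice - bob)
print(round(distance, 2))  # ~7.07; a small distance means similar customers
```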
## 🧰 Setup
You can run this notebook directly in:

- 🧠 JupyterLite (Run above)
- 🧩 Google Colab
## 🪄 Step 1: Load the Data
Let’s create a mock dataset of customers based on:

- Annual Income
- Spending Score
Yes, it’s inspired by the famous “Mall Customers” dataset — because malls and marketing never go out of style. 🛍️
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Mock customer data
np.random.seed(42)
n = 200
income = np.random.normal(60, 20, n)  # Annual income (k$)
spend_score = np.clip(np.random.normal(50, 25, n), 0, 100)  # Spending score, clipped to 0–100
segment = np.where(income > 60, (spend_score > 50).astype(int), 0)  # 1 = high income & high spend

data = pd.DataFrame({
    'Income_k$': income,
    'SpendingScore': spend_score,
    'Segment': segment
})
data.head()
```
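Quick sanity check: the mock rule above can only produce two segments (0 and 1), so let’s confirm the class balance before modeling.

```python
# Count customers per segment; expect only labels 0 and 1
print(data['Segment'].value_counts())
```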
## 🧼 Step 2: Clean & Scale Data
Scaling matters here — otherwise “Income” might bully “SpendingScore” in distance calculations. 💰💪
```python
X = data[['Income_k$', 'SpendingScore']]
y = data['Segment']

# Standardize so both features contribute on a comparable scale.
# (For simplicity we scale before splitting; in practice, fit the scaler
# on the training split only to avoid data leakage.)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42
)
```
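A quick check of what `StandardScaler` actually did: after scaling, each feature has mean ≈ 0 and standard deviation ≈ 1, so neither can dominate distance calculations purely because of its units.

```python
# Each column should now have mean ~0 and std ~1
print("Means:", X_scaled.mean(axis=0).round(3))
print("Stds: ", X_scaled.std(axis=0).round(3))
```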
## 🧮 Step 3: Train KNN
Let’s start simple — K=5. Our KNN model will look at the 5 closest customers to classify each new one.
```python
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print("Training accuracy:", round(knn.score(X_train, y_train), 3))
print("Test accuracy:", round(knn.score(X_test, y_test), 3))
```
## 🎨 Step 4: Visualize the Segmentation
Let’s see how KNN “draws” its decision boundaries — like a business strategist armed with crayons.
```python
# Grid for visualization
x_min, x_max = X_scaled[:, 0].min() - 1, X_scaled[:, 0].max() + 1
y_min, y_max = X_scaled[:, 1].min() - 1, X_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

# Predict the segment for every point on the grid
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, cmap='coolwarm', alpha=0.3)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap='coolwarm', edgecolor='k')
plt.title("Customer Segmentation with KNN")
plt.xlabel("Income (scaled)")
plt.ylabel("Spending Score (scaled)")
plt.show()
```
💡 You should see regions showing different segments of customers — our KNN just made a segmentation strategy based on who spends how much. 💳✨
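If you’re running scikit-learn 1.1 or newer, `DecisionBoundaryDisplay` can draw the same picture without building the meshgrid by hand. A sketch of the shortcut:

```python
from sklearn.inspection import DecisionBoundaryDisplay

# Builds the grid and contour plot for us (requires scikit-learn >= 1.1)
disp = DecisionBoundaryDisplay.from_estimator(
    knn, X_scaled, response_method="predict", cmap="coolwarm", alpha=0.3
)
disp.ax_.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap="coolwarm", edgecolor="k")
disp.ax_.set_title("Customer Segmentation with KNN")
plt.show()
```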
## 🧪 Step 5: Tuning K (aka “How many friends to trust?”)
The number of neighbors K controls how smooth or chaotic your decision boundary becomes.
Let’s test different K values.
```python
scores = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores.append(knn.score(X_test, y_test))

plt.plot(range(1, 21), scores, marker='o')
plt.title("K vs Accuracy")
plt.xlabel("Number of Neighbors (K)")
plt.ylabel("Accuracy")
plt.grid(True)
plt.show()
```
🧠 Try interpreting:

- Low K: Very reactive, may overfit (believes the nearest gossip).
- High K: Too smooth, may underfit (trusts everyone too much).

Choose your K wisely — business strategy meets social dynamics.
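One caveat: accuracy on a single test split is noisy, so the “best” K above may shift with a different random split. A sketch of a more robust choice using cross-validation (an optional refinement, not part of the original lab):

```python
from sklearn.model_selection import cross_val_score

# Average accuracy over 5 folds of the training data for each K
cv_scores = []
for k in range(1, 21):
    model = KNeighborsClassifier(n_neighbors=k)
    cv_scores.append(cross_val_score(model, X_train, y_train, cv=5).mean())

best_k = int(np.argmax(cv_scores)) + 1  # +1 because K starts at 1
print("Best K by cross-validation:", best_k)
```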
## 💼 Step 6: Business Interpretation
Now, how does this matter in real life?
| Segment | Description | Example Business Use |
|---|---|---|
| 0 | Low income / low spenders | Offer discount coupons or loyalty rewards |
| 1 | High income / high spenders | Upsell luxury products or premium memberships |
| 2+ | Other combinations | Personalized cross-sells |

(Note: the mock rule in Step 1 only produces segments 0 and 1; real customer data typically yields more.)
✨ You just did customer segmentation using distance-based reasoning — the heart of recommender systems, marketing analytics, and churn prediction!
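To hand these segments to a marketing team, you might map the numeric labels to campaign-friendly names. A small sketch (the names are made up):

```python
# Hypothetical campaign labels for the two mock segments
segment_names = {0: "Value Seekers", 1: "Premium Spenders"}

data['SegmentName'] = data['Segment'].map(segment_names)
data[['Income_k$', 'SpendingScore', 'SegmentName']].head()
```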
## 🧩 TL;DR Summary
| Step | Concept | Business Angle |
|---|---|---|
| Data Prep | Scaling & splitting | Make sure income ≠ everything |
| Model | KNN | Lazy learner, smart pattern finder |
| K Tuning | Finding the best K | Trust the right number of neighbors |
| Visualization | Boundaries | Understand customer clusters |
| Insight | Segments | Strategy-ready grouping |
“KNN doesn’t predict with equations — it predicts with empathy.” 💬 Similar people, similar outcomes.
⏭️ Next Chapter: Unsupervised Learning – Clustering & Dimensionality Reduction. We’ll stop asking for labels altogether — and let the data find its own tribe. 🧭
```python
# Your code here
```