Lab – Customer Segmentation#

Welcome to the Customer Segmentation Lab — where we use KNN to group similar customers faster than a marketing intern sorting spreadsheets on caffeine. ☕💻

In this lab, you’ll:

  • Use distance-based similarity to find customer groups

  • Compare different K values

  • Visualize segment boundaries

  • Interpret business insights

Let’s turn data into marketing magic. 🪄📈


🧰 Setup#

You can run this notebook directly in any Jupyter-compatible environment (for example, JupyterLab or Google Colab).
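
All you need is the standard scientific Python stack. If you're starting from a fresh environment, this optional install cell covers every import used below:

%pip install pandas numpy matplotlib scikit-learn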


🪄 Step 1: Load the Data#

Let’s create a mock dataset of customers based on:

  • Annual Income

  • Spending Score

Yes, it’s inspired by the famous “Mall Customers” dataset — because malls and marketing never go out of style. 🛍️

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Mock customer data
np.random.seed(42)
n = 200
income = np.random.normal(60, 20, n)                        # Annual income (k$)
spend_score = np.clip(np.random.normal(50, 25, n), 0, 100)  # Spending score (0–100)

# Label rule for the mock data: 1 = high income AND high spending, 0 = everyone else
segment = ((income > 60) & (spend_score > 50)).astype(int)

data = pd.DataFrame({
    'Income_k$': income,
    'SpendingScore': spend_score,
    'Segment': segment
})

data.head()

🧼 Step 2: Scale & Split the Data#

Scaling matters here — otherwise “Income” might bully “SpendingScore” in distance calculations. 💰💪

X = data[['Income_k$', 'SpendingScore']]
y = data['Segment']

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)
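
To see that "bullying" concretely, here's a quick illustrative check with two made-up customers. On the raw features, the 50k$ income gap swamps the 10-point spending gap in the Euclidean distance; after scaling, each feature is measured in standard deviations, so neither dominates just because of its units.

# Two hypothetical customers (income k$, spending score), invented for this demo
a_raw = np.array([40.0, 80.0])
b_raw = np.array([90.0, 70.0])

# Distance on raw features: the income gap (50) dominates the spending gap (10)
print("Raw distance:", np.linalg.norm(a_raw - b_raw))

# Reuse the scaler fitted above, then measure again in scaled space
a_s, b_s = scaler.transform([a_raw, b_raw])
print("Scaled distance:", np.linalg.norm(a_s - b_s))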

🧮 Step 3: Train KNN#

Let’s start simple — K=5. Our KNN model will look at the 5 closest customers to classify each new one.

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print("Training accuracy:", round(knn.score(X_train, y_train), 3))
print("Test accuracy:", round(knn.score(X_test, y_test), 3))

🎨 Step 4: Visualize the Segmentation#

Let’s see how KNN “draws” its decision boundaries — like a business strategist armed with crayons.

# Grid for visualization
x_min, x_max = X_scaled[:, 0].min() - 1, X_scaled[:, 0].max() + 1
y_min, y_max = X_scaled[:, 1].min() - 1, X_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.figure(figsize=(8,6))
plt.contourf(xx, yy, Z, cmap='coolwarm', alpha=0.3)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap='coolwarm', edgecolor='k')
plt.title("Customer Segmentation with KNN")
plt.xlabel("Income (scaled)")
plt.ylabel("Spending Score (scaled)")
plt.show()

💡 You should see regions showing different segments of customers — our KNN just made a segmentation strategy based on who spends how much. 💳✨
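
If you're on scikit-learn ≥ 1.1, the library can draw the same picture for you and spare you the meshgrid bookkeeping. A sketch of the equivalent call:

from sklearn.inspection import DecisionBoundaryDisplay

# Same contour plot as above, without building the grid by hand
disp = DecisionBoundaryDisplay.from_estimator(
    knn, X_scaled, response_method="predict", cmap="coolwarm", alpha=0.3
)
disp.ax_.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap="coolwarm", edgecolor="k")
disp.ax_.set_title("Customer Segmentation with KNN")
plt.show()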


🧪 Step 5: Tuning K (aka “How many friends to trust?”)#

The number of neighbors K controls how smooth or chaotic your decision boundary becomes.

Let’s test different K values.

scores = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores.append(knn.score(X_test, y_test))

plt.plot(range(1, 21), scores, marker='o')
plt.title("K vs Accuracy")
plt.xlabel("Number of Neighbors (K)")
plt.ylabel("Accuracy")
plt.grid(True)
plt.show()

🧠 Try interpreting:

  • Low K: Very reactive, may overfit (believes the nearest gossip).

  • High K: Too smooth, may underfit (trusts everyone too much).

Choose your K wisely — business strategy meets social dynamics.
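
One caveat: a single train/test split can make the curve above noisy. A more robust (and common) way to pick K is cross-validation; here's a minimal sketch using 5-fold cross_val_score:

from sklearn.model_selection import cross_val_score

# Mean 5-fold CV accuracy for each K, using the full scaled dataset
cv_means = [
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X_scaled, y, cv=5).mean()
    for k in range(1, 21)
]
best_k = int(np.argmax(cv_means)) + 1  # +1 because K starts at 1
print("Best K by 5-fold CV:", best_k)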


💼 Step 6: Business Interpretation#

Now, how does this matter in real life?

| Segment | Description | Example Business Use |
|---------|-------------|----------------------|
| 0 | Low income and/or low spenders | Offer discount coupons or loyalty rewards |
| 1 | High income, high spenders | Upsell luxury products or premium memberships |

Our mock data only defines these two segments; with richer features, further segment combinations could drive personalized cross-sells.
✨ You just did customer segmentation using distance-based reasoning — the heart of recommender systems, marketing analytics, and churn prediction!
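
To make the segments actionable, you can route predictions straight into a campaign rule. A toy sketch; the action strings and example customers are invented for illustration:

# Hypothetical mapping from segment to marketing action
actions = {
    0: "Send discount coupons / loyalty rewards",
    1: "Pitch premium products or membership",
}

# Made-up (income k$, spending score) pairs for two new customers
new_raw = [[35, 40], [85, 80]]
for (inc, score), seg in zip(new_raw, knn.predict(scaler.transform(new_raw))):
    print(f"Income {inc}k$, score {score} -> segment {seg}: {actions[seg]}")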


🧩 TL;DR Summary#

| Step | Concept | Business Angle |
|------|---------|----------------|
| Data Prep | Scaling & splitting | Make sure income ≠ everything |
| Model | KNN | Lazy learner, smart pattern finder |
| K Tuning | Finding the best K | Trust the right number of neighbors |
| Visualization | Boundaries | Understand customer clusters |
| Insight | Segments | Strategy-ready grouping |


“KNN doesn’t predict with equations — it predicts with empathy.” 💬 Similar people, similar outcomes.


⏭️ Next Chapter: Unsupervised Learning – Clustering & Dimensionality Reduction. We’ll stop asking for labels altogether — and let the data find its own tribe. 🧭
