Customer Segmentation for Business Insights¶

Why this matters¶
Customer segmentation helps businesses group similar customers to target marketing, personalize offers, and prioritize product development. This notebook shows a deterministic, browser-safe data-processing pipeline learners can run in Pyodide.
Learning objectives¶
Understand simple rule-based segmentation (nearest fixed, hand-picked centroid) and why deterministic demos are useful for in-browser runs.
Load or synthesize a small customer table, compute basic features (average order value, order frequency), and assign segments.
Visualize segmentation flow with a Mermaid diagram and run a Pyodide-safe demo that requires only the Python standard library.
Concept introduction¶
We’ll demonstrate a tiny segmentation pipeline: generate reproducible sample customers, compute features, and assign each customer to the nearest of three hand-picked centroids (no heavy ML). The demo seeds the generator with random.seed(42), so repeated runs in Pyodide produce the same customers and the same segments.
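As a rough illustration of the centroid assignment described above, the sketch below picks the nearest centroid for a single customer by Euclidean distance. The centroid values mirror the demo cell; the helper name `nearest_centroid` and the sample customer are hypothetical, chosen only for illustration.

```python
from math import dist  # available in Python 3.8+

# Centroids as (avg_order_value, freq); values copied from the demo cell.
centroids = {
    'Low-Value': (20.0, 3),
    'Mid-Value': (60.0, 8),
    'High-Value': (150.0, 15),
}

def nearest_centroid(point, centroids):
    """Return the name of the centroid closest to `point` (Euclidean distance)."""
    return min(centroids, key=lambda name: dist(point, centroids[name]))

# Hypothetical customer: average order value 55.0 across 9 orders.
print(nearest_centroid((55.0, 9), centroids))
```

Because the two features are on different scales, average order value tends to dominate the distance here; rescaling the features before comparing is one option if that is not the intent.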
Quick multiple-choice check¶
Q: Which of these is the best reason to prefer deterministic synthetic data in in-browser demos?
A) Faster than real data
B) Reproducibility and no network dependencies
C) More realistic than production data
Correct answer: B
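To make the reproducibility point concrete, here is a minimal sketch showing that two generators seeded with the same value yield the same sequence. The function name `sample_orders` is hypothetical; the demo cell itself seeds the global generator with random.seed(42) instead of creating a separate one.

```python
import random

def sample_orders(seed, n=5):
    """Draw n order counts from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator; global state is untouched
    return [rng.randint(1, 25) for _ in range(n)]

first = sample_orders(42)
second = sample_orders(42)
print(first == second)  # same seed, same sequence
```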
Exercises (suggested)¶
Modify the centroid positions and observe how segment membership changes.
Add an age-based rule to split high-value customers older than 40 into a separate VIP segment.
Export the segment summary to CSV using only the Python standard library.
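One possible starting point for the CSV exercise is sketched below. It serializes a summary to CSV text in memory, so it needs no file access at all. The `summary` values here are placeholders made up for illustration (not the demo's output), and `summary_to_csv` is a hypothetical helper name.

```python
import csv
import io

# Placeholder summary in the same shape as the demo's `summary` dict;
# the numbers are made up for illustration.
summary = {
    'Low-Value': {'count': 3, 'revenue': 120.50},
    'Mid-Value': {'count': 2, 'revenue': 310.00},
}

def summary_to_csv(summary):
    """Serialize a segment summary to CSV text using only the standard library."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['segment', 'count', 'revenue'])
    for segment, stats in summary.items():
        writer.writerow([segment, stats['count'], f"{stats['revenue']:.2f}"])
    return buffer.getvalue()

print(summary_to_csv(summary))
```

Returning a string keeps the helper easy to test; the same text could then be written to a file with open() if a file is wanted.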
Summary¶
This page provides a safe, deterministic segmentation starter that learners can run in the browser. The code cell below contains the runnable demo.
Customer Segmentation Data Processing¶
# Pyodide-safe deterministic segmentation demo
import csv
import random
from collections import defaultdict
from math import sqrt

random.seed(42)

# Generate deterministic sample customers
customers = []
for i in range(1, 21):
    age = random.randint(18, 70)
    orders = random.randint(1, 25)
    revenue = round(random.uniform(10, 500) * orders / 10, 2)
    customers.append({
        'id': i,
        'age': age,
        'orders': orders,
        'revenue': revenue,
    })

# Compute simple features: avg_order_value and frequency (orders)
for c in customers:
    c['avg_order_value'] = round(c['revenue'] / max(1, c['orders']), 2)
    c['freq'] = c['orders']

# Define simple centroids (avg_order_value, freq)
centroids = {
    'Low-Value': (20.0, 3),
    'Mid-Value': (60.0, 8),
    'High-Value': (150.0, 15),
}

def assign_segment(c):
    """Return the name of the centroid nearest to customer c (Euclidean distance)."""
    x = (c['avg_order_value'], c['freq'])
    best = None
    best_d = None
    for name, center in centroids.items():
        d = sqrt((x[0] - center[0])**2 + (x[1] - center[1])**2)
        if best is None or d < best_d:
            best = name
            best_d = d
    return best

for c in customers:
    c['segment'] = assign_segment(c)

# Summarize: customer count and total revenue per segment
summary = defaultdict(lambda: {'count': 0, 'revenue': 0.0})
for c in customers:
    s = c['segment']
    summary[s]['count'] += 1
    summary[s]['revenue'] += c['revenue']

print('Segment summary:')
for s, v in summary.items():
    print(f"- {s}: {v['count']} customers, total revenue ${v['revenue']:.2f}")

# Optional: write CSV. In Pyodide this goes to the in-memory virtual
# file system by default, not to the user's disk.
fieldnames = ['id', 'age', 'orders', 'revenue', 'avg_order_value', 'freq', 'segment']
with open('customer_segments.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for c in customers:
        writer.writerow({k: c[k] for k in fieldnames})
print('\nWrote customer_segments.csv (browser file handle)')
Segment summary:
- Low-Value: 15 customers, total revenue $2801.20
- Mid-Value: 5 customers, total revenue $3110.46
Wrote customer_segments.csv (browser file handle)
# Your code here