Customer Segmentation for Business Insights

Why this matters

Customer segmentation groups similar customers so a business can target marketing, personalize offers, and prioritize product development. This notebook walks through a deterministic, browser-safe data-processing pipeline that learners can run in Pyodide using only the Python standard library.

Learning objectives

Understand simple rules-based segmentation and why deterministic demos are useful for in-browser runs.
Load or synthesize a small customer table, compute basic features (revenue, frequency), and assign segments.
Visualize segmentation flow with a Mermaid diagram and run a Pyodide-safe demo that requires only the Python standard library.

Concept introduction

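We’ll demonstrate a tiny segmentation pipeline: generate reproducible sample customers, compute features, and assign each customer to the nearest of three fixed, hand-picked centroids (no heavy ML and no iterative clustering). The demo seeds the generator with random.seed, so repeated runs produce the same customers and the same segments, which keeps the in-browser output predictable.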
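To make the nearest-centroid step concrete, here is a minimal worked sketch for a single customer. The centroid positions are copied from the demo; the customer point (45.0, 6) is a hypothetical value chosen only for illustration.

```python
from math import sqrt

# Centroids copied from the demo: (avg_order_value, freq)
centroids = {
    'Low-Value': (20.0, 3),
    'Mid-Value': (60.0, 8),
    'High-Value': (150.0, 15),
}

# Hypothetical customer: avg_order_value of 45.0 and 6 orders
point = (45.0, 6)

# Euclidean distance from the customer to every centroid
distances = {
    name: sqrt((point[0] - cx) ** 2 + (point[1] - cy) ** 2)
    for name, (cx, cy) in centroids.items()
}

# The segment is the name of the closest centroid
nearest = min(distances, key=distances.get)

for name, d in distances.items():
    print(f"{name}: {d:.2f}")
print('Nearest:', nearest)  # Nearest: Mid-Value
```

Because the two features are on different scales, the difference in avg_order_value contributes far more to the distance than the difference in order count.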

Quick multiple-choice check

Q: Which of these is the best reason to prefer deterministic synthetic data for in-browser demos?

A) Faster than real data
B) Reproducibility and no network dependencies
C) More realistic than production data

Correct answer: B
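The reproducibility claim can be checked directly. This small sketch reseeds the generator twice with the same value and compares the results; the helper name sample_orders is hypothetical and not part of the demo.

```python
import random

def sample_orders(seed, n=5):
    """Seed the global generator, then draw n order counts between 1 and 25."""
    random.seed(seed)
    return [random.randint(1, 25) for _ in range(n)]

first = sample_orders(42)
second = sample_orders(42)

# Same seed, same sequence: no network access or external data is involved
print(first == second)  # True
```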

Exercises (suggested)

Modify the centroid positions and observe how segment membership changes.
Add an age-based rule to split high-value customers older than 40 into a separate VIP segment.
Export the segment summary to CSV using only the Python standard library.
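As one possible starting point for the age-based exercise, the sketch below relabels customers after segments have been assigned. The function name apply_vip_rule, the min_age parameter, and the three sample records are assumptions made for illustration; they are not part of the demo.

```python
def apply_vip_rule(customer, min_age=40):
    """Return 'VIP' for High-Value customers older than min_age, else the existing segment."""
    if customer['segment'] == 'High-Value' and customer['age'] > min_age:
        return 'VIP'
    return customer['segment']

# Hypothetical records for illustration only (not output of the demo)
examples = [
    {'id': 101, 'age': 52, 'segment': 'High-Value'},
    {'id': 102, 'age': 33, 'segment': 'High-Value'},
    {'id': 103, 'age': 61, 'segment': 'Mid-Value'},
]

relabeled = [apply_vip_rule(c) for c in examples]
print(relabeled)  # ['VIP', 'High-Value', 'Mid-Value']
```

In the demo output shown on this page no customer lands in High-Value, so a rule like this only has a visible effect after the centroid positions are changed.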

Summary

This page provides a safe, deterministic segmentation starter that learners can run in the browser. The code cell below contains the runnable demo.

Customer Segmentation Data Processing

# Pyodide-safe deterministic segmentation demo

import random
from math import sqrt
from collections import defaultdict
import csv

random.seed(42)

# Generate deterministic sample customers
customers = []
for i in range(1, 21):
    age = random.randint(18, 70)
    orders = random.randint(1, 25)
    revenue = round(random.uniform(10, 500) * orders / 10, 2)
    customers.append({
        'id': i,
        'age': age,
        'orders': orders,
        'revenue': revenue,
    })

# Compute simple features: avg_order_value and frequency (orders)
for c in customers:
    c['avg_order_value'] = round(c['revenue'] / max(1, c['orders']), 2)
    c['freq'] = c['orders']

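# Define simple centroids as (avg_order_value, freq) pairs.
# Note: the synthetic avg_order_value stays between about 1 and 50, so with
# these positions no customer is nearest to 'High-Value'.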
centroids = {
    'Low-Value': (20.0, 3),
    'Mid-Value': (60.0, 8),
    'High-Value': (150.0, 15),
}

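def assign_segment(c):
    """Return the name of the centroid nearest to customer c (Euclidean distance)."""
    x = (c['avg_order_value'], c['freq'])

    def distance(center):
        return sqrt((x[0] - center[0]) ** 2 + (x[1] - center[1]) ** 2)

    # min() keeps the first name on ties, which follows the dictionary order
    return min(centroids, key=lambda name: distance(centroids[name]))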

for c in customers:
    c['segment'] = assign_segment(c)

# Summarize
summary = defaultdict(lambda: {'count': 0, 'revenue': 0.0})
for c in customers:
    s = c['segment']
    summary[s]['count'] += 1
    summary[s]['revenue'] += c['revenue']

print('Segment summary:')
for s, v in summary.items():
    print(f"- {s}: {v['count']} customers, total revenue ${v['revenue']:.2f}")

# Optional: write CSV. In Pyodide, open() writes to the in-memory virtual file
# system rather than the user's disk, so by default the file does not outlive the session.
fieldnames = ['id', 'age', 'orders', 'revenue', 'avg_order_value', 'freq', 'segment']
with open('customer_segments.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows({k: c[k] for k in fieldnames} for c in customers)

print('\nWrote customer_segments.csv (in-memory virtual file system)')

Segment summary:
- Low-Value: 15 customers, total revenue $2801.20
- Mid-Value: 5 customers, total revenue $3110.46

Wrote customer_segments.csv (in-memory virtual file system)
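To confirm what a CSV export contains, the file can be read back with the same standard-library module. The sketch below is self-contained: it writes two hypothetical rows to a temporary directory (the file name segments_check.csv and the rows are made up for illustration) and loads them again with csv.DictReader.

```python
import csv
import os
import tempfile

# Hypothetical rows for illustration only (not output of the demo)
rows = [
    {'id': 1, 'segment': 'Low-Value', 'revenue': 120.5},
    {'id': 2, 'segment': 'Mid-Value', 'revenue': 640.0},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'segments_check.csv')

    # Write the rows with a header line
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['id', 'segment', 'revenue'])
        writer.writeheader()
        writer.writerows(rows)

    # Read them back; every value is returned as a string
    with open(path, newline='') as f:
        loaded = list(csv.DictReader(f))

print(loaded[0])  # {'id': '1', 'segment': 'Low-Value', 'revenue': '120.5'}
```

CSV stores text only, so numeric fields such as revenue need an explicit float() or int() conversion after loading.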

# Your code here