Customer Segmentation for Business Insights

Why this matters

Customer segmentation helps businesses group similar customers to target marketing, personalize offers, and prioritize product development. This notebook shows a deterministic, browser-safe data-processing pipeline learners can run in Pyodide.

Learning objectives

  • Understand simple rules-based segmentation and why deterministic demos are useful for in-browser runs.

  • Load or synthesize a small customer table, compute basic features (average order value, order frequency), and assign segments.

  • Visualize segmentation flow with a Mermaid diagram and run a Pyodide-safe demo that requires only the Python standard library.

Concept introduction

We’ll demonstrate a tiny segmentation pipeline: generate reproducible sample customers, compute features, and assign each customer to the nearest of three fixed centroids (no heavy ML). The demo is deterministic: it calls random.seed with a fixed value, so repeated runs in Pyodide produce the same customers and the same segments.
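To make the centroid assignment concrete, here is a minimal sketch for a single customer. The customer values and the helper name `nearest_centroid` are illustrative assumptions; the centroid positions are the ones used in the demo cell on this page.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

# Centroids as (avg_order_value, freq), matching the demo's positions.
centroids = {
    'Low-Value': (20.0, 3),
    'Mid-Value': (60.0, 8),
    'High-Value': (150.0, 15),
}

def nearest_centroid(point, centroids):
    """Return the name of the centroid closest to point (hypothetical helper)."""
    return min(centroids, key=lambda name: dist(point, centroids[name]))

# Hypothetical customer: average order value 45.0, 9 orders.
customer = (45.0, 9)
print(nearest_centroid(customer, centroids))  # Mid-Value
```

The distances here are roughly 25.7 (Low-Value), 15.0 (Mid-Value), and 105.2 (High-Value), so the customer lands in Mid-Value.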


Quick multiple-choice check

Q: Which of these is the best reason to prefer deterministic synthetic data in in-browser demos?

  • A) Faster than real data

  • B) Reproducibility and no network dependencies

  • C) More realistic than production data

Correct answer: B
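The reproducibility point can be checked directly. The sketch below is a small variation on the demo's approach: it uses a private `random.Random` instance instead of the global `random.seed`, and the function name `sample_orders` is an assumption made for illustration.

```python
import random

def sample_orders(seed, n=5):
    """Return n pseudo-random order counts from a generator seeded with seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [rng.randint(1, 25) for _ in range(n)]

first = sample_orders(42)
second = sample_orders(42)
print(first == second)  # True: the same seed yields the same sequence
```

No network access or external file is involved, which is what makes this kind of data convenient for in-browser runs.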


Exercises (suggested)

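  1. Modify the centroid positions and observe how segment membership changes. As written, the generator produces average order values of at most about 50, so the High-Value centroid at 150 is never the nearest one and that segment stays empty.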

  2. Add an age-based rule to split high-value customers older than 40 into a separate VIP segment.

  3. Export the segment summary to CSV using only the Python standard library.
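As one possible starting point for exercises 2 and 3, the sketch below applies an age rule after segmentation and writes a per-segment summary as CSV text. The three customers, their segment labels, and the helper names `refine_segment` and `summary_csv` are hypothetical; it writes to an in-memory buffer rather than a file so the result can be inspected directly.

```python
import csv
import io

# Hypothetical customers that already carry a segment label.
customers = [
    {'id': 1, 'age': 52, 'revenue': 900.0, 'segment': 'High-Value'},
    {'id': 2, 'age': 31, 'revenue': 750.0, 'segment': 'High-Value'},
    {'id': 3, 'age': 45, 'revenue': 120.0, 'segment': 'Low-Value'},
]

def refine_segment(c):
    """Exercise 2: move High-Value customers older than 40 into VIP."""
    if c['segment'] == 'High-Value' and c['age'] > 40:
        return 'VIP'
    return c['segment']

def summary_csv(customers):
    """Exercise 3: return a per-segment count and revenue summary as CSV text."""
    totals = {}
    for c in customers:
        row = totals.setdefault(c['segment'], {'count': 0, 'revenue': 0.0})
        row['count'] += 1
        row['revenue'] += c['revenue']
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(['segment', 'count', 'revenue'])
    for name in sorted(totals):  # sorted for a stable row order
        writer.writerow([name, totals[name]['count'], f"{totals[name]['revenue']:.2f}"])
    return buf.getvalue()

for c in customers:
    c['segment'] = refine_segment(c)

print(summary_csv(customers))
```

Only customer 1 is both High-Value and older than 40, so the summary has one row each for High-Value, Low-Value, and VIP. To write a file instead, the same `csv.writer` calls work on a handle opened with `open(..., 'w', newline='')`.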


Summary

This page provides a safe, deterministic segmentation starter that learners can run in the browser. The code cell below contains the runnable demo.

Customer Segmentation Data Processing

# Pyodide-safe deterministic segmentation demo

import random
from math import sqrt
from collections import defaultdict
import csv

random.seed(42)

# Generate deterministic sample customers
customers = []
for i in range(1, 21):
    age = random.randint(18, 70)
    orders = random.randint(1, 25)
    revenue = round(random.uniform(10, 500) * orders / 10, 2)
    customers.append({
        'id': i,
        'age': age,
        'orders': orders,
        'revenue': revenue,
    })

# Compute simple features: avg_order_value and frequency (orders)
for c in customers:
    c['avg_order_value'] = round(c['revenue'] / max(1, c['orders']), 2)
    c['freq'] = c['orders']

# Define simple centroids (avg_order_value, freq)
centroids = {
    'Low-Value': (20.0, 3),
    'Mid-Value': (60.0, 8),
    'High-Value': (150.0, 15),
}

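def assign_segment(c):
    """Return the name of the centroid nearest to customer c (Euclidean distance)."""
    x = (c['avg_order_value'], c['freq'])
    best = None    # name of the closest centroid seen so far
    best_d = None  # distance to that centroid
    for name, center in centroids.items():
        d = sqrt((x[0] - center[0])**2 + (x[1] - center[1])**2)
        # Strict '<' keeps the first centroid on ties
        if best is None or d < best_d:
            best = name
            best_d = d
    return best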

for c in customers:
    c['segment'] = assign_segment(c)

# Summarize
summary = defaultdict(lambda: {'count':0, 'revenue':0.0})
for c in customers:
    s = c['segment']
    summary[s]['count'] += 1
    summary[s]['revenue'] += c['revenue']

print('Segment summary:')
for s, v in summary.items():
    print(f"- {s}: {v['count']} customers, total revenue ${v['revenue']:.2f}")

# Optional: write CSV. In Pyodide, open() writes to the in-memory virtual
# filesystem, so the file lives in the browser session, not on the user's disk.
fieldnames = ['id','age','orders','revenue','avg_order_value','freq','segment']
with open('customer_segments.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for c in customers:
        writer.writerow({k: c[k] for k in fieldnames})

print('\nWrote customer_segments.csv (browser file handle)')
Segment summary:
- Low-Value: 15 customers, total revenue $2801.20
- Mid-Value: 5 customers, total revenue $3110.46

Wrote customer_segments.csv (browser file handle)
# Your code here