Business framing: plan for elasticity by scaling up when demand rises and scaling down to control costs. This short helper shows how to reason about autoscaling choices and instance counts, and it runs in the browser.
Quick autoscaling recommender¶
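A minimal sketch of what such a recommender could look like. The function name `autoscale_recommend` comes from the exercises; the signature, the 15-minute burst threshold, and the default `per_instance_rps` are illustrative assumptions, not measured values.

```python
import math

def autoscale_recommend(baseline_rps, peak_rps, burst_minutes, per_instance_rps=100):
    """Return a rough autoscaling suggestion (hypothetical helper, assumed thresholds)."""
    if baseline_rps < 0 or peak_rps < baseline_rps or per_instance_rps <= 0:
        raise ValueError("expected 0 <= baseline_rps <= peak_rps and per_instance_rps > 0")

    # Instances needed to carry the baseline and the peak, rounded up.
    min_instances = max(1, math.ceil(baseline_rps / per_instance_rps))
    max_instances = max(min_instances, math.ceil(peak_rps / per_instance_rps))

    # Assumed rule of thumb: a low baseline plus short bursts favors serverless with a queue.
    bursty = baseline_rps < per_instance_rps and burst_minutes <= 15
    option = "serverless + queue" if bursty else "autoscaled instance group"

    return {"option": option, "min_instances": min_instances, "max_instances": max_instances}

print(autoscale_recommend(baseline_rps=5, peak_rps=900, burst_minutes=10))
print(autoscale_recommend(baseline_rps=400, peak_rps=1200, burst_minutes=120))
```

The first call describes a low baseline with a short burst, so the sketch suggests serverless with a queue. The second describes sustained load, so it suggests an instance group sized between the baseline and the peak.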
MCQ¶
Q: Which autoscaling option is best for sudden, short-lived bursts with low baseline traffic?
A) Large VM fleet
B) Serverless functions with a queue
C) Stateful single-instance database
(Answer: B)
Exercises¶
Extend `autoscale_recommend` to suggest a `cool_down` period and explain why cooldown matters.
Implement a small `estimate_cost()` helper that returns rough relative cost (low/medium/high) for serverless vs VM choices.
(Stretch) Add a `warm_pool` flag and prefer warm instances when startup latency is critical.
Notes: Inserted cells are intentionally small and deterministic for Pyodide. Existing explanations and diagrams remain below.
Scalability Basics¶
A system is scalable when growth in users, requests, or data does not immediately turn into outages, unacceptable latency, or runaway costs.
Business Framing¶
Imagine you built a recommendation tool for an e-commerce team. It works perfectly with 500 daily users. Then a marketing campaign drives 50,000 users in one afternoon.
If the system slows down or crashes, the model is not the problem anymore. The architecture is.
What Scalability Really Means¶
Scalability is the ability to grow capacity in a controlled way. That can involve:
serving more requests per second
storing more data
processing more background jobs
keeping response times stable as usage grows
increasing throughput without rewriting the whole product
Common Scaling Strategies¶
| Strategy | Description | Good for |
|---|---|---|
| Vertical scaling | Make one machine stronger | quick fixes, small systems |
| Horizontal scaling | Add more machines or instances | web apps, APIs, stateless services |
| Caching | Reuse expensive results | dashboards, repeated reads, model outputs |
| Queueing | Process work asynchronously | emails, report generation, batch inference |
| Database optimization | Indexing, partitioning, replication | heavy read/write workloads |
| CDN and edge delivery | Move content closer to users | static assets, global access |
Traffic Spike Anatomy¶
A traffic spike illustrates a key principle: not every scaling problem should be solved in the same place. Find the bottleneck first, then apply the fix that matches it.
If incoming requests overwhelm a single server, add instances behind a load balancer.
If repeated reads are slow, add caching.
If slow tasks block the app, move them to a queue.
If the database is the bottleneck, tune schema or add replicas.
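As one illustration of these fixes, here is a minimal sketch of moving slow work to a queue. A `deque` stands in for a real message queue, and `handle_request`, `run_worker`, and the `generate_report` task name are hypothetical names chosen for the example.

```python
from collections import deque

job_queue = deque()  # in-memory stand-in for a real message queue

def handle_request(user_id):
    """Enqueue the slow work and respond immediately instead of blocking."""
    job_queue.append({"task": "generate_report", "user_id": user_id})
    return {"status": "accepted", "queued_jobs": len(job_queue)}

def run_worker(max_jobs):
    """Process up to max_jobs jobs; a real worker would run in its own process."""
    done = []
    while job_queue and len(done) < max_jobs:
        job = job_queue.popleft()
        done.append(f"report for user {job['user_id']}")
    return done

responses = [handle_request(uid) for uid in (1, 2, 3)]
print(responses[-1])   # the app accepted all three requests without doing the slow work
print(run_worker(2))   # a worker drains two jobs
print(len(job_queue))  # jobs still waiting
```

The request path stays fast because it only appends to the queue. Queue depth then becomes the number to watch: if it keeps growing, add workers.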
Throughput vs Latency¶
These two ideas are related but not identical:
Throughput is how much work the system completes in a period of time.
Latency is how long one request takes.
A system can have high throughput and still feel slow if each user waits too long.
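The difference can be made concrete with a small sketch. The numbers are invented for illustration: two systems complete the same number of requests in the same window, but each request takes very different time.

```python
from statistics import median

def summarize(latencies_s, window_s):
    """Summarize requests completed during a window of window_s seconds."""
    return {
        "throughput_rps": len(latencies_s) / window_s,  # work completed per second
        "median_latency_s": median(latencies_s),        # typical wait for one request
        "max_latency_s": max(latencies_s),              # worst wait for one request
    }

# Both systems finish 100 requests in 10 seconds (assumed values).
fast = summarize([0.05] * 100, window_s=10)
slow = summarize([2.0] * 100, window_s=10)

print(fast)
print(slow)
```

Both summaries report the same throughput, yet users of the second system wait 40 times longer per request. It keeps up only by handling many requests concurrently.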
Interactive Capacity Planner¶
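A capacity planner of this kind usually reduces to simple arithmetic. The sketch below assumes you know the expected peak request rate and how many requests per second one instance can sustain. The `plan_capacity` name, the 30% headroom, and the minimum of two instances are illustrative assumptions.

```python
import math

def plan_capacity(peak_rps, per_instance_rps, headroom=0.3, min_instances=2):
    """Estimate the instance count for a peak load, with spare headroom (rough sketch)."""
    if peak_rps < 0 or per_instance_rps <= 0 or headroom < 0:
        raise ValueError("peak_rps and headroom must be >= 0, per_instance_rps must be > 0")
    # Inflate the peak by the headroom, then round up to whole instances.
    needed = math.ceil(peak_rps * (1 + headroom) / per_instance_rps)
    # Keep a floor so that one instance failing does not take the service down.
    return max(min_instances, needed)

print(plan_capacity(peak_rps=200, per_instance_rps=50))  # peak traffic
print(plan_capacity(peak_rps=10, per_instance_rps=50))   # quiet period
```

Try changing `headroom` and `per_instance_rps` to see how sensitive the instance count is to each assumption.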
Signs Your System Does Not Scale Yet¶
| Symptom | Likely issue |
|---|---|
| CPU jumps to 100% during promotions | too few app instances or inefficient code |
| Database locks and slow queries | missing indexes or poor schema design |
| Background jobs pile up | not enough workers or no queue visibility |
| Same data queried repeatedly | cache is missing or misused |
| Costs rise faster than traffic | poor resource sizing or always-on overprovisioning |
Mini Quiz¶
1. When is horizontal scaling usually easier than vertical scaling?
Horizontal scaling is usually easier for stateless application services, because you can add more identical instances behind a load balancer.
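A minimal sketch of that idea, assuming a round-robin policy and hypothetical instance names. Real load balancers also track instance health and current load.

```python
from collections import Counter
from itertools import cycle

def make_round_robin(instances):
    """Return a function that assigns each request to the next instance in turn."""
    pool = cycle(instances)

    def route(request_id):
        # Stateless instances are interchangeable, so any of them can serve any request.
        return next(pool)

    return route

route = make_round_robin(["app-1", "app-2", "app-3"])
assignments = Counter(route(i) for i in range(6))
print(dict(assignments))
```

Because no instance holds per-user state, scaling out means passing a longer list of instances. A stateful service would need session affinity or shared storage first.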
2. Why does caching help scalability?
Caching reduces repeated expensive work. That lowers load on databases and services, which improves both latency and capacity.
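The effect is easy to see with Python's built-in `functools.lru_cache`. In this sketch, `product_recommendations` and the `backend_calls` counter are hypothetical stand-ins for an expensive database or model call.

```python
from functools import lru_cache

backend_calls = 0  # counts how often the expensive work actually runs

@lru_cache(maxsize=128)
def product_recommendations(user_segment):
    """Pretend to run an expensive query; results are cached per segment."""
    global backend_calls
    backend_calls += 1
    return tuple(f"{user_segment}-item-{i}" for i in range(3))

for segment in ["new", "loyal", "new", "new", "loyal"]:
    product_recommendations(segment)

print(backend_calls)                              # expensive calls made
print(product_recommendations.cache_info().hits)  # requests served from the cache
```

Five requests trigger only two expensive calls, and the other three are served from memory. The trade-off is staleness: cached results need an expiry or invalidation rule when the underlying data changes.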
Practice Prompt¶
Suppose a financial dashboard serves 2,000 users at 9 a.m. and only 100 users at noon. Which parts of the system would you scale elastically, and which parts would you keep stable for consistency and security?
Takeaway¶
Scalability is not one feature or one cloud service. It is a collection of design decisions that make growth predictable instead of painful.