
Business framing: plan for elasticity. Scale up when demand rises, and scale down to control costs. The quick autoscaling recommender below shows how to reason about autoscaling choices and instance counts, and it runs in the browser.


Quick autoscaling recommender
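A recommender like this can be sketched as a small, deterministic function. The sketch is hypothetical: the name `autoscale_recommend` matches the exercises below, but its parameters (`baseline_rps`, `peak_rps`, `rps_per_instance`, `target_utilization`), the 10× burst threshold, and the option labels are illustrative assumptions, not measured guidance. It uses only the standard library, so it should also run under Pyodide.

```python
import math

def autoscale_recommend(baseline_rps, peak_rps, rps_per_instance=50.0, target_utilization=0.7):
    """Hypothetical recommender; every threshold here is an illustrative assumption."""
    if baseline_rps < 0 or peak_rps < baseline_rps:
        raise ValueError("expected 0 <= baseline_rps <= peak_rps")
    if rps_per_instance <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("need rps_per_instance > 0 and 0 < target_utilization <= 1")
    # Capacity one instance can serve while keeping headroom below 100% load.
    usable_rps = rps_per_instance * target_utilization
    min_instances = max(1, math.ceil(baseline_rps / usable_rps))
    max_instances = max(min_instances, math.ceil(peak_rps / usable_rps))
    # How spiky the traffic is; max(..., 1.0) avoids dividing by zero.
    burst_ratio = peak_rps / max(baseline_rps, 1.0)
    if baseline_rps < usable_rps and burst_ratio >= 10:
        # Low baseline with sharp bursts: pay per use instead of for idle machines.
        option = "serverless functions + queue"
    else:
        # Steady or high baseline: a scaling group of VMs/containers is often cheaper.
        option = "autoscaling VM/container group"
    return {"option": option, "min_instances": min_instances, "max_instances": max_instances}

print(autoscale_recommend(baseline_rps=5, peak_rps=400))
# {'option': 'serverless functions + queue', 'min_instances': 1, 'max_instances': 12}
print(autoscale_recommend(baseline_rps=300, peak_rps=600))
# {'option': 'autoscaling VM/container group', 'min_instances': 9, 'max_instances': 18}
```

For the serverless case, read the instance counts as instance-equivalents, which are useful when setting concurrency limits rather than provisioning machines.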


MCQ

  • Q: Which autoscaling option is best for sudden, short-lived bursts with low baseline traffic?

    • A) Large VM fleet

    • B) Serverless functions with a queue

    • C) Stateful single-instance database

    • (Answer: B)

Exercises

  1. Extend autoscale_recommend to suggest a cool_down period and explain why cooldown matters.

  2. Implement a small estimate_cost() helper that returns rough relative cost (low/medium/high) for serverless vs VM choices.

  3. (Stretch) Add a warm_pool flag and prefer warm instances when startup latency is critical.


Note: The code cells are intentionally small and deterministic so that they run in Pyodide (Python in the browser).

Scalability Basics

A system is scalable when growth in users, requests, or data does not immediately turn into outages, unacceptable latency, or runaway costs.

Business Framing

Imagine you built a recommendation tool for an e-commerce team. It works perfectly with 500 daily users. Then a marketing campaign drives 50,000 users in one afternoon.

If the system slows down or crashes, the model is not the problem anymore. The architecture is.

What Scalability Really Means

Scalability is the ability to grow capacity in a controlled way. That can involve:

  • serving more requests per second

  • storing more data

  • processing more background jobs

  • keeping response times stable as usage grows

  • increasing throughput without rewriting the whole product

Common Scaling Strategies

| Strategy | Description | Good for |
|---|---|---|
| Vertical scaling | Make one machine stronger | quick fixes, small systems |
| Horizontal scaling | Add more machines or instances | web apps, APIs, stateless services |
| Caching | Reuse expensive results | dashboards, repeated reads, model outputs |
| Queueing | Process work asynchronously | emails, report generation, batch inference |
| Database optimization | Indexing, partitioning, replication | heavy read/write workloads |
| CDN and edge delivery | Move content closer to users | static assets, global access |

Traffic Spike Anatomy

A traffic spike rarely stresses only one component, which illustrates a key principle: not every scaling problem should be solved in the same place.

  • If request routing is overloaded, add a load balancer.

  • If repeated reads are slow, add caching.

  • If slow tasks block the app, move them to a queue.

  • If the database is the bottleneck, tune schema or add replicas.
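Moving slow work to a queue can be sketched with an in-memory `collections.deque` standing in for a real message broker. The names `handle_request`, `generate_report`, and `run_worker` are hypothetical. A real deployment would use a durable queue and separate worker processes; this version runs both sides in one process so it stays deterministic.

```python
from collections import deque

job_queue = deque()   # stand-in for a durable message broker
completed = []

def handle_request(job_id):
    """Web handler: enqueue the slow task and respond immediately."""
    job_queue.append(job_id)
    return {"status": "accepted", "job_id": job_id}

def generate_report(job_id):
    """Placeholder for slow work such as report generation."""
    return f"report-{job_id}"

def run_worker(max_jobs):
    """One worker pass: process up to max_jobs queued tasks, oldest first."""
    done = 0
    while job_queue and done < max_jobs:
        completed.append(generate_report(job_queue.popleft()))
        done += 1
    return done

responses = [handle_request(i) for i in range(5)]
print(responses[0])               # {'status': 'accepted', 'job_id': 0}
print(len(job_queue))             # 5: requests returned without waiting for the work
run_worker(max_jobs=3)
print(completed, len(job_queue))  # ['report-0', 'report-1', 'report-2'] 2
```

The queue depth (`len(job_queue)` here) doubles as a scaling signal: if it keeps growing, add workers.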

Throughput vs Latency

These two ideas are related but not identical:

  • Throughput is how much work the system completes in a period of time.

  • Latency is how long one request takes.

A system can have high throughput and still feel slow if each user waits too long.
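The link between the two can be made concrete with Little's Law: in a stable system, the average number of requests in flight equals throughput multiplied by average latency (L = λW). The numbers below are made up for illustration, and `in_flight` is a hypothetical helper.

```python
def in_flight(throughput_rps, latency_s):
    """Little's Law: average concurrency L = lambda (throughput) * W (latency)."""
    return throughput_rps * latency_s

fast = in_flight(throughput_rps=200, latency_s=0.25)
slow = in_flight(throughput_rps=200, latency_s=2.0)
print(fast, slow)  # 50.0 400.0
```

Both systems complete 200 requests per second, but in the second each user waits 2 seconds, and the system must hold eight times as many concurrent requests (connections, threads, memory) to sustain the same throughput.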

Interactive Capacity Planner
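The core arithmetic behind a capacity planner can be sketched as follows. `plan_capacity` is a hypothetical helper; per-instance throughput must come from your own load tests, and the 75% utilization target and single spare instance (N+1) are assumptions chosen for illustration.

```python
import math

def plan_capacity(peak_rps, rps_per_instance, target_utilization=0.75, spare_instances=1):
    """Estimate how many instances to run at peak.

    target_utilization leaves headroom so instances are not pushed to 100% load;
    spare_instances adds redundancy (N+1) so losing one instance is survivable.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_rps = rps_per_instance * target_utilization
    return math.ceil(peak_rps / usable_rps) + spare_instances

print(plan_capacity(peak_rps=1200, rps_per_instance=100))  # 17
print(plan_capacity(peak_rps=50, rps_per_instance=100))    # 2
```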

Signs Your System Does Not Scale Yet

| Symptom | Likely issue |
|---|---|
| CPU jumps to 100% during promotions | too few app instances or inefficient code |
| Database locks and slow queries | missing indexes or poor schema design |
| Background jobs pile up | not enough workers or no queue visibility |
| Same data queried repeatedly | cache is missing or misused |
| Costs rise faster than traffic | poor resource sizing or always-on overprovisioning |
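The table can be turned into a rough checklist over monitoring metrics. In this sketch, the function `diagnose`, the metric names (`cpu_utilization`, `p95_query_ms`, `queue_depth_growth`, `cache_hit_ratio`, `traffic_growth`, `cost_growth`), and every threshold are hypothetical; real values depend on your workload and monitoring stack.

```python
def diagnose(metrics):
    """Map a snapshot of metrics to likely issues from the symptom table.

    All metric names and thresholds are illustrative assumptions.
    """
    findings = []
    if metrics.get("cpu_utilization", 0.0) >= 0.9:
        findings.append("too few app instances or inefficient code")
    if metrics.get("p95_query_ms", 0) > 500:
        findings.append("missing indexes or poor schema design")
    if metrics.get("queue_depth_growth", 0) > 0:  # backlog still growing
        findings.append("not enough workers or no queue visibility")
    if metrics.get("cache_hit_ratio", 1.0) < 0.5:
        findings.append("cache is missing or misused")
    # Growth factors versus the previous period, e.g. 1.2 means +20%.
    if metrics.get("cost_growth", 1.0) > metrics.get("traffic_growth", 1.0):
        findings.append("poor resource sizing or always-on overprovisioning")
    return findings

print(diagnose({"cpu_utilization": 0.97, "cache_hit_ratio": 0.3,
                "traffic_growth": 1.2, "cost_growth": 1.8}))
```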

Mini Quiz

1. When is horizontal scaling usually easier than vertical scaling?

Horizontal scaling is usually easier for stateless application services, because you can add more identical instances behind a load balancer.
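A minimal sketch of why statelessness makes this easy, assuming a simple round-robin policy (real load balancers also offer least-connections, weighted, and other policies). `RoundRobinBalancer` is a hypothetical toy class, not a real library API.

```python
class RoundRobinBalancer:
    """Toy load balancer that spreads requests evenly over identical instances."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("need at least one instance")
        self.instances = list(instances)
        self._counter = 0

    def add_instance(self, name):
        # Scaling out is just registering another identical, stateless copy.
        self.instances.append(name)

    def route(self):
        target = self.instances[self._counter % len(self.instances)]
        self._counter += 1
        return target

lb = RoundRobinBalancer(["app-1", "app-2"])
print([lb.route() for _ in range(4)])  # ['app-1', 'app-2', 'app-1', 'app-2']
lb.add_instance("app-3")
print([lb.route() for _ in range(3)])  # ['app-2', 'app-3', 'app-1']
```

Because no instance keeps per-user state, any instance can serve any request, so adding `app-3` needs no data migration. If instances stored sessions locally, you would need sticky sessions or an external session store.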

2. Why does caching help scalability?

Caching reduces repeated expensive work. That lowers load on databases and services, which improves both latency and capacity.
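A minimal in-process illustration using the standard library's `functools.lru_cache`. Here `expensive_lookup` is a hypothetical stand-in for a slow database query or model call, and `backend_calls` counts how often the backend is actually hit.

```python
from functools import lru_cache

backend_calls = 0  # how many times the "expensive" backend is hit

def expensive_lookup(product_id):
    """Stand-in for a slow database query or model inference call."""
    global backend_calls
    backend_calls += 1
    return tuple(product_id * 10 + i for i in range(3))  # immutable, safe to share

@lru_cache(maxsize=1024)
def cached_recommendations(product_id):
    return expensive_lookup(product_id)

for pid in [1, 2, 1, 1, 2, 3]:
    cached_recommendations(pid)

print(backend_calls)                        # 3
print(cached_recommendations.cache_info())  # CacheInfo(hits=3, misses=3, maxsize=1024, currsize=3)
```

Six requests cost only three backend calls. This cache lives inside one process; with many horizontally scaled instances, a shared cache service is usually needed, along with a plan for invalidating stale entries.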

Practice Prompt

Suppose a financial dashboard serves 2,000 users at 9 a.m. and only 100 users at noon. Which parts of the system would you scale elastically, and which parts would you keep stable for consistency and security?

Takeaway

Scalability is not one feature or one cloud service. It is a collection of design decisions that make growth predictable instead of painful.