Scalability Basics — Quick autoscaling guide - Programming for Machine Learning and Business

Business framing: plan for elasticity by scaling up when demand rises and scaling back down to control costs. This guide shows how to reason about autoscaling choices and instance counts.

Quick autoscaling recommender
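The exercises below extend a helper called `autoscale_recommend`. Here is a minimal sketch of such a helper. Its parameters, its thresholds (a baseline under 5 requests per second, bursts of up to 30 minutes, a 3x spike ratio), and the labels it returns are all hypothetical teaching assumptions. They are not guidance from any cloud provider.

```python
# Hypothetical sketch: the signature, thresholds, and labels are illustrative
# assumptions chosen for teaching, not values from an official tool.

def autoscale_recommend(baseline_rps: float, peak_rps: float, burst_minutes: float) -> str:
    """Suggest a scaling approach from a rough traffic profile.

    baseline_rps  -- typical requests per second
    peak_rps      -- requests per second during the busiest period
    burst_minutes -- how long the peak usually lasts
    """
    if baseline_rps <= 0 or peak_rps < baseline_rps:
        raise ValueError("need 0 < baseline_rps <= peak_rps")

    spike_ratio = peak_rps / baseline_rps

    # Short bursts over a near-idle baseline: pay per use and buffer work in a queue.
    if baseline_rps < 5 and burst_minutes <= 30:
        return "serverless functions with a queue"
    # Large swings in load: add and remove stateless instances behind a load balancer.
    if spike_ratio >= 3:
        return "horizontal autoscaling (stateless instances behind a load balancer)"
    # Fairly flat traffic: a right-sized fixed fleet is simpler and predictable.
    return "fixed fleet, sized for peak with headroom"


print(autoscale_recommend(baseline_rps=1, peak_rps=200, burst_minutes=10))
# serverless functions with a queue
print(autoscale_recommend(baseline_rps=50, peak_rps=400, burst_minutes=120))
# horizontal autoscaling (stateless instances behind a load balancer)
print(autoscale_recommend(baseline_rps=100, peak_rps=150, burst_minutes=60))
# fixed fleet, sized for peak with headroom
```

The first call follows the same reasoning as the MCQ answer. When there is almost no baseline traffic, paying per invocation and buffering the burst in a queue means you don't pay to keep idle machines running.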

MCQ

Q: Which autoscaling option is best for sudden, short-lived bursts with low baseline traffic?
- A) Large VM fleet
- B) Serverless functions with a queue
- C) Stateful single-instance database
- (Answer: B)

Exercises

- Extend autoscale_recommend to suggest a cool_down period, and explain why a cooldown matters.
- Implement a small estimate_cost() helper that returns a rough relative cost (low/medium/high) for serverless vs VM choices.
- (Stretch) Add a warm_pool flag and prefer warm instances when startup latency is critical.

Scalability Basics

A system is scalable when growth in users, requests, or data does not immediately turn into outages, unacceptable latency, or runaway costs.

Business Framing

Imagine you built a recommendation tool for an e-commerce team. It works perfectly with 500 daily users. Then a marketing campaign drives 50,000 users in one afternoon.

If the system slows down or crashes, the model is not the problem anymore. The architecture is.

What Scalability Really Means

Scalability is the ability to grow capacity in a controlled way. That can involve:

- serving more requests per second
- storing more data
- processing more background jobs
- keeping response times stable as usage grows
- increasing throughput without rewriting the whole product

Common Scaling Strategies

Strategy	Description	Good for
Vertical scaling	Make one machine stronger	quick fixes, small systems
Horizontal scaling	Add more machines or instances	web apps, APIs, stateless services
Caching	Reuse expensive results	dashboards, repeated reads, model outputs
Queueing	Process work asynchronously	emails, report generation, batch inference
Database optimization	Indexing, partitioning, replication	heavy read/write workloads
CDN and edge delivery	Move content closer to users	static assets, global access
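The Caching row can be illustrated with Python's built-in `functools.lru_cache`, which memoizes a function's results in process memory. `recommend_for` is a hypothetical stand-in for an expensive model call. The `calls` counter is there only to show how often the slow path actually runs.

```python
from functools import lru_cache
import time

calls = {"model": 0}  # how many times the slow path actually ran


@lru_cache(maxsize=1024)
def recommend_for(customer_id: int) -> tuple:
    """Hypothetical stand-in for an expensive model call."""
    calls["model"] += 1
    time.sleep(0.01)  # simulate inference latency
    # Deterministic fake "recommendations" (product IDs).
    return tuple((customer_id * k) % 100 for k in (3, 7, 11))


# 1,000 requests for the same customer trigger only one slow call.
for _ in range(1000):
    recommend_for(42)

print(calls["model"])              # 1
print(recommend_for.cache_info())  # CacheInfo(hits=999, misses=1, maxsize=1024, currsize=1)
```

The function returns a tuple so that cached results stay immutable, which prevents one caller from changing what the next caller receives. An in-process cache like this one lives inside a single instance. Once the app is scaled horizontally, the usual next step is a shared cache service, together with a plan for invalidating stale entries.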

Traffic Spike Anatomy

A traffic spike puts stress on several layers of a system at once. This illustrates a key principle: not every scaling problem should be solved in the same place.

- If request routing is overloaded, add a load balancer.
- If repeated reads are slow, add caching.
- If slow tasks block the app, move them to a queue.
- If the database is the bottleneck, tune the schema or add replicas.
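The queueing fix can be sketched with Python's standard `queue` and `threading` modules. Here they serve as an in-process stand-in for a real message broker. `generate_report` and `handle_request` are hypothetical names, and the 50 ms sleep simulates a slow task.

```python
import queue
import threading
import time

jobs = queue.Queue()          # pending report requests
results = []                  # finished reports
results_lock = threading.Lock()


def generate_report(report_id: int) -> str:
    """Hypothetical slow task, e.g. building a PDF for a user."""
    time.sleep(0.05)
    return f"report-{report_id}"


def worker() -> None:
    """Pull jobs forever; each worker processes one job at a time."""
    while True:
        report_id = jobs.get()
        try:
            report = generate_report(report_id)
            with results_lock:
                results.append(report)
        finally:
            jobs.task_done()  # mark the job done even if it failed


def handle_request(report_id: int) -> str:
    """Hypothetical web handler: enqueue the work and respond immediately."""
    jobs.put(report_id)
    return f"accepted {report_id}"


# Start a small pool of background workers (daemon threads exit with the program).
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

for i in range(20):
    handle_request(i)

jobs.join()  # block until every queued job has been marked done
print(len(results))  # 20
```

The request handler returns as soon as the job is enqueued, so users are not blocked by the slow work. Throughput is then tuned by changing the number of workers. In production, an external queue lets jobs survive restarts and lets workers scale separately from the web tier.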

Throughput vs Latency

These two ideas are related but not identical:

- Throughput is how much work the system completes in a period of time.
- Latency is how long one request takes.

A system can have high throughput and still feel slow if each user waits too long.
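Little's Law ties the two together. In a steady state, the average number of requests in flight equals throughput multiplied by average latency (L = λW). Rearranging it shows how high throughput can coexist with long waits. The function name and the numbers below are hypothetical.

```python
def throughput_rps(in_flight: float, latency_s: float) -> float:
    """Little's Law rearranged: throughput = requests in flight / average latency."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return in_flight / latency_s


# System A: 10 requests in flight, each completing in 0.2 s.
print(throughput_rps(10, 0.2))    # 50.0 requests/second, users wait 0.2 s
# System B: 1,000 requests in flight, each taking 2 s.
print(throughput_rps(1000, 2.0))  # 500.0 requests/second, users wait 2 s
```

System B completes ten times as much work per second as System A, yet every user waits ten times longer.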

Interactive Capacity Planner
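Here is a minimal capacity-planning sketch. It assumes you already know roughly how many requests per second one instance can handle, ideally from a load test. The name `instances_needed` is an illustrative assumption, and so are its two defaults. The 70% target utilization leaves headroom for sudden spikes, and the floor of two instances means one failure does not take the service down. Neither is a fixed rule.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float,
                     target_utilization: float = 0.7, min_instances: int = 2) -> int:
    """Return how many instances keep each one at or below target_utilization at peak."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Capacity we are willing to use per instance, leaving the rest as headroom.
    usable_rps = rps_per_instance * target_utilization
    # Round up: a fractional instance still needs a whole machine.
    return max(min_instances, math.ceil(peak_rps / usable_rps))


# Hypothetical numbers: each instance handles about 50 requests/second.
print(instances_needed(peak_rps=20, rps_per_instance=50))   # 2  (normal day; the floor applies)
print(instances_needed(peak_rps=400, rps_per_instance=50))  # 12 (campaign afternoon)
```

Lowering `target_utilization` buys more headroom at a higher cost. Raising it saves money but leaves less slack for the next spike.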

Signs Your System Does Not Scale Yet

Symptom	Likely issue
CPU jumps to 100% during promotions	too few app instances or inefficient code
Database locks and slow queries	missing indexes or poor schema design
Background jobs pile up	not enough workers or no queue visibility
Same data queried repeatedly	cache is missing or misused
Costs rise faster than traffic	poor resource sizing or always-on overprovisioning

Practice Prompt

Suppose a financial dashboard serves 2,000 users at 9 a.m. and only 100 users at noon. Which parts of the system would you scale elastically, and which parts would you keep stable for consistency and security?

Takeaway

Scalability is not one feature or one cloud service. It is a collection of design decisions that make growth predictable instead of painful.