
Business framing: Choose cloud services that match your team’s operational capacity and expected load. This short helper focuses on cost vs performance trade-offs and is safe to run in the browser.


Quick concept

  • Serverless (functions, managed endpoints): low ops, pay-per-use, great for spiky or unpredictable traffic.

  • VMs/Containers: more control, better for long-running or highly tuned workloads, potentially lower steady-state cost at scale.
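
To make the cost side of this trade-off concrete, here is a rough break-even sketch. All prices are hypothetical placeholders rather than any provider's real rates, and the function names are illustrative only.

```python
# Hypothetical prices for illustration only; check your provider's pricing page.

def serverless_monthly_cost(requests_per_month, price_per_million=0.5):
    """Pay-per-use: cost grows linearly with traffic."""
    return requests_per_month / 1_000_000 * price_per_million


def vm_monthly_cost(instances=1, price_per_instance=30.0):
    """Flat cost: paid whether or not any traffic arrives."""
    return instances * price_per_instance


def break_even_requests(price_per_million=0.5, instances=1, price_per_instance=30.0):
    """Monthly request count at which both options cost the same."""
    flat = vm_monthly_cost(instances, price_per_instance)
    return flat / price_per_million * 1_000_000


for monthly_requests in (10_000_000, 60_000_000, 200_000_000):
    s = serverless_monthly_cost(monthly_requests)
    v = vm_monthly_cost()
    cheaper = "serverless" if s < v else "VM" if v < s else "tie"
    print(f"{monthly_requests:>12,} req/month: serverless={s:.2f} vm={v:.2f} -> {cheaper}")
```

Below the break-even point, pay-per-use is cheaper; above it, the flat VM price is cheaper, provided one instance can actually absorb the load. The sketch ignores memory, execution time, and data transfer, which real bills include.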


Small Pyodide-safe helper: choose serverless vs VM
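
A minimal sketch of such a helper, in pure Python with no imports so it should run under Pyodide. The function name `choose_compute`, its parameters, and the 100 requests-per-second threshold are assumptions made for illustration, not measured guidance.

```python
def choose_compute(peak_rps, traffic_is_spiky, ops_engineers, long_running=False):
    """Return (choice, reason), where choice is 'serverless' or 'vm_or_containers'.

    All thresholds are illustrative assumptions, not benchmarks.
    """
    if long_running:
        # Long-running workloads favor VMs or containers.
        return "vm_or_containers", "long-running workload"
    if ops_engineers == 0:
        # Nobody is available to patch, scale, and monitor servers.
        return "serverless", "no operational capacity"
    if traffic_is_spiky:
        return "serverless", "pay-per-use suits spiky or unpredictable traffic"
    if peak_rps >= 100:  # hypothetical cut-off for "steady high load"
        return "vm_or_containers", "steady high load may be cheaper on fixed capacity"
    return "serverless", "low steady load keeps pay-per-use cheap"


choice, reason = choose_compute(peak_rps=20, traffic_is_spiky=True, ops_engineers=1)
print(choice, "-", reason)
```

The checks run from hardest constraint to softest: workload shape first, then team capacity, then traffic pattern and volume.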


Exercises

  1. Adjust the helper to return expected cost factors (rough monthly estimate) for each choice.

  2. Add a reliability_requirement parameter and prefer VMs spread across multiple availability zones (AZs) when reliability is critical.

  3. Sketch a short checklist describing when to move from serverless to containerized deployments.


Note: A broader recommender and a table of service categories appear later in this notebook; the cells above give a quick, practical decision helper you can run in the browser.

Cloud Architecture and Services for ML

Cloud platforms give you on-demand compute, storage, networking, and managed AI services. The challenge is not just learning service names, but choosing the right building blocks for the workload you actually need to run.

Why Cloud Matters for AI Products

Machine learning systems often need resources that are hard to manage on a single laptop or static server:

  • scalable APIs for model inference

  • object storage for datasets and model artifacts

  • scheduled jobs for training and evaluation

  • managed databases for application state

  • monitoring, security, and access control

Cloud architecture matters because it turns experiments into reliable products.

The Big Three Providers

| Provider | Strengths | Typical ML and business use |
| --- | --- | --- |
| AWS | broad service catalog and mature ecosystem | data lakes, model deployment, enterprise workloads |
| GCP | strong data and AI tooling | analytics, Vertex AI, modern data pipelines |
| Azure | deep enterprise integration | Microsoft-centric stacks, governance-heavy environments |

A Typical ML Product on the Cloud

Service Categories You Should Recognize

| Category | What it does | Example services |
| --- | --- | --- |
| Compute | runs code and services | EC2, Compute Engine, Azure VMs, containers, serverless |
| Storage | keeps files and artifacts | S3, GCS, Azure Blob Storage |
| Databases | stores operational data | RDS, Cloud SQL, Cosmos DB, DynamoDB |
| Networking | routes and protects traffic | load balancers, VPCs, gateways, CDNs |
| ML platform | manages training and deployment | SageMaker, Vertex AI, Azure ML |
| Observability | tracks health and usage | CloudWatch, Cloud Monitoring, Azure Monitor |

Managed Services vs Self-Managed Infrastructure

Use managed services when you want faster delivery and lower operational burden. Use self-managed infrastructure when you need unusually deep control, specialized tuning, or a strong reason to own the operational complexity.

Interactive Architecture Recommender
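
A minimal, non-interactive sketch of a recommender along these lines. It maps a few requirement flags onto the service categories listed earlier and defaults to managed services unless deep control is requested. The function name, the flag names, and the mapping itself are assumptions for illustration.

```python
def recommend_architecture(needs_inference_api=True, stores_large_files=True,
                           needs_app_state=False, trains_models=False,
                           needs_deep_control=False):
    """Map requirement flags (hypothetical names) to service categories."""
    plan = {}
    if needs_inference_api:
        plan["Compute"] = ("self-managed VMs or containers" if needs_deep_control
                           else "serverless or managed containers")
        plan["Networking"] = "load balancer or gateway in front of the API"
    if stores_large_files:
        plan["Storage"] = "object storage (e.g. S3, GCS, Azure Blob Storage)"
    if needs_app_state:
        plan["Databases"] = "managed database (e.g. RDS, Cloud SQL, Cosmos DB)"
    if trains_models:
        plan["ML platform"] = "managed ML platform (e.g. SageMaker, Vertex AI, Azure ML)"
    if plan:
        # Anything that is deployed should be observable from day one.
        plan["Observability"] = "metrics, logs, and alerts"
    return plan


for category, choice in recommend_architecture(trains_models=True).items():
    print(f"{category:<14} {choice}")
```

Returning a plain dict keeps the sketch easy to wire into a notebook widget or form later, one input control per flag.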

Common Pitfalls

  • choosing too many services before the team understands the request flow

  • storing operational data and analytical data with no clear separation

  • forgetting IAM and secret management until late in the project

  • ignoring observability until the first incident happens

  • leaving expensive resources running with no cost controls
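
The last pitfall can be caught with a periodic check. A minimal sketch, assuming you can export a resource inventory as a list of dicts: the field names (`name`, `hourly_cost`, `avg_cpu_percent`) and the thresholds are hypothetical and do not correspond to any provider's API.

```python
def flag_idle_resources(inventory, cpu_threshold=5.0, min_monthly_cost=20.0,
                        hours_per_month=730):
    """Return (name, monthly_cost) for resources that look idle but costly.

    A resource is flagged when its average CPU is below cpu_threshold and
    its projected monthly cost is at least min_monthly_cost.
    """
    flagged = []
    for resource in inventory:
        monthly = resource["hourly_cost"] * hours_per_month
        if resource["avg_cpu_percent"] < cpu_threshold and monthly >= min_monthly_cost:
            flagged.append((resource["name"], round(monthly, 2)))
    # Most expensive first, so the biggest savings are reviewed first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up inventory for illustration.
inventory = [
    {"name": "gpu-train-01", "hourly_cost": 3.0, "avg_cpu_percent": 1.2},
    {"name": "api-prod", "hourly_cost": 0.1, "avg_cpu_percent": 45.0},
    {"name": "dev-sandbox", "hourly_cost": 0.01, "avg_cpu_percent": 0.5},
]
print(flag_idle_resources(inventory))
```

CPU alone is a crude idleness signal; in practice you would likely combine it with network or request metrics, and with provider budget alerts.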

Mini Quiz

1. Why is object storage so common in ML systems?

Because datasets, model files, logs, and exports are often large and are a poor fit for traditional relational databases.

2. Why do teams prefer managed databases and managed ML services early on?

Because they reduce operational burden, speed up delivery, and let the team focus more on product value than infrastructure maintenance.

Practice Prompt

Design a cloud architecture for a churn-prediction product that retrains weekly, serves predictions to a CRM dashboard, stores CSV exports, and sends alerts when model quality drops.

Takeaway

Cloud architecture is about choosing the simplest set of services that can reliably support the product, the data flow, and the growth you expect.