Lab – Comparing GD Variants#
Welcome to the Optimization Olympics 🏅! Today, we’ll watch different Gradient Descent variants race to find the global minimum — and see which one deserves the gold medal 🥇.
🏁 Objective#
Compare how Batch GD, Stochastic GD, and Mini-Batch GD (and their cooler cousins like Adam) behave in real training scenarios.
You’ll:
Visualize their learning paths 🎢
Compare convergence speeds ⏱️
Observe how hyperparameters change the story 📊
📦 Setup#
Let’s load some necessary libraries (no doping allowed 🚫💉):
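Here is a minimal setup sketch; the exact packages are an assumption, but NumPy and Matplotlib cover everything this lab plots:

```python
# Assumed setup: NumPy for the math, Matplotlib for the race plots
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)  # make the stochastic "athletes" reproducible
```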
🎯 The Loss Landscape#
We’ll simulate a simple quadratic loss function:

$$L(w) = (w - 3)^2 + 2$$

Its minimum is at $w = 3$ (the “finish line”).
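As a quick sketch, the loss and its gradient can be written as plain Python helpers (the names `loss` and `grad` are our own and are reused in the snippets below):

```python
def loss(w):
    """Quadratic loss L(w) = (w - 3)^2 + 2, minimized at w = 3."""
    return (w - 3) ** 2 + 2

def grad(w):
    """Analytic gradient dL/dw = 2 * (w - 3)."""
    return 2 * (w - 3)
```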
🏃♀️ Gradient Descent Variants in Action#
1️⃣ Batch Gradient Descent#
Computes the gradient over the entire dataset before every single update.
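On the toy loss above, a minimal batch GD loop might look like this sketch (the starting point `w0`, the learning rate `lr`, and `n_steps` are illustrative assumptions):

```python
def batch_gd(w0=-5.0, lr=0.1, n_steps=50):
    """Full-batch GD: one exact gradient step per iteration."""
    w, path = w0, [w0]
    for _ in range(n_steps):
        w -= lr * grad(w)  # exact gradient of the whole loss
        path.append(w)
    return path

batch_path = batch_gd()
print(f"Batch GD finishes at w = {batch_path[-1]:.4f}")  # approaches 3
```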
2️⃣ Stochastic Gradient Descent#
Updates weights after each sample — noisy but fast ⚡.
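Our toy loss has no dataset, so one common trick is to emulate per-sample noise by jittering the gradient; the noise model below is purely an assumption for illustration:

```python
def sgd(w0=-5.0, lr=0.1, n_steps=50, noise=2.0):
    """'Stochastic' GD: each step uses a noisy gradient estimate."""
    w, path = w0, [w0]
    for _ in range(n_steps):
        noisy_grad = grad(w) + noise * np.random.randn()  # stand-in for per-sample noise
        w -= lr * noisy_grad
        path.append(w)
    return path

sgd_path = sgd()
```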
3️⃣ Momentum (GD with Momentum)#
Adds a bit of physics — keeps rolling through small bumps 🏎️.
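A heavy-ball style sketch of momentum on the same loss (the decay factor `beta` is an assumed value worth tuning):

```python
def momentum_gd(w0=-5.0, lr=0.1, beta=0.9, n_steps=50):
    """GD with momentum: a velocity term accumulates past gradients."""
    w, v, path = w0, 0.0, [w0]
    for _ in range(n_steps):
        v = beta * v + grad(w)  # velocity keeps the ball rolling through bumps
        w -= lr * v
        path.append(w)
    return path

momentum_path = momentum_gd()
```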
📉 Visualizing the Race#
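One way to draw the race, assuming the `batch_path`, `sgd_path`, and `momentum_path` trajectories computed above:

```python
# Plot each optimizer's path toward the finish line at w = 3
for name, path in [("Batch GD", batch_path),
                   ("SGD", sgd_path),
                   ("Momentum", momentum_path)]:
    plt.plot(path, label=name)
plt.axhline(3, color="gray", linestyle="--", label="minimum (w = 3)")
plt.xlabel("Iteration")
plt.ylabel("w")
plt.title("The Optimization Olympics")
plt.legend()
plt.show()
```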
🧠 Bonus Round: Enter Adam!#
Add Adam’s trajectory to the race plot:
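A compact Adam sketch on the same toy loss; the moment decay rates follow the commonly cited defaults, while the learning rate is our own choice:

```python
def adam(w0=-5.0, lr=0.3, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=50):
    """Adam: adapts the step size using running moment estimates."""
    w, m, v, path = w0, 0.0, 0.0, [w0]
    for t in range(1, n_steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g ** 2   # second moment (uncentered variance)
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (v_hat ** 0.5 + eps)
        path.append(w)
    return path

adam_path = adam()
# Add plt.plot(adam_path, label="Adam") to the race plot above and re-run it.
```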
You’ll notice Adam zooms to the minimum while others are still stretching. 🏃♂️💨
📊 Your Turn#
🧩 Try this:#
Change the learning rate (`lr = 0.01`, `0.5`, etc.).
Add momentum to SGD.
Visualize how stability and convergence differ.
See how your model behaves when it’s “too excited” vs “too sleepy” 😴.
🎯 Key Takeaways#
| Optimizer | Strength | Weakness |
|---|---|---|
| Batch GD | Smooth & stable | Slow for large data |
| SGD | Fast & scalable | Noisy updates |
| Momentum | Smooths oscillations | Needs tuning |
| Adam | Adaptive & fast | Sometimes overfits |
🧩 Business Analogy#
| Optimizer | Business Personality |
|---|---|
| SGD | The hustler – moves fast, breaks things. |
| Batch GD | The analyst – waits for all data, then acts. |
| Momentum | The marathon runner – steady and strong. |
| Adam | The consultant – adapts to everything and charges more. 💼 |
💬 “Optimization is like coffee brewing: get the temperature (learning rate) right, stir well (momentum), and don’t overdo it (overfitting).” ☕
🧰 Continue Exploring#
Run this notebook on Colab or JupyterLite using the buttons above.
Modify the loss function to something non-convex and see how optimizers behave in the wild. 🏞️
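For example, one non-convex variant (a quadratic bowl with sinusoidal bumps, purely our own choice) that you can swap in for `loss`/`grad`:

```python
def bumpy_loss(w):
    """Non-convex toy loss: the same bowl plus sinusoidal bumps."""
    return (w - 3) ** 2 + 2 + 2 * np.sin(3 * w)

def bumpy_grad(w):
    """Gradient of the bumpy loss."""
    return 2 * (w - 3) + 6 * np.cos(3 * w)
```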
🔗 Next Chapter: Supervised Classification – Trees & Friends 🌳 Because predicting “who buys what” is the true business magic. ✨