Lab – Comparing GD Variants#


Welcome to the Optimization Olympics 🏅! Today, we’ll watch different Gradient Descent variants race to find the global minimum — and see which one deserves the gold medal 🥇.


🏁 Objective#

Compare how Batch GD, Stochastic GD, and GD with Momentum (plus their cooler adaptive cousin, Adam) behave in real training scenarios.

You’ll:

  • Visualize their learning paths 🎢

  • Compare convergence speeds ⏱️

  • Observe how hyperparameters change the story 📊


📦 Setup#

Let’s load some necessary libraries (no doping allowed 🚫💉):

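A minimal setup sketch, assuming the lab only needs NumPy and Matplotlib (both ship with Pyodide):

```python
# Core numerical and plotting tools: all this lab needs
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)  # reproducible "races"
```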


🎯 The Loss Landscape#

We’ll simulate a simple quadratic loss function:

$$
L(w) = (w - 3)^2 + 2
$$

Its minimum is at $w = 3$ (the “finish line”).
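Here is one way to put this landscape in code; the helper names `loss` and `grad` are illustrative choices, not necessarily the original notebook’s:

```python
def loss(w):
    """Quadratic loss with its minimum at w = 3."""
    return (w - 3) ** 2 + 2

def grad(w):
    """Analytic gradient: dL/dw = 2 * (w - 3)."""
    return 2 * (w - 3)

# Draw the "track" the optimizers will race on
w_grid = np.linspace(-1, 7, 200)
plt.plot(w_grid, loss(w_grid))
plt.axvline(3, linestyle="--", label="finish line (w = 3)")
plt.xlabel("w")
plt.ylabel("L(w)")
plt.legend()
plt.show()
```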


🏃‍♀️ Gradient Descent Variants in Action#

1️⃣ Batch Gradient Descent#

Calculates the gradient on the entire dataset every time.
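A minimal sketch of the update loop, reusing the `grad` helper from above (our toy loss has no data term, so the “full dataset” gradient is simply the analytic gradient):

```python
def batch_gd(w0=-1.0, lr=0.1, steps=30):
    """Plain gradient descent: w <- w - lr * dL/dw at every step."""
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - lr * grad(w)
        path.append(w)
    return np.array(path)

batch_path = batch_gd()
print(f"Batch GD finishes at w = {batch_path[-1]:.3f}")
```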

2️⃣ Stochastic Gradient Descent#

Updates weights after each sample — noisy but fast ⚡.
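On a real dataset each update would use a single sample’s gradient; on our toy loss we can only mimic that by adding noise to the analytic gradient. This is a stand-in for the stochastic behaviour, not the notebook’s actual code:

```python
def sgd(w0=-1.0, lr=0.1, steps=30, noise=1.0):
    """Stochastic-style descent: follow a noisy estimate of the gradient."""
    w, path = w0, [w0]
    for _ in range(steps):
        noisy_grad = grad(w) + noise * np.random.randn()  # stand-in for per-sample noise
        w = w - lr * noisy_grad
        path.append(w)
    return np.array(path)

sgd_path = sgd()
print(f"SGD finishes at w = {sgd_path[-1]:.3f}")
```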

3️⃣ Momentum (GD with Momentum)#

Adds a bit of physics — keeps rolling through small bumps 🏎️.
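A sketch of the heavy-ball update, where `beta` controls how much of the previous velocity carries over (defaults here are illustrative):

```python
def momentum_gd(w0=-1.0, lr=0.1, beta=0.9, steps=30):
    """Heavy-ball update: velocity remembers past gradients and keeps rolling."""
    w, v, path = w0, 0.0, [w0]
    for _ in range(steps):
        v = beta * v - lr * grad(w)  # accumulate velocity
        w = w + v
        path.append(w)
    return np.array(path)

momentum_path = momentum_gd()
print(f"Momentum finishes at w = {momentum_path[-1]:.3f}")
```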


📉 Visualizing the Race#
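The original plotting cell isn’t shown here; one possible way to overlay the three trajectories on the landscape, assuming the `*_path` arrays from the cells above:

```python
plt.figure(figsize=(8, 5))
plt.plot(w_grid, loss(w_grid), color="lightgray", label="loss landscape")

for name, path in [("Batch GD", batch_path),
                   ("SGD", sgd_path),
                   ("Momentum", momentum_path)]:
    plt.plot(path, loss(path), marker="o", markersize=3, label=name)

plt.axvline(3, linestyle="--", color="k", alpha=0.3)
plt.xlabel("w")
plt.ylabel("L(w)")
plt.title("The Optimization Olympics 🏅")
plt.legend()
plt.show()
```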


🧠 Bonus Round: Enter Adam!#

Add Adam’s trajectory to the plot:
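A hand-rolled Adam sketch; the hyperparameter defaults are the usual textbook ones, and the slightly larger learning rate is my choice, since Adam’s per-step travel is roughly capped at `lr`:

```python
def adam(w0=-1.0, lr=0.2, beta1=0.9, beta2=0.999, eps=1e-8, steps=30):
    """Adam: momentum on the gradient plus an adaptive per-step scale."""
    w, m, v, path = w0, 0.0, 0.0, [w0]
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # first moment: running mean of gradients
        v = beta2 * v + (1 - beta2) * g ** 2   # second moment: running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        path.append(w)
    return np.array(path)

adam_path = adam()

# Re-run the race with Adam in the lineup
plt.plot(w_grid, loss(w_grid), color="lightgray", label="loss landscape")
for name, path in [("Batch GD", batch_path), ("SGD", sgd_path),
                   ("Momentum", momentum_path), ("Adam", adam_path)]:
    plt.plot(path, loss(path), marker="o", markersize=3, label=name)
plt.xlabel("w")
plt.ylabel("L(w)")
plt.legend()
plt.show()
```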

You’ll notice Adam zooms to the minimum while others are still stretching. 🏃‍♂️💨


📊 Your Turn#

🧩 Try this:#

  1. Change the learning rate (lr = 0.01, 0.5, etc.)

  2. Add momentum to SGD.

  3. Visualize how stability and convergence differ.

See how your model behaves when it’s “too excited” vs “too sleepy” 😴.
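A possible starting point for these experiments, reusing the helpers defined earlier (tweak freely):

```python
# 1. Learning-rate sweep: "too sleepy" vs "too excited"
#    (on this loss the curvature is 2, so batch GD diverges once lr >= 1.0)
plt.plot(w_grid, loss(w_grid), color="lightgray")
for lr in [0.01, 0.1, 0.5]:
    path = batch_gd(lr=lr)
    plt.plot(path, loss(path), marker="o", markersize=3, label=f"Batch GD, lr={lr}")
plt.xlabel("w")
plt.ylabel("L(w)")
plt.legend()
plt.show()

# 2. SGD with momentum: keep a velocity, but feed it the noisy gradient
def sgd_momentum(w0=-1.0, lr=0.1, beta=0.9, steps=30, noise=1.0):
    w, v, path = w0, 0.0, [w0]
    for _ in range(steps):
        g = grad(w) + noise * np.random.randn()
        v = beta * v - lr * g
        w = w + v
        path.append(w)
    return np.array(path)

sgdm_path = sgd_momentum()
print(f"SGD + momentum finishes at w = {sgdm_path[-1]:.3f}")
```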


🎯 Key Takeaways#

| Optimizer | Strength | Weakness |
|-----------|----------|----------|
| Batch GD | Smooth & stable | Slow for large data |
| SGD | Fast & scalable | Noisy updates |
| Momentum | Smooths oscillations | Needs tuning |
| Adam | Adaptive & fast | Sometimes overfits |


🧩 Business Analogy#

| Optimizer | Business Personality |
|-----------|----------------------|
| SGD | The hustler – moves fast, breaks things. |
| Batch GD | The analyst – waits for all data, then acts. |
| Momentum | The marathon runner – steady and strong. |
| Adam | The consultant – adapts to everything and charges more. 💼 |


💬 “Optimization is like coffee brewing: get the temperature (learning rate) right, stir well (momentum), and don’t overdo it (overfitting).”


🧰 Continue Exploring#

  • Run this notebook on Colab or JupyterLite using the buttons above.

  • Modify the loss function to something non-convex and see how optimizers behave in the wild. 🏞️


🔗 Next Chapter: Supervised Classification – Trees & Friends 🌳 Because predicting “who buys what” is the true business magic. ✨
