Imagine training your model like raising a puppy 🐶 — you can’t just shout “LEARN FASTER!” all the time. At first, you guide with big steps (high learning rate 🏃♀️), and later, smaller corrections (low learning rate 🧘). That’s the art of learning rate scheduling.
🎢 The Learning Rate Mood Swings¶
Your learning rate (LR) controls how much you adjust weights on each iteration:
Too high → you overshoot the minimum like a caffeine-addled squirrel 🐿️
Too low → you crawl slowly like a sleepy snail 🐌
So, instead of keeping it fixed, we change it over time to balance speed and precision.
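To see the squirrel and the snail in action, here's a toy sketch (not from the original post): plain gradient descent on f(w) = w², whose gradient is 2w and whose minimum is at w = 0.

```python
# Toy illustration: gradient descent on f(w) = w**2, gradient = 2*w.
# Each update multiplies w by (1 - 2*lr), so the learning rate decides
# whether w converges, crawls, or explodes.
def descend(lr, steps=20, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w  # gradient descent update
    return w

print(descend(lr=1.1))    # too high: |1 - 2*lr| > 1, w overshoots and diverges
print(descend(lr=0.001))  # too low: w barely moves from 1.0
print(descend(lr=0.3))    # reasonable: w shrinks rapidly toward 0
```

With lr=1.1 the update factor is -1.2, so every step overshoots the minimum by more than it gained; that's the caffeine-addled squirrel.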
🗺️ Common Schedules (a.k.a. LR Diet Plans)¶
| Schedule Type | Description | Metaphor |
|---|---|---|
| Step Decay | Drops LR every few epochs | “Lose 10% of your learning enthusiasm every month.” |
| Exponential Decay | Smooth exponential drop | Like your motivation graph during finals week 📉 |
| Cosine Annealing | Wavy pattern, restarts periodically | A rollercoaster that keeps coming back 🎢 |
| Cyclical LR | Oscillates between high & low values | The “HIIT workout” of learning rates 💪 |
| Warmup + Decay | Start small, then go fast, then cool down | Like brewing the perfect cup of coffee ☕ |
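The first two rows reduce to one-line formulas. A quick sketch (the exact formulas are assumed from the table's descriptions, with illustrative constants):

```python
import math

# Step decay: drop the LR by a fixed factor every `every` epochs.
def step_decay(lr0, t, drop=0.5, every=10):
    return lr0 * drop ** (t // every)

# Exponential decay: lr0 * e^(-k * t), a smooth version of the same idea.
def exponential_decay(lr0, t, k=0.05):
    return lr0 * math.exp(-k * t)

print(step_decay(0.1, 25))                  # two drops have happened by epoch 25
print(round(exponential_decay(0.1, 20), 5)) # smooth decline, no sudden cliffs
```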
🧪 Try It in PyTorch¶
```python
import torch
import torch.nn as nn
import torch.optim as optim

# Dummy model
model = nn.Linear(1, 1)
optimizer = optim.Adam(model.parameters(), lr=0.1)

# Simple scheduler: halve the LR every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    optimizer.zero_grad()
    loss = model(torch.randn(1, 1)).pow(2).mean()  # dummy loss that actually reaches the weights
    loss.backward()
    optimizer.step()
    scheduler.step()
    print(f"Epoch {epoch+1:02d} - LR: {scheduler.get_last_lr()[0]:.5f}")
```

💡 Try changing to:

```python
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
```

and observe how it smoothly cycles your learning rate.
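The "Warmup + Decay" row of the table can also be built from PyTorch's own pieces. A sketch, assuming PyTorch ≥ 1.10 (for `LinearLR` and `SequentialLR`): ramp the LR up over 5 epochs, then hand off to cosine decay.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Warmup: scale the LR from 10% of base up to 100% over 5 epochs.
warmup = optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5)
# Decay: cosine-anneal from the base LR down toward 0.
decay = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)
# Chain them: warmup runs first, decay takes over at epoch 5.
scheduler = optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[5]
)

lrs = []
for epoch in range(30):
    optimizer.step()  # (real training step would go here)
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])
```

The warmup keeps the first few (chaotic) epochs from taking huge steps; the cosine tail lets the model settle.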
🎨 Visualize It Like a Boss¶
```python
import matplotlib.pyplot as plt
import numpy as np

epochs = np.arange(0, 50)
lr = 0.1 * np.exp(-0.05 * epochs)  # exponential decay: lr0 * e^(-k * epoch)

plt.plot(epochs, lr)
plt.title("Exponential Decay Learning Rate")
plt.xlabel("Epoch")
plt.ylabel("Learning Rate")
plt.show()
```

That little curve? That's your optimizer learning when to chill and when to sprint 🏃♂️🧘.
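If you'd rather plot a real PyTorch scheduler than a hand-written formula, one sketch is to step a scheduler on a throwaway optimizer (no training) and record `get_last_lr()` each epoch; feed the resulting list to `plt.plot` exactly as above.

```python
import torch.nn as nn
import torch.optim as optim

# Record the LR trace of any scheduler by stepping a dummy optimizer.
def lr_trace(make_scheduler, epochs=50):
    optimizer = optim.SGD(nn.Linear(1, 1).parameters(), lr=0.1)
    scheduler = make_scheduler(optimizer)
    trace = []
    for _ in range(epochs):
        trace.append(scheduler.get_last_lr()[0])
        optimizer.step()   # keeps the "optimizer.step before scheduler.step" order
        scheduler.step()
    return trace

trace = lr_trace(lambda opt: optim.lr_scheduler.CosineAnnealingLR(opt, T_max=25))
```

Swap the lambda for any scheduler from the table to eyeball its shape before committing to a long training run.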
🧠 Quick Tips¶
Don’t obsess over which schedule — just use one.
Pair schedules with Adam or SGD with momentum.
Warmup helps prevent “first-epoch chaos.”
Cyclical LR often gives surprising boosts!
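For that last tip, PyTorch ships a built-in cyclical scheduler. A minimal sketch: `CyclicLR` bounces the LR between `base_lr` and `max_lr`; by default it cycles momentum too, so pair it with SGD + momentum rather than Adam (or pass `cycle_momentum=False`).

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Triangular cycle: 10 steps climbing from base_lr to max_lr, 10 steps back down.
scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.1, step_size_up=10
)

lrs = []
for _ in range(40):  # two full up/down cycles
    optimizer.step()
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])
```

Note that `CyclicLR` is usually stepped per batch, not per epoch, so `step_size_up` is measured in iterations.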
🎯 TL;DR¶
| Situation | Recommended Schedule |
|---|---|
| Small dataset | StepLR or Constant |
| Large dataset | ExponentialDecay or CosineAnnealing |
| Transformers or Deep Models | Warmup + Linear Decay |
| Experimental fun | CyclicalLR 🔄 |
💬 “Learning rates are like coffee: start strong, ease off, and never forget to take breaks.” ☕
🔗 Next Up: Numerical Stability & Vectorization – because even your optimizer can panic if your numbers explode 💥.