# Learning Rate Schedules
Imagine training your model like raising a puppy 🐶 — you can’t just shout “LEARN FASTER!” all the time. At first, you guide with big steps (high learning rate 🏃♀️), and later, smaller corrections (low learning rate 🧘). That’s the art of learning rate scheduling.
## 🎢 The Learning Rate Mood Swings
Your learning rate (LR) controls how much you adjust weights on each iteration:
- Too high → you overshoot the minimum like a caffeine-addled squirrel 🐿️
- Too low → you crawl slowly like a sleepy snail 🐌
So, instead of keeping it fixed, we change it over time to balance speed and precision.
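To make the step size concrete, here's a tiny plain-Python sketch (not tied to any library; the toy loss L(w) = (w - 3)² and the two rates are made-up values) showing the squirrel vs. snail behavior:

```python
# Vanilla gradient descent on the toy loss L(w) = (w - 3)^2, whose minimum is at w = 3.
# The update rule is: w <- w - lr * dL/dw
for lr in (1.5, 0.01):        # too high vs. too low
    w = 0.0
    for _ in range(10):
        grad = 2 * (w - 3)    # dL/dw
        w = w - lr * grad
    print(f"lr={lr}: after 10 steps, w = {w:.2f} (target is 3)")
# lr=1.5 overshoots and diverges; lr=0.01 barely moves.
```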
## 🗺️ Common Schedules (a.k.a. LR Diet Plans)
| Schedule Type | Description | Metaphor |
|---|---|---|
| Step Decay | Drops the LR by a fixed factor every few epochs | “Lose 10% of your learning enthusiasm every month.” |
| Exponential Decay | Smooth exponential drop every epoch | Like your motivation graph during finals week 📉 |
| Cosine Annealing | Smooth cosine-shaped decay; the warm-restart variant jumps back up periodically | A rollercoaster that keeps coming back 🎢 |
| Cyclical LR | Oscillates between low & high bounds | The “HIIT workout” of learning rates 💪 |
| Warmup + Decay | Start small, ramp up to the full LR, then cool down | Like brewing the perfect cup of coffee ☕ |
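Each row above has a counterpart in `torch.optim.lr_scheduler`. The sketch below just lists the constructors (hyperparameter values are arbitrary, you would normally create exactly one scheduler per optimizer, and a reasonably recent PyTorch release is assumed):

```python
import torch
import torch.optim as optim

# A throwaway parameter stands in for a real model.
optimizer = optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1, momentum=0.9)

step_decay = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)       # step decay
exp_decay  = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)             # exponential decay
cosine     = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)           # cosine annealing
restarts   = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)   # ...with periodic restarts
cyclical   = optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-4, max_lr=0.1)    # cyclical LR
# Warmup + decay is usually built by chaining two schedulers -- see the Quick Tips section below.
```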
## 🧪 Try It in PyTorch
```python
import torch
import torch.nn as nn
import torch.optim as optim

# Dummy model and data
model = nn.Linear(1, 1)
optimizer = optim.Adam(model.parameters(), lr=0.1)
x, y = torch.randn(16, 1), torch.randn(16, 1)

# Simple scheduler: halve the LR every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)  # dummy loss that actually depends on the model
    loss.backward()
    optimizer.step()
    scheduler.step()  # called once per epoch, after optimizer.step()
    print(f"Epoch {epoch+1:02d} - LR: {scheduler.get_last_lr()[0]:.5f}")
```
💡 Try changing the scheduler to:

```python
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
```

and observe how the learning rate follows a smooth cosine curve instead of discrete steps (for true periodic restarts, use `CosineAnnealingWarmRestarts`).
## 🎨 Visualize It Like a Boss
```python
import matplotlib.pyplot as plt
import numpy as np

# Closed-form exponential decay: lr_t = 0.1 * exp(-0.05 * t)
epochs = np.arange(0, 50)
lr = 0.1 * np.exp(-0.05 * epochs)

plt.plot(epochs, lr)
plt.title("Exponential Decay Learning Rate")
plt.xlabel("Epoch")
plt.ylabel("Learning Rate")
plt.show()
```
That little curve? That’s your optimizer learning when to chill and when to sprint 🏃♂️🧘.
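If you'd rather plot what PyTorch will actually do, you can record `get_last_lr()` from a few schedulers instead of hand-coding the curve. A sketch (arbitrary hyperparameters, and a throwaway parameter standing in for a real model):

```python
import matplotlib.pyplot as plt
import torch
import torch.optim as optim

def lr_history(make_scheduler, epochs=50, lr0=0.1):
    """Run a scheduler for `epochs` steps and record the LR it produces."""
    opt = optim.SGD([torch.zeros(1, requires_grad=True)], lr=lr0)
    sched = make_scheduler(opt)
    history = []
    for _ in range(epochs):
        history.append(sched.get_last_lr()[0])
        opt.step()      # optimizer.step() comes before scheduler.step()
        sched.step()
    return history

schedules = {
    "StepLR":            lambda o: optim.lr_scheduler.StepLR(o, step_size=10, gamma=0.5),
    "ExponentialLR":     lambda o: optim.lr_scheduler.ExponentialLR(o, gamma=0.95),
    "CosineAnnealingLR": lambda o: optim.lr_scheduler.CosineAnnealingLR(o, T_max=50),
}
for name, make in schedules.items():
    plt.plot(lr_history(make), label=name)

plt.title("Learning rate schedules as produced by torch.optim.lr_scheduler")
plt.xlabel("Epoch")
plt.ylabel("Learning Rate")
plt.legend()
plt.show()
```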
## 🧠 Quick Tips
- Don’t obsess over which schedule to pick; using one at all matters more than which one.
- Pair schedules with Adam or with SGD + momentum.
- Warmup helps prevent “first-epoch chaos” (see the sketch right after this list).
- Cyclical LR often gives surprising boosts!
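Here's a minimal warmup-then-decay sketch, assuming PyTorch ≥ 1.10 (where `LinearLR` and `SequentialLR` are available); the 5-epoch warmup and 45-epoch cosine decay are arbitrary choices:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# 5 epochs of linear warmup (from 10% of the target LR up to 100%), then cosine decay.
warmup = optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5)
decay = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=45)
scheduler = optim.lr_scheduler.SequentialLR(optimizer, schedulers=[warmup, decay], milestones=[5])

for epoch in range(50):
    # ... real forward/backward would go here ...
    optimizer.step()
    scheduler.step()
    print(f"Epoch {epoch+1:02d} - LR: {optimizer.param_groups[0]['lr']:.5f}")
```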
## 🎯 TL;DR
| Situation | Recommended Schedule |
|---|---|
| Small dataset | StepLR or a constant LR |
| Large dataset | ExponentialLR or CosineAnnealingLR |
| Transformers or deep models | Warmup + linear decay |
| Experimental fun | CyclicLR 🔄 |
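For the “experimental fun” row, here's a quick `CyclicLR` sketch (the `base_lr`/`max_lr` bounds and step counts are made up; note that cyclical LR is normally stepped per batch, and its default momentum cycling pairs with SGD + momentum rather than Adam):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
# CyclicLR cycles momentum by default, so it needs an optimizer with a momentum term.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=0.1, step_size_up=200, mode="triangular"
)

x, y = torch.randn(32, 1), torch.randn(32, 1)  # dummy data
for batch in range(1000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # stepped once per batch, not per epoch
```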
💬 “Learning rates are like coffee: start strong, ease off, and never forget to take breaks.” ☕
🔗 Next Up: Numerical Stability & Vectorization – because even your optimizer can panic if your numbers explode 💥.