Learning Rate Schedules#

Imagine training your model like raising a puppy 🐶 — you can’t just shout “LEARN FASTER!” all the time. At first, you guide with big steps (high learning rate 🏃‍♀️), and later, smaller corrections (low learning rate 🧘). That’s the art of learning rate scheduling.


🎢 The Learning Rate Mood Swings#

Your learning rate (LR) controls how much you adjust weights on each iteration:

  • Too high → you overshoot the minimum like a caffeine-addled squirrel 🐿️

  • Too low → you crawl slowly like a sleepy snail 🐌

So, instead of keeping it fixed, we change it over time to balance speed and precision.
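
To see why the step size matters, here is a minimal, framework-free sketch of gradient descent on f(w) = w², whose gradient is 2w (the function, learning rates, and step counts are made up purely for illustration):

def descend(lr, steps=10, w=1.0):
    # One gradient-descent update per step: w <- w - lr * df/dw
    for _ in range(steps):
        w = w - lr * (2 * w)
    return w

print(descend(lr=1.1))    # too high: each step flips sign and grows -> divergence
print(descend(lr=0.01))   # too low: still far from the minimum after 10 steps
print(descend(lr=0.4))    # reasonable: lands very close to the minimum at w = 0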


🗺️ Common Schedules (a.k.a. LR Diet Plans)#

| Schedule Type | Description | Metaphor |
|---|---|---|
| Step Decay | Drops the LR by a fixed factor every few epochs | "Lose 10% of your learning enthusiasm every month." |
| Exponential Decay | Smooth exponential drop | Like your motivation graph during finals week 📉 |
| Cosine Annealing | Follows a smooth cosine curve; the warm-restart variant periodically jumps back up | A rollercoaster that keeps coming back 🎢 |
| Cyclical LR | Oscillates between high & low values | The "HIIT workout" of learning rates 💪 |
| Warmup + Decay | Start small, then go fast, then cool down | Like brewing the perfect cup of coffee ☕ |
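
Each row above has a close relative in torch.optim.lr_scheduler. The sketch below only lines up the constructors; the hyperparameters are illustrative placeholders, and in a real script you would attach exactly one scheduler to an optimizer:

import torch
import torch.optim as optim

params = [torch.zeros(1, requires_grad=True)]          # stand-in parameters
optimizer = optim.SGD(params, lr=0.1, momentum=0.9)

step_decay  = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
exp_decay   = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
cosine      = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
cosine_warm = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
cyclical    = optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=0.1,
                                          step_size_up=200)
# Warmup + Decay is usually built by chaining schedulers -- see the sketch
# after the PyTorch example below.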


🧪 Try It in PyTorch#

import torch
import torch.nn as nn
import torch.optim as optim

# Dummy model
model = nn.Linear(1, 1)
optimizer = optim.Adam(model.parameters(), lr=0.1)

# Simple scheduler: halve the learning rate every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    optimizer.zero_grad()
    inputs = torch.randn(8, 1)
    loss = model(inputs).pow(2).mean()  # toy loss so gradients actually reach the model
    loss.backward()
    optimizer.step()
    scheduler.step()
    print(f"Epoch {epoch+1:02d} - LR: {scheduler.get_last_lr()[0]:.5f}")

💡 Try changing to:

scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

and observe how the learning rate now follows a smooth cosine curve instead of dropping in discrete steps.
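
The "Warmup + Decay" pattern from the table can be sketched by chaining two schedulers, assuming a reasonably recent PyTorch that ships LinearLR and SequentialLR (the 5-epoch warmup and cosine horizon below are arbitrary choices, and optimizer is the one from the loop above):

warmup = optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5)
decay  = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)
scheduler = optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[5]
)
# Epochs 0-4: LR ramps linearly from 0.01 up to 0.1,
# then the cosine decay takes over from epoch 5 onward.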


🎨 Visualize It Like a Boss#

import matplotlib.pyplot as plt
import numpy as np

epochs = np.arange(0, 50)
lr = 0.1 * np.exp(-0.05 * epochs)  # start at 0.1, shrink by ~5% each epoch

plt.plot(epochs, lr)
plt.title("Exponential Decay Learning Rate")
plt.xlabel("Epoch")
plt.ylabel("Learning Rate")
plt.show()

That little curve is the plan you hand your optimizer: when to sprint 🏃‍♂️ and when to chill 🧘.
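
Hand-coding the formula is fine for intuition, but you can also record what a real scheduler will actually do by stepping it and reading back the LR. This sketch reuses the imports from earlier and picks CosineAnnealingLR arbitrarily:

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

lrs = []
for _ in range(50):
    lrs.append(scheduler.get_last_lr()[0])  # LR in effect for this epoch
    optimizer.step()    # normally preceded by backward(); skipped in this sketch
    scheduler.step()

plt.plot(lrs)
plt.title("CosineAnnealingLR over 50 epochs")
plt.xlabel("Epoch")
plt.ylabel("Learning Rate")
plt.show()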


🧠 Quick Tips#

  • Don’t obsess over which schedule to use; picking any reasonable one matters more than the exact choice.

  • Pair schedules with Adam or SGD with momentum.

  • Warmup helps prevent “first-epoch chaos.”

  • Cyclical LR often gives surprising boosts (see the sketch right after these tips)!
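
Since CyclicLR is designed to be stepped once per batch rather than once per epoch, a sketch of it looks slightly different from the loop above. The model is reused from earlier, the batch counts and LR bounds are made up, and SGD with momentum is used because CyclicLR also cycles momentum by default:

optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-2,
                                        step_size_up=100, mode="triangular")

for epoch in range(3):
    for batch in range(200):                           # pretend: 200 batches/epoch
        optimizer.zero_grad()
        loss = model(torch.randn(8, 1)).pow(2).mean()  # toy loss
        loss.backward()
        optimizer.step()
        scheduler.step()                               # per *batch* for cyclical LR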


🎯 TL;DR#

| Situation | Recommended Schedule |
|---|---|
| Small dataset | StepLR or a constant LR |
| Large dataset | ExponentialLR or CosineAnnealingLR |
| Transformers or deep models | Warmup + linear decay |
| Experimental fun | CyclicLR 🔄 |


💬 “Learning rates are like coffee: start strong, ease off, and never forget to take breaks.”


🔗 Next Up: Numerical Stability & Vectorization – because even your optimizer can panic if your numbers explode 💥.
