Learning Rate Schedules#

Imagine training your model like raising a puppy 🐶 — you can’t just shout “LEARN FASTER!” all the time. At first, you guide with big steps (high learning rate 🏃‍♀️), and later, smaller corrections (low learning rate 🧘). That’s the art of learning rate scheduling.


🎢 The Learning Rate Mood Swings#

Your learning rate (LR) controls how much you adjust weights on each iteration:

  • Too high → you overshoot the minimum like a caffeine-addled squirrel 🐿️

  • Too low → you crawl slowly like a sleepy snail 🐌

So, instead of keeping it fixed, we change it over time to balance speed and precision.
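
To see why the step size matters, here is a minimal, framework-free sketch of gradient descent on f(w) = w², whose gradient is 2w (the function, learning rates, and step counts are made up purely for illustration):

def descend(lr, steps=10, w=1.0):
    # One gradient-descent update per step: w <- w - lr * df/dw
    for _ in range(steps):
        w = w - lr * (2 * w)
    return w

print(descend(lr=1.1))    # too high: each step flips sign and grows -> divergence
print(descend(lr=0.01))   # too low: still far from the minimum after 10 steps
print(descend(lr=0.4))    # reasonable: lands very close to the minimum at w = 0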


🗺️ Common Schedules (a.k.a. LR Diet Plans)#

| Schedule Type | Description | Metaphor |
|---|---|---|
| Step Decay | Drops the LR by a fixed factor every few epochs | "Lose 10% of your learning enthusiasm every month." |
| Exponential Decay | Smooth exponential drop | Like your motivation graph during finals week 📉 |
| Cosine Annealing | Follows a smooth cosine curve; the warm-restart variant periodically jumps back up | A rollercoaster that keeps coming back 🎢 |
| Cyclical LR | Oscillates between high & low values | The "HIIT workout" of learning rates 💪 |
| Warmup + Decay | Start small, then go fast, then cool down | Like brewing the perfect cup of coffee ☕ |
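
Each row above has a close relative in torch.optim.lr_scheduler. The sketch below only lines up the constructors; the hyperparameters are illustrative placeholders, and in a real script you would attach exactly one scheduler to an optimizer:

import torch
import torch.optim as optim

params = [torch.zeros(1, requires_grad=True)]          # stand-in parameters
optimizer = optim.SGD(params, lr=0.1, momentum=0.9)

step_decay  = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
exp_decay   = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
cosine      = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
cosine_warm = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
cyclical    = optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=0.1,
                                          step_size_up=200)
# Warmup + Decay is usually built by chaining schedulers -- see the sketch
# after the PyTorch example below.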


🧪 Try It in PyTorch#

import torch
import torch.nn as nn
import torch.optim as optim

# Dummy model
model = nn.Linear(1, 1)
optimizer = optim.Adam(model.parameters(), lr=0.1)

# Simple scheduler: halve the learning rate every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    optimizer.zero_grad()
    inputs = torch.randn(8, 1)
    loss = model(inputs).pow(2).mean()  # toy loss so gradients actually reach the model
    loss.backward()
    optimizer.step()
    scheduler.step()
    print(f"Epoch {epoch+1:02d} - LR: {scheduler.get_last_lr()[0]:.5f}")

💡 Try changing to:

scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

and observe how the learning rate now follows a smooth cosine curve instead of dropping in discrete steps.
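
The "Warmup + Decay" pattern from the table can be sketched by chaining two schedulers, assuming a reasonably recent PyTorch that ships LinearLR and SequentialLR (the 5-epoch warmup and cosine horizon below are arbitrary choices, and optimizer is the one from the loop above):

warmup = optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5)
decay  = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)
scheduler = optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[5]
)
# Epochs 0-4: LR ramps linearly from 0.01 up to 0.1,
# then the cosine decay takes over from epoch 5 onward.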


🎨 Visualize It Like a Boss#

import matplotlib.pyplot as plt
import numpy as np

epochs = np.arange(0, 50)
lr = 0.1 * np.exp(-0.05 * epochs)  # start at 0.1, shrink by ~5% each epoch

plt.plot(epochs, lr)
plt.title("Exponential Decay Learning Rate")
plt.xlabel("Epoch")
plt.ylabel("Learning Rate")
plt.show()

That little curve is the plan you hand your optimizer: when to sprint 🏃‍♂️ and when to chill 🧘.
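
Hand-coding the formula is fine for intuition, but you can also record what a real scheduler will actually do by stepping it and reading back the LR. This sketch reuses the imports from earlier and picks CosineAnnealingLR arbitrarily:

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

lrs = []
for _ in range(50):
    lrs.append(scheduler.get_last_lr()[0])  # LR in effect for this epoch
    optimizer.step()    # normally preceded by backward(); skipped in this sketch
    scheduler.step()

plt.plot(lrs)
plt.title("CosineAnnealingLR over 50 epochs")
plt.xlabel("Epoch")
plt.ylabel("Learning Rate")
plt.show()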


🧠 Quick Tips#

  • Don’t obsess over which schedule to use; picking any reasonable one matters more than the exact choice.

  • Pair schedules with Adam or SGD with momentum.

  • Warmup helps prevent “first-epoch chaos.”

  • Cyclical LR often gives surprising boosts (see the sketch right after these tips)!
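
Since CyclicLR is designed to be stepped once per batch rather than once per epoch, a sketch of it looks slightly different from the loop above. The model is reused from earlier, the batch counts and LR bounds are made up, and SGD with momentum is used because CyclicLR also cycles momentum by default:

optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-2,
                                        step_size_up=100, mode="triangular")

for epoch in range(3):
    for batch in range(200):                           # pretend: 200 batches/epoch
        optimizer.zero_grad()
        loss = model(torch.randn(8, 1)).pow(2).mean()  # toy loss
        loss.backward()
        optimizer.step()
        scheduler.step()                               # per *batch* for cyclical LR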


🎯 TL;DR#

| Situation | Recommended Schedule |
|---|---|
| Small dataset | StepLR or a constant LR |
| Large dataset | ExponentialLR or CosineAnnealingLR |
| Transformers or deep models | Warmup + linear decay |
| Experimental fun | CyclicLR 🔄 |


💬 “Learning rates are like coffee: start strong, ease off, and never forget to take breaks.”


🔗 Next Up: Numerical Stability & Vectorization – because even your optimizer can panic if your numbers explode 💥.
