
Have you ever seen your training loss suddenly turn into NaN (Not-a-Number)? That’s your model screaming:

“Help! I just divided by zero or overflowed to infinity!” 😭

Welcome to the world of Numerical Stability — where numbers behave badly and we must keep them under control.


## 💣 What Can Go Wrong?

Let’s meet the villains of numerical chaos 👇

| Villain | Description | Example Disaster |
|---|---|---|
| Overflow | Numbers get too big to store | `exp(1000)` → 💥 |
| Underflow | Numbers get too small to notice | `exp(-1000)` → `0.0` |
| Division by Zero | The forbidden math act 😬 | `1/0` → `Infinity` |
| Loss Explosion | Training loss goes from 0.3 → 300,000 overnight | Classic deep learning drama 🎭 |
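You can watch the first three villains in action with a quick NumPy demo (`np.errstate` is only there to silence the warnings):

```python
import numpy as np

with np.errstate(over="ignore", divide="ignore"):
    overflow = np.exp(1000.0)           # too big for float64 → inf
    underflow = np.exp(-1000.0)         # too small to represent → 0.0
    div_zero = np.float64(1.0) / 0.0    # the forbidden act → inf

print(overflow, underflow, div_zero)
```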

## 🧊 Cool Tricks to Stay Stable

Here’s how the pros keep their models from catching fire 🔥:

### 🧯 1. Log-Sum-Exp Trick

When dealing with probabilities, the softmax

$$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

can overflow if $x_i$ is huge. Use instead:

$$\text{softmax}(x_i) = \frac{e^{x_i - \max(x)}}{\sum_j e^{x_j - \max(x)}}$$

✅ Subtracting $\max(x)$ keeps numbers in check.
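As a quick sketch, the max-subtraction trick looks like this in NumPy (`stable_softmax` is an illustrative helper, not a library function):

```python
import numpy as np

def stable_softmax(x):
    """Softmax with the max subtracted first so no exponent overflows."""
    shifted = x - np.max(x)   # largest exponent is now exactly 0
    exps = np.exp(shifted)
    return exps / exps.sum()

x = np.array([1000.0, 1001.0, 1002.0])  # naive exp(1000) would be inf
probs = stable_softmax(x)
print(probs)  # finite probabilities that sum to 1
```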


### 🧮 2. Add Epsilon (ϵ)

When dividing or taking logs, add a tiny constant to stay safe:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
stable_log = np.log(x + 1e-8)  # log(0) would be -inf without the epsilon
```

It’s like putting bubble wrap around your math 📦.


### 🧤 3. Gradient Clipping

During backpropagation, gradients sometimes go berserk 🤯. Clamp them down:

```python
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

Think of it as telling your model, “Calm down, champ.” 🧘
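Here is a minimal sketch of where the call fits in a training step; the tiny `nn.Linear` model, random data, and SGD optimizer are placeholders for illustration:

```python
import torch
import torch.nn as nn

# Placeholder setup: any model / optimizer / loss works the same way
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Clip AFTER backward() and BEFORE step(): if the global gradient
# norm exceeds max_norm, all gradients are rescaled down to meet it
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```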


### ⚡ 4. Normalize Inputs

Feeding unscaled data to a model is like giving a toddler 10 espressos ☕. Normalize features:

$$x' = \frac{x - \mu}{\sigma}$$

so everything stays balanced.
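In NumPy, per-feature standardization is a one-liner (toy data for illustration):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])  # features on wildly different scales

mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_norm = (X - mu) / sigma  # each column now has mean 0 and std 1
```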


## 🧠 Vectorization: The Zen of Speed and Stability

Loops in Python are like waiting in a grocery line with one cashier 🐢. Vectorized operations (NumPy, PyTorch, etc.) open 20 registers at once 🏎️.

Example:

```python
# Slow (loop)
squares = []
for i in range(1_000_000):
    squares.append(i ** 2)

# Fast (vectorized)
import numpy as np
squares = np.arange(1_000_000, dtype=np.int64) ** 2  # int64: squares up to ~1e12 don't fit in int32
```

Result: same output, but the vectorized version runs roughly 100× faster because the work happens in NumPy's optimized C backend, and it's easier to keep numerically sane (one explicit dtype instead of a million Python ints).


## 🧪 Practice: Find the Stability Hero

```python
import torch

x = torch.tensor([1000.0, 1000.0])

# Naive softmax (dangerous!): exp(1000) overflows to inf, and inf/inf = nan
naive = torch.exp(x) / torch.exp(x).sum()
print("Naive softmax:", naive)

# Stable version: subtract the max before exponentiating
x_stable = x - torch.max(x)
stable = torch.exp(x_stable) / torch.exp(x_stable).sum()
print("Stable softmax:", stable)
```

Check the difference — the naive version blows up to `nan` 💥, while the stable one calmly returns `[0.5, 0.5]`. 😌 (This is exactly the trick `torch.nn.functional.softmax` applies internally, which is why you should reach for it in real code.)


## 🎯 TL;DR — Stability Survival Kit

| Problem | Fix |
|---|---|
| Overflow | Use log-sum-exp |
| Division by zero | Add epsilon (`1e-8`) |
| Exploding gradients | Clip them |
| Unstable features | Normalize or standardize |
| Slow math | Vectorize everything |

💬 “Numerical stability: because one NaN can ruin your entire day.” 😅


🔗 Next Up: *Optimization Lab – Comparing GD Variants*. Time to put all your optimizer powers to the test in one grand experiment ⚔️.
