
“Because even data scientists occasionally forget what Σ means.”


## 🎯 Why This Exists

Let’s be honest — everyone says “I love math” until they meet matrix derivatives or eigenvalues. This cheat-sheet is your life jacket when swimming through ML math. It’s not meant to make you a mathematician — just to keep your neurons afloat long enough to train a model without crying.

💡 “Mathematics: turning coffee into theorems and stress into learning curves.”


## 📏 Linear Algebra in 60 Seconds

| Concept | What It Really Means | Example |
|---|---|---|
| Vector | Fancy list of numbers | [3, 4, 5] |
| Matrix | Grid of numbers that bully your RAM | [[1, 2], [3, 4]] |
| Dot Product | “How similar are these two vibes?” | A·B = Σ aᵢbᵢ |
| Transpose | Flip rows ↔ columns | Aᵀ |
| Identity Matrix | The “1” of matrix land | I = [[1,0],[0,1]] |
| Inverse | The undo button (when it exists) | A⁻¹A = I |
| Eigenvalue | How much a direction stretches | “Stretch factor” |
| Eigenvector | Direction that doesn’t change | “Preferred axis of chaos” |

🧠 Pro Tip: If you see the term “orthogonal”, it usually means “thankfully independent.”
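
If you’d rather poke these concepts than memorize them, here’s a minimal NumPy sketch of the table above (assuming NumPy is installed; the matrix and vectors are arbitrary picks for the demo):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])   # a small matrix (det = -2, so invertible)
v = np.array([3, 4])
w = np.array([4, 3])

print(v @ w)                  # dot product: 3*4 + 4*3 = 24
print(A.T)                    # transpose: rows <-> columns
print(A @ np.linalg.inv(A))   # A @ A^-1 ~= identity (modulo float fuzz)

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)            # the "stretch factors"
print(eigenvectors)           # columns are the preferred axes of chaos
```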


## 🔺 Calculus for Machine Learning

### Derivatives You Should Pretend to Remember

| Function | Derivative |
|---|---|
| d/dx (x²) | 2x |
| d/dx (sin x) | cos x |
| d/dx (eˣ) | eˣ |
| d/dx (ln x) | 1/x |

### Chain Rule (The Grandparent of Backpropagation)

$$ \frac{dL}{dx} = \frac{dL}{dy} \cdot \frac{dy}{dx} $$

Meaning: If you don’t know what you’re doing, just multiply derivatives until it works.
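
If you’d rather verify than believe, here’s a tiny sanity check (my own example, not from the cheat-sheet): take L = (sin x)², so with y = sin x the chain rule says dL/dx = 2y · cos x. We compare that against a finite difference:

```python
import numpy as np

def L(x):
    return np.sin(x) ** 2          # L = y^2 where y = sin(x)

x = 0.7
y = np.sin(x)

# Chain rule: dL/dx = dL/dy * dy/dx = 2y * cos(x)
analytic = 2 * y * np.cos(x)

# Finite-difference check
h = 1e-6
numeric = (L(x + h) - L(x - h)) / (2 * h)

print(analytic, numeric)   # should agree to ~6 decimal places
```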


## 🎲 Probability Essentials

| Concept | Definition | Example |
|---|---|---|
| Probability (P) | Likelihood of an event | P(heads) = 0.5 |
| **Conditional P(A\|B)** | P of A if B already happened | “P(it rains \| cloudy)” |
| Bayes’ Rule | How to update your opinion | `P(A\|B) = P(B\|A)P(A)/P(B)` |
| Expectation (E[X]) | Weighted average outcome | “Long-term mood” |
| Variance (Var[X]) | How moody it is | E[(X−E[X])²] |

🎰 Think of probability as math’s version of gambling — only with slightly fewer tears.
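
Bayes’ Rule fits in three lines of Python. All the numbers below are invented purely for illustration:

```python
# P(rain | cloudy) = P(cloudy | rain) * P(rain) / P(cloudy)
p_rain = 0.2               # prior: it rains 20% of days (made-up number)
p_cloudy_given_rain = 0.9  # when it rains, it's almost always cloudy
p_cloudy = 0.4             # clouds show up 40% of days

p_rain_given_cloudy = p_cloudy_given_rain * p_rain / p_cloudy
print(p_rain_given_cloudy)   # 0.45 -- clouds roughly double your rain suspicion
```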


## 🧮 Optimization Refresher

| Concept | Meaning |
|---|---|
| Gradient | Direction of steepest increase (which you promptly go opposite of) |
| Gradient Descent | ML’s favorite hiking method — down the hill of loss |
| Learning Rate (α) | How big your steps are. Too small = nap, too big = disaster |
| Local Minimum | The valley where your model gets lazy |
| Global Minimum | The true lowest point (good luck finding it) |

🏔️ Remember: Optimization is just calculus pretending to be adventurous.
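
Here’s the whole table acted out on the humble parabola L(w) = (w − 3)² (a minimal sketch; the loss, starting point, and learning rate are all arbitrary choices):

```python
# Minimize L(w) = (w - 3)^2; gradient is dL/dw = 2 * (w - 3)
w = 0.0
alpha = 0.1                # learning rate: not a nap, not a disaster

for step in range(50):
    grad = 2 * (w - 3)     # direction of steepest increase...
    w = w - alpha * grad   # ...so we go the other way

print(w)   # close to 3.0, the (global!) minimum
```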


## 🧾 Probability Distributions You’ll Actually Use

| Distribution | Use Case | Example |
|---|---|---|
| Normal (Gaussian) | Errors, nature, life itself | N(μ, σ²) |
| Bernoulli | Binary events | 0 = no sale, 1 = sale |
| Binomial | Repeated Bernoullis | “How many customers buy?” |
| Poisson | Rare events | “How often do people complain?” |
| Exponential | Time until next event | “When will next customer churn?” |

📊 If your data doesn’t fit a normal distribution — congratulations, you’re dealing with real business data.
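
To meet these distributions in person, NumPy’s random generator can sample all five (a quick sketch; every parameter below is made up):

```python
import numpy as np

rng = np.random.default_rng(42)

normal      = rng.normal(loc=0.0, scale=1.0, size=5)  # N(mu, sigma^2)
bernoulli   = rng.binomial(n=1, p=0.3, size=5)        # Bernoulli = Binomial with n=1
binomial    = rng.binomial(n=10, p=0.3, size=5)       # buyers out of 10 customers
poisson     = rng.poisson(lam=2.0, size=5)            # complaints per day
exponential = rng.exponential(scale=1.5, size=5)      # time until the next churn

print(normal, bernoulli, binomial, poisson, exponential, sep="\n")
```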


## 🧠 Matrix Calculus Lite

Let’s make the scariest-sounding topic friendly:

$$ y = Wx + b $$

Then:

$$ \frac{\partial y}{\partial W} = x^\top \quad \text{and} \quad \frac{\partial y}{\partial b} = 1 $$

Boom. You just did matrix calculus — and lived to tell the tale.
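
If you’d like evidence rather than applause, here’s a finite-difference check for the toy loss L = Σyᵢ (my choice, not the article’s), where the backprop shorthand above gives dL/dW = (dL/dy) ⊗ x, i.e. the outer product of ones with x:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
b = rng.normal(size=3)
x = rng.normal(size=2)

def loss(W):
    return np.sum(W @ x + b)       # toy loss: L = sum(y), so dL/dy = ones

# Backprop shorthand: dL/dW = outer(dL/dy, x) = outer(ones, x)
analytic = np.outer(np.ones(3), x)

# Finite-difference check, one entry of W at a time
numeric = np.zeros_like(W)
h = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += h
        Wm = W.copy(); Wm[i, j] -= h
        numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-4))   # True
```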


## 🪄 Cheat Formula Scroll

- Mean: $\mu = \frac{1}{n}\sum x_i$
- Variance: $\sigma^2 = \frac{1}{n}\sum (x_i - \mu)^2$
- Covariance: $\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$
- Correlation: $\rho = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$
- Gradient Descent Update: $w := w - \alpha \nabla L(w)$

✍️ Write them on your wall, in your notebook, or tattoo them on your coffee mug.
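
Or skip the tattoo and keep them in a .py file (a from-scratch sketch with made-up data; NumPy also has built-ins for all of these):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0])

mean = x.sum() / len(x)                          # mu = (1/n) * sum(x_i)
variance = ((x - mean) ** 2).sum() / len(x)      # sigma^2 = (1/n) * sum((x_i - mu)^2)
cov = ((x - x.mean()) * (y - y.mean())).mean()   # Cov(X, Y)
corr = cov / (x.std() * y.std())                 # rho = Cov / (sigma_x * sigma_y)

print(mean, variance, cov, corr)
# Sanity check: correlation is scale-free, so the n vs. n-1 business cancels
print(np.corrcoef(x, y)[0, 1])   # should match corr above
```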


## 🧘 Final Words

Math is the language that ML speaks — but you don’t need to be a poet to understand it. You just need to know enough to translate “loss went down” into “yay, my model learned something!”

☕ “Stay calm, take the derivative, and may your gradients always descend.”
