# Math Cheat-Sheets

“Because even data scientists occasionally forget what Σ means.”

## 🎯 Why This Exists
Let’s be honest — everyone says “I love math” until they meet matrix derivatives or eigenvalues. This cheat-sheet is your life jacket when swimming through ML math. It’s not meant to make you a mathematician — just to keep your neurons afloat long enough to train a model without crying.
💡 “Mathematics: turning coffee into theorems and stress into learning curves.”
## 📏 Linear Algebra in 60 Seconds

| Concept | What It Really Means | Example |
|---|---|---|
| Vector | Fancy list of numbers | `[1, 2, 3]` |
| Matrix | Grid of numbers that bully your RAM | `[[1, 2], [3, 4]]` |
| Dot Product | “How similar are these two vibes?” | `a · b = Σ aᵢbᵢ` |
| Transpose | Flip rows ↔ columns | `Aᵀ` |
| Identity Matrix | The “1” of matrix land | `AI = A` |
| Inverse | The undo button (when it exists) | `A⁻¹A = I` |
| Eigenvalue | How much a direction stretches | “Stretch factor” |
| Eigenvector | Direction that doesn’t change | “Preferred axis of chaos” |
🧠 Pro Tip: If you see the term “orthogonal”, it usually means “thankfully independent.”
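For the hands-on crowd, here is a minimal NumPy sketch of the table above (the array values are made up purely for illustration):

```python
import numpy as np

v = np.array([1, 2, 3])              # vector: fancy list of numbers
A = np.array([[2.0, 0.0], [0.0, 3.0]])  # matrix: grid of numbers
b = np.array([4, 5, 6])

print(v @ b)                         # dot product: how similar are these two vibes?
print(A.T)                           # transpose: flip rows and columns
print(np.eye(2))                     # identity matrix: the "1" of matrix land
print(np.linalg.inv(A))              # inverse: the undo button (A happens to be invertible)

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                   # stretch factors: [2. 3.]
print(eigenvectors)                  # columns are the eigenvectors (preferred axes of chaos)
```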
## 🔺 Calculus for Machine Learning

### Derivatives You Should Pretend to Remember

| Function | Derivative |
|---|---|
| $x^n$ | $n x^{n-1}$ |
| $e^x$ | $e^x$ |
| $\ln x$ | $\frac{1}{x}$ |
| $\sigma(x) = \frac{1}{1 + e^{-x}}$ | $\sigma(x)\,(1 - \sigma(x))$ |
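If you would rather let Python do the remembering, SymPy can recite these rules on demand (a small optional sketch, not part of the table above):

```python
import sympy as sp

x, n = sp.symbols("x n")
sigmoid = 1 / (1 + sp.exp(-x))

# Ask SymPy for the classic derivatives so you never have to pretend again
print(sp.diff(x**n, x))          # n*x**(n - 1)
print(sp.diff(sp.exp(x), x))     # exp(x)
print(sp.diff(sp.log(x), x))     # 1/x
print(sp.simplify(sp.diff(sigmoid, x) - sigmoid * (1 - sigmoid)))  # 0, i.e. they match
```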
### Chain Rule (The Grandparent of Backpropagation)

$$
\frac{dL}{dx} = \frac{dL}{dy} \cdot \frac{dy}{dx}
$$
Meaning: If you don’t know what you’re doing, just multiply derivatives until it works.
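As a quick sanity check, here is the chain rule in plain Python for a toy composition (the functions and numbers are invented for illustration):

```python
# Toy composition: y = 3x and L = y**2, so dL/dx = dL/dy * dy/dx = 2y * 3 = 18x
x = 2.0
y = 3 * x
L = y ** 2

dL_dy = 2 * y        # derivative of y**2 with respect to y
dy_dx = 3            # derivative of 3x with respect to x
dL_dx = dL_dy * dy_dx

print(dL_dx)         # 36.0, which matches 18 * x
```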
## 🎲 Probability Essentials

| Concept | Definition | Example |
|---|---|---|
| Probability (P) | Likelihood of an event | P(heads) = 0.5 |
| Conditional P(A \| B) | P of A if B already happened | “P(it rains \| cloudy)” |
| Bayes’ Rule | How to update your opinion | P(A \| B) = P(B \| A) P(A) / P(B) |
| Expectation (E[X]) | Weighted average outcome | “Long-term mood” |
| Variance (Var[X]) | How moody it is | E[(X − E[X])²] |
🎰 Think of probability as math’s version of gambling — only with slightly fewer tears.
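To make Bayes’ Rule concrete, here is a tiny worked example in Python (the spam-filter numbers are invented for illustration):

```python
# Hypothetical spam-filter numbers, purely for illustration
p_spam = 0.2                 # P(A): prior probability an email is spam
p_word_given_spam = 0.6      # P(B | A): probability "free" appears, given spam
p_word = 0.25                # P(B): probability "free" appears in any email

# Bayes' Rule: P(A | B) = P(B | A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)     # 0.48, so seeing "free" raises the suspicion from 0.2
```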
## 🧮 Optimization Refresher

| Concept | Meaning |
|---|---|
| Gradient | Direction of steepest increase (which you promptly go opposite of) |
| Gradient Descent | ML’s favorite hiking method: down the hill of loss |
| Learning Rate (α) | How big your steps are. Too small = nap, too big = disaster |
| Local Minimum | The valley where your model gets lazy |
| Global Minimum | The true lowest point (good luck finding it) |
🏔️ Remember: Optimization is just calculus pretending to be adventurous.
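Here is the whole table in about ten lines of Python, minimizing a toy loss $L(w) = (w - 3)^2$ (the loss, starting point, and learning rate are arbitrary choices for illustration):

```python
# Minimize L(w) = (w - 3)**2 with plain gradient descent
w = 0.0          # start somewhere wrong
alpha = 0.1      # learning rate: step size

for step in range(50):
    gradient = 2 * (w - 3)    # dL/dw, the direction of steepest increase
    w = w - alpha * gradient  # step opposite the gradient, downhill

print(w)  # ~3.0, the (global) minimum of this friendly convex bowl
```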
## 🧾 Probability Distributions You’ll Actually Use

| Distribution | Use Case | Example |
|---|---|---|
| Normal (Gaussian) | Errors, nature, life itself | X ~ N(μ, σ²) |
| Bernoulli | Binary events | 0 = no sale, 1 = sale |
| Binomial | Repeated Bernoullis | “How many customers buy?” |
| Poisson | Rare events | “How often do people complain?” |
| Exponential | Time until next event | “When will next customer churn?” |
📊 If your data doesn’t fit a normal distribution — congratulations, you’re dealing with real business data.
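A quick NumPy tour of these distributions (all parameters below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

normal      = rng.normal(loc=0.0, scale=1.0, size=5)    # errors, nature, life itself
bernoulli   = rng.binomial(n=1, p=0.3, size=5)          # 0 = no sale, 1 = sale
binomial    = rng.binomial(n=10, p=0.3, size=5)         # how many of 10 customers buy
poisson     = rng.poisson(lam=2.0, size=5)              # complaints per day
exponential = rng.exponential(scale=30.0, size=5)       # days until the next churn

print(normal, bernoulli, binomial, poisson, exponential, sep="\n")
```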
## 🧠 Matrix Calculus Lite

Let’s make the scariest-sounding topic friendly:

$$
y = Wx + b
$$

Then:

$$
\frac{\partial y}{\partial W} = x^\top \quad \text{and} \quad \frac{\partial y}{\partial b} = 1
$$
Boom. You just did matrix calculus — and lived to tell the tale.
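If you want to see those shapes in action, here is a minimal NumPy sketch of the usual backprop convention for a linear layer. The scalar loss $L = \sum_i y_i$ is an assumption chosen to keep the gradients simple, and the numbers are arbitrary:

```python
import numpy as np

W = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([0.5, -1.0])
b = np.array([0.1, 0.2])

y = W @ x + b                # forward pass of the linear layer
L = y.sum()                  # a throwaway scalar loss

dL_dy = np.ones_like(y)      # dL/dy for L = sum(y)
dL_dW = np.outer(dL_dy, x)   # backprop: gradient w.r.t. W is dL/dy times x-transpose
dL_db = dL_dy                # gradient w.r.t. b is just dL/dy passed through

print(dL_dW)                 # each row is x, exactly the "x-transpose" in the cheat
print(dL_db)
```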
## 🪄 Cheat Formula Scroll

- Mean: $\mu = \frac{1}{n}\sum x_i$
- Variance: $\sigma^2 = \frac{1}{n}\sum (x_i - \mu)^2$
- Covariance: $\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$
- Correlation: $\rho = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$
- Gradient Descent Update: $w := w - \alpha \nabla L(w)$
✍️ Write them on your wall, in your notebook, or tattoo them on your coffee mug.
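Or skip the tattoo and let NumPy recite the scroll for you (the arrays below are dummy data for illustration):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

mean = x.mean()                                   # mu = (1/n) * sum(x_i)
variance = ((x - mean) ** 2).mean()               # sigma^2, population version
covariance = ((x - x.mean()) * (y - y.mean())).mean()
correlation = covariance / (x.std() * y.std())    # rho = Cov(X, Y) / (sigma_x * sigma_y)

print(mean, variance, covariance, correlation)
```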
## 🧘 Final Words
Math is the language that ML speaks — but you don’t need to be a poet to understand it. You just need to know enough to translate “loss went down” into “yay, my model learned something!”
☕ “Stay calm, take the derivative, and may your gradients always descend.”