Math Cheat-Sheets#

“Because even data scientists occasionally forget what Σ means.”#


🎯 Why This Exists#

Let’s be honest — everyone says “I love math” until they meet matrix derivatives or eigenvalues. This cheat-sheet is your life jacket when swimming through ML math. It’s not meant to make you a mathematician — just to keep your neurons afloat long enough to train a model without crying.

💡 “Mathematics: turning coffee into theorems and stress into learning curves.”


📏 Linear Algebra in 60 Seconds#

| Concept | What It Really Means | Example |
|---|---|---|
| Vector | Fancy list of numbers | `[3, 4, 5]` |
| Matrix | Grid of numbers that bully your RAM | `[[1, 2], [3, 4]]` |
| Dot Product | “How similar are these two vibes?” | A·B = Σ aᵢbᵢ |
| Transpose | Flip rows ↔ columns | Aᵀ |
| Identity Matrix | The “1” of matrix land | I = [[1,0],[0,1]] |
| Inverse | The undo button (when it exists) | A⁻¹A = I |
| Eigenvalue | How much a direction stretches | “Stretch factor” |
| Eigenvector | Direction that doesn’t change | “Preferred axis of chaos” |

🧠 Pro Tip: If you see the term “orthogonal”, it usually means “thankfully independent.”
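
Here’s the whole table as a minimal NumPy sketch (assuming `numpy` is available; the numbers are arbitrary):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])    # matrix: grid of numbers that bullies your RAM
v = np.array([3, 4, 5])           # vector: fancy list of numbers

print(np.dot(v, v))               # dot product: Σ vᵢvᵢ = 50 ("how similar are these vibes?")
print(A.T)                        # transpose: rows ↔ columns
I = np.eye(2)                     # identity: the "1" of matrix land
A_inv = np.linalg.inv(A)          # inverse: the undo button (A happens to be invertible)
print(np.allclose(A_inv @ A, I))  # True, i.e. A⁻¹A = I

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                # stretch factors
print(eigenvectors)               # directions that only get scaled, not rotated
```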


🔺 Calculus for Machine Learning#

Derivatives You Should Pretend to Remember#

| Function | Derivative |
|---|---|
| d/dx (x²) | 2x |
| d/dx (sin x) | cos x |
| d/dx (eˣ) | eˣ |
| d/dx (ln x) | 1/x |
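
Don’t trust your memory? A quick finite-difference check (just a sketch; `numeric_deriv` is a throwaway helper, not a standard function):

```python
import numpy as np

def numeric_deriv(f, x, h=1e-6):
    """Central finite difference: (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
print(numeric_deriv(lambda t: t ** 2, x), 2 * x)   # ≈ 4.0 and 4.0
print(numeric_deriv(np.sin, x), np.cos(x))         # cos x
print(numeric_deriv(np.exp, x), np.exp(x))         # eˣ
print(numeric_deriv(np.log, x), 1 / x)             # 1/x
```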

Chain Rule (The Grandparent of Backpropagation)#

$$
\frac{dL}{dx} = \frac{dL}{dy} \cdot \frac{dy}{dx}
$$

Meaning: the derivative of a nested function is the product of the derivatives of its layers, which is exactly how backpropagation pushes gradients through a network. (Or: if you don’t know what you’re doing, just multiply derivatives until it works.)
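
Here’s the rule in a few lines, on the made-up example L = (sin x)², where the chain rule gives dL/dx = 2·sin(x)·cos(x):

```python
import numpy as np

x = 1.3
y = np.sin(x)           # inner layer: y = sin x
L = y ** 2              # outer layer: L = y²

dL_dy = 2 * y           # derivative of the outer layer w.r.t. y
dy_dx = np.cos(x)       # derivative of the inner layer w.r.t. x
dL_dx = dL_dy * dy_dx   # chain rule: just multiply the layers

# Finite-difference sanity check: the two numbers should agree to ~6 decimals
h = 1e-6
print(dL_dx, (np.sin(x + h) ** 2 - np.sin(x - h) ** 2) / (2 * h))
```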


🎲 Probability Essentials#

| Concept | Definition | Example |
|---|---|---|
| Probability (P) | Likelihood of an event | P(heads) = 0.5 |
| Conditional P(A\|B) | P of A if B already happened | “P(it rains \| cloudy)” |
| Bayes’ Rule | How to update your opinion | `P(A\|B) = P(B\|A)P(A)/P(B)` |
| Expectation (E[X]) | Weighted average outcome | “Long-term mood” |
| Variance (Var[X]) | How moody it is | E[(X−E[X])²] |

🎰 Think of probability as math’s version of gambling — only with slightly fewer tears.
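
A back-of-the-envelope Bayes update (the probabilities below are invented purely for illustration):

```python
# Invented numbers: 30% of days are cloudy, 20% are rainy,
# and 90% of rainy days are cloudy.
p_cloudy = 0.30               # P(cloudy)
p_rain = 0.20                 # P(rain)
p_cloudy_given_rain = 0.90    # P(cloudy | rain)

# Bayes' Rule: P(rain | cloudy) = P(cloudy | rain) * P(rain) / P(cloudy)
p_rain_given_cloudy = p_cloudy_given_rain * p_rain / p_cloudy
print(p_rain_given_cloudy)    # 0.6: seeing clouds should update your opinion about rain
```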


🧮 Optimization Refresher#

| Concept | Meaning |
|---|---|
| Gradient | Direction of steepest increase (which you promptly go opposite of) |
| Gradient Descent | ML’s favorite hiking method: down the hill of loss |
| Learning Rate (α) | How big your steps are. Too small = nap, too big = disaster |
| Local Minimum | The valley where your model gets lazy |
| Global Minimum | The true lowest point (good luck finding it) |

🏔️ Remember: Optimization is just calculus pretending to be adventurous.
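
And here’s the whole table compressed into a toy gradient-descent loop on L(w) = (w − 3)², whose global minimum sits at w = 3 (everything here is illustrative):

```python
# Toy loss L(w) = (w - 3)^2, with gradient dL/dw = 2 * (w - 3)
w = 0.0          # start somewhere wrong
alpha = 0.1      # learning rate: too small = nap, too big = disaster

for _ in range(100):
    grad = 2 * (w - 3)     # direction of steepest increase
    w = w - alpha * grad   # so we step the opposite way

print(w)  # ≈ 3.0: the bottom of the hill of loss
```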


🧾 Probability Distributions You’ll Actually Use#

| Distribution | Use Case | Example |
|---|---|---|
| Normal (Gaussian) | Errors, nature, life itself | N(μ, σ²) |
| Bernoulli | Binary events | 0 = no sale, 1 = sale |
| Binomial | Repeated Bernoullis | “How many customers buy?” |
| Poisson | Rare events | “How often do people complain?” |
| Exponential | Time until next event | “When will the next customer churn?” |

📊 If your data doesn’t fit a normal distribution — congratulations, you’re dealing with real business data.
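
Here’s how you might draw samples from each of them with NumPy (the parameters are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(seed=42)                # reproducible randomness

normal = rng.normal(loc=0.0, scale=1.0, size=5)     # N(μ=0, σ²=1): errors, nature, life itself
bernoulli = rng.binomial(n=1, p=0.3, size=5)        # Bernoulli: 0 = no sale, 1 = sale
binomial = rng.binomial(n=10, p=0.3, size=5)        # Binomial: how many of 10 customers buy
poisson = rng.poisson(lam=2.0, size=5)              # Poisson: complaints per day
exponential = rng.exponential(scale=7.0, size=5)    # Exponential: days until the next churn

print(normal, bernoulli, binomial, poisson, exponential, sep="\n")
```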


🧠 Matrix Calculus Lite#

Let’s make the scariest-sounding topic friendly:

$$
y = Wx + b
$$

Then:

$$
\frac{\partial y}{\partial W} = x^\top \quad \text{and} \quad \frac{\partial y}{\partial b} = 1
$$

Boom. You just did matrix calculus — and lived to tell the tale.
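
If that felt too easy, here’s a finite-difference sanity check (a sketch: nudge one weight W[i, j] and watch y[i] move by exactly x[j]):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
x = rng.normal(size=2)
b = rng.normal(size=3)

y = W @ x + b                      # y = Wx + b

i, j, h = 1, 0, 1e-6               # which weight to nudge, and by how much
W_nudged = W.copy()
W_nudged[i, j] += h
y_nudged = W_nudged @ x + b

dy_dWij = (y_nudged[i] - y[i]) / h
print(dy_dWij, x[j])               # ∂yᵢ/∂Wᵢⱼ = xⱼ, the "xᵀ" in the formula
```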


🪄 Cheat Formula Scroll#

- Mean: $\mu = \frac{1}{n}\sum x_i$
- Variance: $\sigma^2 = \frac{1}{n}\sum (x_i - \mu)^2$
- Covariance: $\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$
- Correlation: $\rho = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$
- Gradient Descent Update: $w := w - \alpha \nabla L(w)$

✍️ Write them on your wall, in your notebook, or tattoo them on your coffee mug.
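
Or skip the tattoo and keep the scroll as code (a NumPy sketch over a made-up sample; `np.cov` and `np.corrcoef` would happily do the middle two for you):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])                # made-up sample
y = np.array([1.0, 3.0, 2.0, 5.0])
n = len(x)

mu = x.sum() / n                                  # mean: μ = (1/n) Σ xᵢ
var = ((x - mu) ** 2).sum() / n                   # variance: σ² = (1/n) Σ (xᵢ − μ)²
cov = ((x - mu) * (y - y.mean())).mean()          # covariance: E[(X − E[X])(Y − E[Y])]
rho = cov / (x.std() * y.std())                   # correlation: Cov(X,Y) / (σ_X σ_Y)

w, alpha = 5.0, 0.1                               # toy weight and learning rate
w = w - alpha * (2 * w)                           # gradient descent update on L(w) = w²

print(mu, var, cov, rho, w)
```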


🧘 Final Words#

Math is the language that ML speaks — but you don’t need to be a poet to understand it. You just need to know enough to translate “loss went down” into “yay, my model learned something!”

☕ “Stay calm, take the derivative, and may your gradients always descend.”
