
“Because even data scientists occasionally forget what Σ means.”


## 🎯 Why This Exists

Let’s be honest — everyone says “I love math” until they meet matrix derivatives or eigenvalues. This cheat-sheet is your life jacket when swimming through ML math. It’s not meant to make you a mathematician — just to keep your neurons afloat long enough to train a model without crying.

💡 “Mathematics: turning coffee into theorems and stress into learning curves.”


## 📏 Linear Algebra in 60 Seconds

| Concept | What It Really Means | Example |
|---|---|---|
| Vector | Fancy list of numbers | [3, 4, 5] |
| Matrix | Grid of numbers that bully your RAM | [[1, 2], [3, 4]] |
| Dot Product | “How similar are these two vibes?” | A·B = Σ aᵢbᵢ |
| Transpose | Flip rows ↔ columns | Aᵀ |
| Identity Matrix | The “1” of matrix land | I = [[1,0],[0,1]] |
| Inverse | The undo button (when it exists) | A⁻¹A = I |
| Eigenvalue | How much a direction stretches | “Stretch factor” |
| Eigenvector | Direction that doesn’t change | “Preferred axis of chaos” |

🧠 Pro Tip: If you see the term “orthogonal”, it usually means “thankfully independent.”
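
If you’d rather poke these concepts than memorize them, here’s a minimal NumPy sketch of the table above (assuming NumPy is installed; the matrix and vectors are arbitrary picks for the demo):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])   # a small matrix (det = -2, so invertible)
v = np.array([3, 4])
w = np.array([4, 3])

print(v @ w)                  # dot product: 3*4 + 4*3 = 24
print(A.T)                    # transpose: rows <-> columns
print(A @ np.linalg.inv(A))   # A @ A^-1 ~= identity (modulo float fuzz)

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)            # the "stretch factors"
print(eigenvectors)           # columns are the preferred axes of chaos
```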


## 🔺 Calculus for Machine Learning

### Derivatives You Should Pretend to Remember

| Function | Derivative |
|---|---|
| d/dx (x²) | 2x |
| d/dx (sin x) | cos x |
| d/dx (eˣ) | eˣ |
| d/dx (ln x) | 1/x |

### Chain Rule (The Grandparent of Backpropagation)

$$ \frac{dL}{dx} = \frac{dL}{dy} \cdot \frac{dy}{dx} $$

Meaning: If you don’t know what you’re doing, just multiply derivatives until it works.
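
If you’d rather verify than believe, here’s a tiny sanity check (my own example, not from the cheat-sheet): take L = (sin x)², so with y = sin x the chain rule says dL/dx = 2y · cos x. We compare that against a finite difference:

```python
import numpy as np

def L(x):
    return np.sin(x) ** 2          # L = y^2 where y = sin(x)

x = 0.7
y = np.sin(x)

# Chain rule: dL/dx = dL/dy * dy/dx = 2y * cos(x)
analytic = 2 * y * np.cos(x)

# Finite-difference check
h = 1e-6
numeric = (L(x + h) - L(x - h)) / (2 * h)

print(analytic, numeric)   # should agree to ~6 decimal places
```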


## 🎲 Probability Essentials

| Concept | Definition | Example |
|---|---|---|
| Probability (P) | Likelihood of an event | P(heads) = 0.5 |
| **Conditional P(A\|B)** | P of A if B already happened | “P(it rains \| cloudy)” |
| Bayes’ Rule | How to update your opinion | `P(A\|B) = P(B\|A)P(A)/P(B)` |
| Expectation (E[X]) | Weighted average outcome | “Long-term mood” |
| Variance (Var[X]) | How moody it is | E[(X−E[X])²] |

🎰 Think of probability as math’s version of gambling — only with slightly fewer tears.
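
Bayes’ Rule fits in three lines of Python. All the numbers below are invented purely for illustration:

```python
# P(rain | cloudy) = P(cloudy | rain) * P(rain) / P(cloudy)
p_rain = 0.2               # prior: it rains 20% of days (made-up number)
p_cloudy_given_rain = 0.9  # when it rains, it's almost always cloudy
p_cloudy = 0.4             # clouds show up 40% of days

p_rain_given_cloudy = p_cloudy_given_rain * p_rain / p_cloudy
print(p_rain_given_cloudy)   # 0.45 -- clouds roughly double your rain suspicion
```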


## 🧮 Optimization Refresher

| Concept | Meaning |
|---|---|
| Gradient | Direction of steepest increase (which you promptly go opposite of) |
| Gradient Descent | ML’s favorite hiking method — down the hill of loss |
| Learning Rate (α) | How big your steps are. Too small = nap, too big = disaster |
| Local Minimum | The valley where your model gets lazy |
| Global Minimum | The true lowest point (good luck finding it) |

🏔️ Remember: Optimization is just calculus pretending to be adventurous.
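
Here’s the whole table acted out on the humble parabola L(w) = (w − 3)² (a minimal sketch; the loss, starting point, and learning rate are all arbitrary choices):

```python
# Minimize L(w) = (w - 3)^2; gradient is dL/dw = 2 * (w - 3)
w = 0.0
alpha = 0.1                # learning rate: not a nap, not a disaster

for step in range(50):
    grad = 2 * (w - 3)     # direction of steepest increase...
    w = w - alpha * grad   # ...so we go the other way

print(w)   # close to 3.0, the (global!) minimum
```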


## 🧾 Probability Distributions You’ll Actually Use

| Distribution | Use Case | Example |
|---|---|---|
| Normal (Gaussian) | Errors, nature, life itself | N(μ, σ²) |
| Bernoulli | Binary events | 0 = no sale, 1 = sale |
| Binomial | Repeated Bernoullis | “How many customers buy?” |
| Poisson | Rare events | “How often do people complain?” |
| Exponential | Time until next event | “When will next customer churn?” |

📊 If your data doesn’t fit a normal distribution — congratulations, you’re dealing with real business data.
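
To meet these distributions in person, NumPy’s random generator can sample all five (a quick sketch; every parameter below is made up):

```python
import numpy as np

rng = np.random.default_rng(42)

normal      = rng.normal(loc=0.0, scale=1.0, size=5)  # N(mu, sigma^2)
bernoulli   = rng.binomial(n=1, p=0.3, size=5)        # Bernoulli = Binomial with n=1
binomial    = rng.binomial(n=10, p=0.3, size=5)       # buyers out of 10 customers
poisson     = rng.poisson(lam=2.0, size=5)            # complaints per day
exponential = rng.exponential(scale=1.5, size=5)      # time until the next churn

print(normal, bernoulli, binomial, poisson, exponential, sep="\n")
```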


## 🧠 Matrix Calculus Lite

Let’s make the scariest-sounding topic friendly:

$$ y = Wx + b $$

Then:

$$ \frac{\partial y}{\partial W} = x^\top \quad \text{and} \quad \frac{\partial y}{\partial b} = 1 $$

Boom. You just did matrix calculus — and lived to tell the tale.
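
If you’d like evidence rather than applause, here’s a finite-difference check for the toy loss L = Σyᵢ (my choice, not the article’s), where the backprop shorthand above gives dL/dW = (dL/dy) ⊗ x, i.e. the outer product of ones with x:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
b = rng.normal(size=3)
x = rng.normal(size=2)

def loss(W):
    return np.sum(W @ x + b)       # toy loss: L = sum(y), so dL/dy = ones

# Backprop shorthand: dL/dW = outer(dL/dy, x) = outer(ones, x)
analytic = np.outer(np.ones(3), x)

# Finite-difference check, one entry of W at a time
numeric = np.zeros_like(W)
h = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += h
        Wm = W.copy(); Wm[i, j] -= h
        numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-4))   # True
```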


## 🪄 Cheat Formula Scroll

- Mean: $\mu = \frac{1}{n}\sum x_i$
- Variance: $\sigma^2 = \frac{1}{n}\sum (x_i - \mu)^2$
- Covariance: $\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$
- Correlation: $\rho = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$
- Gradient Descent Update: $w := w - \alpha \nabla L(w)$

✍️ Write them on your wall, in your notebook, or tattoo them on your coffee mug.
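
Or skip the tattoo and keep them in a .py file (a from-scratch sketch with made-up data; NumPy also has built-ins for all of these):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0])

mean = x.sum() / len(x)                          # mu = (1/n) * sum(x_i)
variance = ((x - mean) ** 2).sum() / len(x)      # sigma^2 = (1/n) * sum((x_i - mu)^2)
cov = ((x - x.mean()) * (y - y.mean())).mean()   # Cov(X, Y)
corr = cov / (x.std() * y.std())                 # rho = Cov / (sigma_x * sigma_y)

print(mean, variance, cov, corr)
# Sanity check: correlation is scale-free, so the n vs. n-1 business cancels
print(np.corrcoef(x, y)[0, 1])   # should match corr above
```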


## 🧘 Final Words

Math is the language that ML speaks — but you don’t need to be a poet to understand it. You just need to know enough to translate “loss went down” into “yay, my model learned something!”

☕ “Stay calm, take the derivative, and may your gradients always descend.”
