# Ordinary Least Squares (OLS) & Normal Equations

Because sometimes, math can solve your problem faster than hiking down a loss valley. 😌


## 🎯 What Is OLS?

So far, we’ve made our poor regression model stumble downhill with gradients, slowly minimizing error. But what if we could just jump straight to the bottom of the valley — no hiking boots required? 👟

That’s Ordinary Least Squares (OLS).

It’s the “shortcut” method that says:

“We can find the perfect slope and intercept directly — with pure algebra.”


## 🧠 The Idea

OLS minimizes the same thing Gradient Descent does — Mean Squared Error (MSE):

\[ J(\beta) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
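To keep the formula concrete, here is the MSE computed by hand for three toy predictions (numbers invented purely for illustration):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0])      # actual values
y_hat = np.array([2.5, 5.5, 6.0])  # model predictions

# MSE straight from the formula: the average of the squared residuals
mse = np.mean((y - y_hat) ** 2)
print(mse)  # (0.25 + 0.25 + 1.00) / 3 = 0.5
```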

But instead of learning gradually, we solve for β directly by setting derivatives to zero. Basically, we say:

“We want the slope where error stops changing.”

That gives us the Normal Equation:

\[ \hat{\beta} = (X^T X)^{-1} X^T y \]

Where:

  • \( X \): matrix of features (with a column of ones for the intercept)

  • \( y \): target variable

  • \( \hat{\beta} \): the optimal coefficients
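Where does that formula come from? Drop the constant \( \frac{1}{n} \) (it doesn't change the minimizer), write the loss in matrix form, take the gradient with respect to \( \beta \), and set it to zero:

```latex
J(\beta) = \lVert y - X\beta \rVert^2
\qquad
\nabla_\beta J = -2\, X^T (y - X\beta) = 0
\;\Longrightarrow\;
X^T X \,\hat{\beta} = X^T y
\;\Longrightarrow\;
\hat{\beta} = (X^T X)^{-1} X^T y
```

The middle equality, \( X^T X \hat{\beta} = X^T y \), is the normal equation itself; the final step assumes \( X^T X \) is invertible (more on when it isn't later in this chapter).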


## 🧾 Why “Normal” Equation?

Because it encodes the optimality condition: setting the derivative of the loss function to zero, which forces the residual vector to be orthogonal — “normal” — to the columns of \( X \).

Or, as a data scientist might explain to an executive:

“Normal equations make your regression weights perfectly balanced — like all things should be.” 😎
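That orthogonality is easy to verify numerically. A minimal sketch on tiny made-up numbers (not the chapter's advertising data):

```python
import numpy as np

# Tiny made-up dataset: intercept column plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 2.5, 4.0, 4.5])

# Solve the normal equations (without forming an explicit inverse)
beta = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta

# At the optimum, X^T r = 0: residuals are normal to every column of X
print(X.T @ residuals)  # ~[0, 0] up to floating-point error
```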


## 🧮 A Simple Example

Suppose we’re modeling:

Sales = β₀ + β₁ × TV_Ad_Spend

```python
import numpy as np

# Example data: a column of 1s for the intercept, then TV ad spend
X = np.array([[1, 230],
              [1, 44],
              [1, 17],
              [1, 151],
              [1, 180]])

y = np.array([22, 10, 7, 18, 20])

# OLS via the Normal Equation
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print("Coefficients:", beta)
```

This gives (approximately):

Coefficients: [6.4966 0.0716]

Meaning:

Even with $0 ad spend, you get baseline sales ≈ 6.50. Every additional $1 in TV ads adds about $0.07 in sales. 📺💰
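A note of caution before copying that one-liner: explicitly inverting \( X^T X \) is numerically fragile. NumPy offers sturdier routes to the same coefficients — `np.linalg.solve` on the normal equations, or `np.linalg.lstsq` on \( X \) directly. A sketch on the same toy data:

```python
import numpy as np

X = np.array([[1, 230], [1, 44], [1, 17], [1, 151], [1, 180]], dtype=float)
y = np.array([22, 10, 7, 18, 20], dtype=float)

# Three routes to the same OLS coefficients:
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y         # textbook, but fragile
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)      # no explicit inverse
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # SVD-based, most robust

print(beta_inv)
print(beta_solve)
print(beta_lstsq)
```

On well-behaved data all three agree; on near-singular data, the `lstsq` route degrades most gracefully.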


## 🧩 Intuition: A Line that Minimizes Apologies

Think of OLS as drawing the “least embarrassing line” through your scatter plot:

  • For each point, the vertical error (residual) is the model’s mistake.

  • OLS finds the line that makes the sum of squared mistakes as small as possible.

No gradient descent, no random initialization, no drama. Just straight math, no feelings. 🧘‍♀️
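The “least” in least squares can be checked empirically: nudge the OLS coefficients in any direction and the sum of squared mistakes only goes up. A small sketch on synthetic data (invented here for illustration):

```python
import numpy as np

# Synthetic data: y ≈ 3 + 0.5x + noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)
y = 3.0 + 0.5 * x + rng.normal(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def sse(b):
    """Sum of squared residuals for candidate coefficients b."""
    r = y - X @ b
    return r @ r

# Nudge the fitted coefficients in 100 random directions:
# the squared error never improves on the OLS fit.
best = sse(beta)
assert all(sse(beta + rng.normal(0, 0.05, 2)) >= best for _ in range(100))
print("No nearby line beats the OLS line.")
```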


## ⚙️ OLS in Scikit-Learn

In practice, you rarely invert matrices yourself. Scikit-learn handles it for you — faster and more numerically stable, since it solves the least-squares problem via SVD rather than an explicit inverse:

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X[:, 1].reshape(-1, 1), y)  # sklearn fits its own intercept, so pass only the feature column

print("Intercept:", model.intercept_)
print("Slope:", model.coef_)
```
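It's worth confirming that the library lands on exactly the same numbers as the hand-rolled normal equation — it is solving the same least-squares problem, just via a different route. A quick sketch re-fitting the toy advertising data both ways:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 230], [1, 44], [1, 17], [1, 151], [1, 180]], dtype=float)
y = np.array([22, 10, 7, 18, 20], dtype=float)

beta = np.linalg.inv(X.T @ X) @ X.T @ y  # manual normal equation

# sklearn adds its own intercept, so drop the column of 1s
model = LinearRegression().fit(X[:, 1:], y)

print(model.intercept_, beta[0])  # same intercept
print(model.coef_[0], beta[1])    # same slope
```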

## 📊 Visualising the Fit

```python
import matplotlib.pyplot as plt

plt.scatter(X[:, 1], y, label="Actual Sales", alpha=0.8)
plt.plot(X[:, 1], model.predict(X[:, 1].reshape(-1, 1)),
         color="red", label="OLS Fit Line", linewidth=2)
plt.xlabel("TV Advertising Spend ($)")
plt.ylabel("Sales ($)")
plt.title("OLS Regression Line – Sales vs TV Spend")
plt.legend()
plt.show()
```

“When your scatter plot looks like a calm red line — that’s when business harmony is achieved.” 📈☯️


## 🧮 Matrix Shapes (Quick Reference)

| Symbol | Meaning | Shape |
| --- | --- | --- |
| \( X \) | Feature matrix | (n_samples, n_features + 1) |
| \( y \) | Target vector | (n_samples, 1) |
| \( X^T X \) | Square (Gram) matrix | (n_features + 1, n_features + 1) |
| \( (X^T X)^{-1} \) | Its inverse | (n_features + 1, n_features + 1) |
| \( \hat{\beta} \) | Coefficient vector | (n_features + 1, 1) |

“If your matrix dimensions don’t align, your model’s chakras are blocked.” 🧘‍♂️
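The table above can be checked directly in NumPy; every intermediate in the normal equation has a predictable shape. A sketch with made-up dimensions (100 samples, 3 features plus an intercept):

```python
import numpy as np

n_samples, n_features = 100, 3
rng = np.random.default_rng(0)

X = np.column_stack([np.ones(n_samples), rng.normal(size=(n_samples, n_features))])
y = rng.normal(size=(n_samples, 1))

XtX = X.T @ X
beta = np.linalg.inv(XtX) @ X.T @ y

assert X.shape == (n_samples, n_features + 1)         # (100, 4)
assert XtX.shape == (n_features + 1, n_features + 1)  # (4, 4)
assert beta.shape == (n_features + 1, 1)              # (4, 1)
print("All shapes line up — chakras unblocked.")
```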


## ⚠️ Limitations of OLS

| Limitation | Description |
| --- | --- |
| 💻 Computational Cost | Inverting \( X^T X \) is roughly cubic in the number of features — expensive at scale |
| 💥 Multicollinearity | \( X^T X \) can become singular (not invertible) |
| 📏 Assumes Linearity | Works only for linear relationships |
| 📊 Sensitive to Outliers | One crazy data point can tilt your whole model |
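The multicollinearity row is easy to reproduce. In this sketch (synthetic data, with one feature deliberately duplicated), \( X^T X \) loses full rank, so plain inversion is off the table; NumPy's pseudoinverse (`np.linalg.pinv`) still returns one of the infinitely many least-squares fits — the minimum-norm one:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)
X = np.column_stack([np.ones(20), x, x])  # third column duplicates the second
y = 2 + 3 * x + rng.normal(0, 0.1, size=20)

XtX = X.T @ X
print("rank of X^T X:", np.linalg.matrix_rank(XtX))  # 2 < 3: singular, not invertible

# The pseudoinverse picks the minimum-norm solution; the true slope of 3
# gets split evenly across the two identical columns.
beta = np.linalg.pinv(X) @ y
print("beta:", beta)
```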


## 💡 Business Analogy

OLS is like a consultant who instantly gives you the “best-fit” solution — but charges extra if your data is messy or high-dimensional. 💼

Gradient descent, on the other hand, is like an intern who learns slowly but can handle huge data cheaply. 🧑‍💻
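The analogy can be made literal. On a small synthetic dataset (assumed here: one standardized feature, true intercept 4.0 and slope 2.5), plain gradient descent on the MSE ends up at the same coefficients the normal equation produces in one step:

```python
import numpy as np

# Synthetic data: y ≈ 4 + 2.5x + noise, with a standardized
# feature so gradient descent behaves nicely
rng = np.random.default_rng(7)
x = rng.normal(0, 1, 200)
y = 4.0 + 2.5 * x + rng.normal(0, 0.5, 200)
X = np.column_stack([np.ones_like(x), x])

# The "consultant": one-shot closed form
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The "intern": gradient descent on the MSE
beta_gd = np.zeros(2)
lr = 0.1
for _ in range(2000):
    grad = (2 / len(y)) * X.T @ (X @ beta_gd - y)
    beta_gd -= lr * grad

print(beta_ols)
print(beta_gd)  # same valley bottom, different route
```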


## 📚 Tip for Python Learners

If you’re new to Python or NumPy matrix operations, check out my companion book: 👉 Programming for Business. It’s like “Python Gym” before you lift machine learning weights. 🏋️‍♂️🐍


## 🧭 Recap

| Concept | Description |
| --- | --- |
| OLS | Analytical method to minimize squared errors |
| Normal Equation | Closed-form solution for regression weights |
| Advantage | No iterative training needed |
| Disadvantage | Computationally heavy for large datasets |
| Relation to Gradient Descent | Both minimize the same cost — different paths |


## 💬 Final Thought

“OLS doesn’t learn — it knows. Like that one kid in class who never studied but still topped the exam.” 😏📚


## 🔜 Next Up

👉 Head to Non-linear & Polynomial Features where we’ll make our linear models curvy and flexible — because business problems rarely run in straight lines. 📈🔀



```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def regression_summary_paragraph(model,
                                 alpha: float = 0.05,
                                 vif_thresholds: tuple = (5, 10),
                                 r2_thresholds: tuple = (0.3, 0.6)) -> str:
    """
    Create a readable, multi-line summary from a fitted statsmodels OLS model.

    Returns a short, human-readable string (suitable for printing in notebooks
    or terminals), with each diagnostic on its own line.
    """
    n = int(model.nobs)
    r2 = model.rsquared
    adj_r2 = model.rsquared_adj
    fval = model.fvalue
    fp = model.f_pvalue

    # R² strength
    if r2 < r2_thresholds[0]:
        r2_qual = "weak"
    elif r2 < r2_thresholds[1]:
        r2_qual = "moderate"
    else:
        r2_qual = "strong"

    overall_sig = "statistically significant" if fp < alpha else "not statistically significant"

    # Coefficients (one line per predictor); CI width follows `alpha`
    ci_label = f"{(1 - alpha) * 100:.0f}%"
    coef_lines = []
    conf = model.conf_int(alpha).to_numpy()
    for var, coef, pval, ci_low, ci_high in zip(
        model.params.index,
        model.params,
        model.pvalues,
        conf[:, 0],
        conf[:, 1],
    ):
        if var == "const":
            continue
        sig_text = "significant" if pval < alpha else "not significant"
        coef_lines.append(
            f"{var}: {coef:+.3f} (p={pval:.4f}, {ci_label} CI [{ci_low:.3f}, {ci_high:.3f}]) — {sig_text}"
        )

    # VIF (only meaningful with more than one column besides the intercept)
    vif_line = ""
    if model.df_model > 0:
        X = model.model.exog
        if X.shape[1] > 1:  # more than just intercept
            vif_df = pd.DataFrame({
                "variable": model.model.exog_names,
                "VIF": [variance_inflation_factor(X, i) for i in range(X.shape[1])],
            })
            vif_df = vif_df[vif_df["variable"] != "const"]
            vif_items = []
            for _, row in vif_df.iterrows():
                v = row["VIF"]
                if v < vif_thresholds[0]:
                    q = "low"
                elif v < vif_thresholds[1]:
                    q = "moderate"
                else:
                    q = "high"
                vif_items.append(f"{row['variable']} = {v:.2f} ({q})")
            if vif_items:
                vif_line = "VIFs: " + ", ".join(vif_items)

    # Durbin-Watson statistic for residual autocorrelation
    dw = sm.stats.durbin_watson(model.resid)
    dw_interp = (
        "≈ 2 → no clear autocorrelation"
        if 1.5 <= dw <= 2.5
        else (f"{dw:.2f} → possible positive autocorrelation" if dw < 1.5 else f"{dw:.2f} → possible negative autocorrelation")
    )

    # Assemble the multi-line output
    lines = []
    lines.append(f"Observations: {n}  |  R² = {r2:.3f}  (adj R² = {adj_r2:.3f})  —  {r2_qual}")
    lines.append(f"Overall model: {overall_sig}  |  F = {fval:.2f}, p = {fp:.4f}")
    lines.append("")
    lines.append("Coefficients:")
    if coef_lines:
        for cl in coef_lines:
            lines.append(f"  - {cl}")
    else:
        lines.append("  (no predictors besides intercept)")

    if vif_line:
        lines.append("")
        lines.append(vif_line)

    lines.append("")
    lines.append(f"Durbin–Watson: {dw_interp}")

    return "\n".join(lines)
```
```python
# --- Minimal runnable example using synthetic data
import numpy as np
import pandas as pd

np.random.seed(0)
n = 200
# create a small advertising-style dataset (predictors only)
your_predictors_df = pd.DataFrame({
    "TV": np.random.normal(150, 30, size=n),
    "Radio": np.random.normal(30, 8, size=n),
    "Online": np.random.normal(50, 15, size=n),
})

# generate a target with known coefficients + noise
dependent_series = (
    10.0
    + 0.05 * your_predictors_df["TV"]
    + 0.4 * your_predictors_df["Radio"]
    - 0.02 * your_predictors_df["Online"]
    + np.random.normal(0, 3.0, size=n)
)
dependent_series = pd.Series(dependent_series, name="Sales")

# Fit OLS and print the human-readable summary
# (uncomment model.summary() for the full statsmodels table)
X = sm.add_constant(your_predictors_df)
model = sm.OLS(dependent_series, X).fit()

# print(model.summary())
print("\nReadable summary:\n")
print(regression_summary_paragraph(model))
```
Readable summary:

Observations: 200  |  R² = 0.551  (adj R² = 0.544)  —  moderate
Overall model: statistically significant  |  F = 80.05, p = 0.0000

Coefficients:
  - TV: +0.048 (p=0.0000, 95% CI [0.035, 0.061]) — significant
  - Radio: +0.369 (p=0.0000, 95% CI [0.313, 0.424]) — significant
  - Online: -0.025 (p=0.0657, 95% CI [-0.052, 0.002]) — not significant

VIFs: TV = 1.01 (low), Radio = 1.04 (low), Online = 1.04 (low)

Durbin–Watson: ≈ 2 → no clear autocorrelation
```python
# Run every scenario sequentially (static dashboard view).
# Note: `SCENARIOS` and `demo_scenario` are assumed to be defined
# in an earlier cell that is not shown on this page.
for s in SCENARIOS:
    print('\n' + '=' * 80)
    print(f"Running scenario: {s}")
    demo_scenario(s, random_state=0)
```
================================================================================
Running scenario: high_r2
Observations: 200  |  R² = 0.982  (adj R² = 0.982)  —  strong
Overall model: statistically significant  |  F = 3596.20, p = 0.0000

Coefficients:
  - X1: +2.990 (p=0.0000, 95% CI [2.923, 3.057]) — significant
  - X2: -2.042 (p=0.0000, 95% CI [-2.116, -1.968]) — significant
  - X3: +1.487 (p=0.0000, 95% CI [1.419, 1.554]) — significant

VIFs: X1 = 1.01 (low), X2 = 1.04 (low), X3 = 1.04 (low)

Durbin–Watson: ≈ 2 → no clear autocorrelation

Why this happened:
  - High R²: signal (true coefficients) is large relative to noise.


================================================================================
Running scenario: low_r2
Observations: 200  |  R² = 0.011  (adj R² = -0.004)  —  weak
Overall model: not statistically significant  |  F = 0.74, p = 0.5291

Coefficients:
  - X1: +0.099 (p=0.7725, 95% CI [-0.573, 0.770]) — not significant
  - X2: -0.519 (p=0.1667, 95% CI [-1.257, 0.218]) — not significant
  - X3: -0.082 (p=0.8095, 95% CI [-0.756, 0.591]) — not significant

VIFs: X1 = 1.01 (low), X2 = 1.04 (low), X3 = 1.04 (low)

Durbin–Watson: ≈ 2 → no clear autocorrelation

Why this happened:
  - Low R²: predictors explain little variance because noise dominates.


================================================================================
Running scenario: multicollinearity
Observations: 200  |  R² = 0.908  (adj R² = 0.907)  —  strong
Overall model: statistically significant  |  F = 646.36, p = 0.0000

Coefficients:
  - X1: +3.073 (p=0.0328, 95% CI [0.255, 5.892]) — significant
  - X2: -0.177 (p=0.9058, 95% CI [-3.129, 2.774]) — not significant
  - X3: -0.526 (p=0.0000, 95% CI [-0.661, -0.392]) — significant

VIFs: X1 = 445.32 (high), X2 = 444.90 (high), X3 = 1.04 (low)

Durbin–Watson: ≈ 2 → no clear autocorrelation

Why this happened:
  - High multicollinearity: similar predictors inflate standard errors (high VIF).


================================================================================
Running scenario: insignificant_predictor
Observations: 200  |  R² = 0.661  (adj R² = 0.656)  —  strong
Overall model: statistically significant  |  F = 127.58, p = 0.0000

Coefficients:
  - X1: +1.970 (p=0.0000, 95% CI [1.768, 2.171]) — significant
  - X2: -0.126 (p=0.2637, 95% CI [-0.347, 0.096]) — not significant
  - X3: +0.460 (p=0.0000, 95% CI [0.258, 0.662]) — significant

VIFs: X1 = 1.01 (low), X2 = 1.04 (low), X3 = 1.04 (low)

Durbin–Watson: ≈ 2 → no clear autocorrelation

Why this happened:
  - One predictor has (near) zero coefficient → will be statistically insignificant.


================================================================================
Running scenario: negative_coefficient
Observations: 200  |  R² = 0.876  (adj R² = 0.874)  —  strong
Overall model: statistically significant  |  F = 460.89, p = 0.0000

Coefficients:
  - X1: -2.520 (p=0.0000, 95% CI [-2.655, -2.386]) — significant
  - X2: +0.416 (p=0.0000, 95% CI [0.269, 0.564]) — significant
  - X3: -0.026 (p=0.6986, 95% CI [-0.161, 0.108]) — not significant

VIFs: X1 = 1.01 (low), X2 = 1.04 (low), X3 = 1.04 (low)

Durbin–Watson: ≈ 2 → no clear autocorrelation

Why this happened:
  - Negative coefficient example — sign is determined by the data-generating process.


================================================================================
Running scenario: autocorrelation
Observations: 200  |  R² = 0.238  (adj R² = 0.227)  —  weak
Overall model: statistically significant  |  F = 20.44, p = 0.0000

Coefficients:
  - X1: +0.588 (p=0.0000, 95% CI [0.394, 0.783]) — significant
  - X2: +0.443 (p=0.0000, 95% CI [0.252, 0.634]) — significant
  - X3: +0.268 (p=0.0113, 95% CI [0.061, 0.474]) — significant

VIFs: X1 = 1.00 (low), X2 = 1.01 (low), X3 = 1.01 (low)

Durbin–Watson: 0.63 → possible positive autocorrelation

Why this happened:
  - Autocorrelated residuals (AR(1)) — Durbin–Watson will be far from 2.


================================================================================
Running scenario: heteroscedasticity
Observations: 200  |  R² = 0.156  (adj R² = 0.143)  —  weak
Overall model: statistically significant  |  F = 12.05, p = 0.0000

Coefficients:
  - X1: +0.461 (p=0.0000, 95% CI [0.308, 0.614]) — significant
  - X2: +0.176 (p=0.4505, 95% CI [-0.283, 0.635]) — not significant
  - X3: +0.145 (p=0.5070, 95% CI [-0.284, 0.573]) — not significant

VIFs: X1 = 1.02 (low), X2 = 1.06 (low), X3 = 1.04 (low)

Durbin–Watson: ≈ 2 → no clear autocorrelation

Why this happened:
  - Heteroscedasticity: residual variance increases with X1 — affects standard errors.


================================================================================
Running scenario: outliers
Observations: 200  |  R² = 0.265  (adj R² = 0.254)  —  weak
Overall model: statistically significant  |  F = 23.56, p = 0.0000

Coefficients:
  - X1: +1.452 (p=0.0000, 95% CI [1.110, 1.794]) — significant
  - X2: -0.178 (p=0.3506, 95% CI [-0.554, 0.197]) — not significant
  - X3: -0.017 (p=0.9218, 95% CI [-0.360, 0.326]) — not significant

VIFs: X1 = 1.01 (low), X2 = 1.04 (low), X3 = 1.04 (low)

Durbin–Watson: ≈ 2 → no clear autocorrelation

Why this happened:
  - A few large outliers that can disproportionately affect coefficients and R².