Linear Model Family#

Meet the Family: Simple, Multiple & Generalized Regression

“Every family has that one member who thinks they can explain everything with a straight line.” — Anonymous Data Scientist 😅

Welcome to the Linear Model Family, where:

  • Everyone loves straight lines,

  • Each cousin adds more variables,

  • And the distant uncle GLM shows up talking about log-odds at dinner.


👪 Meet the Family#

Let’s meet the key members of the Linear Model clan — one equation at a time.


🧍 Simple Linear Regression#

(The minimalist sibling)

\[ \hat{y} = \beta_0 + \beta_1 x \]

  • One input feature (x)

  • One output (y)

  • One slope, one intercept — simple and dramatic.

📊 Example: Predict sales from ad spend.

“Every extra dollar in advertising brings an extra $0.10 in sales.”

That’s it. Straightforward. No drama. Until marketing adds more variables…
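
To see that slope fall out of actual code, here is a minimal sketch. The ad-spend and sales numbers below are made up purely so the fitted slope lands near the $0.10-per-dollar story above:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: ad spend ($) and sales ($), chosen so the slope is roughly 0.10
ad_spend = np.array([[100], [200], [300], [400], [500]])  # one feature, as a column
sales = np.array([60, 70, 81, 89, 100])

model = LinearRegression().fit(ad_spend, sales)
print(f"Intercept (beta_0): {model.intercept_:.2f}")
print(f"Slope (beta_1): {model.coef_[0]:.2f} extra sales dollars per $1 of ads")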


👯‍♀️ Multiple Linear Regression#

(The overachiever sibling)

\[ \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n \]

When one variable isn’t enough, multiple regression joins the chat. Now we can model complex situations like:

📊 Example:

Predict sales using TV, radio, and social media ad spend.

| Feature | Coefficient | Meaning |
| --- | --- | --- |
| TV Spend | 0.04 | +$0.04 in sales per $1 spent |
| Radio Spend | 0.08 | +$0.08 in sales per $1 spent |
| Social Spend | 0.01 | “We’re trying…” |

🎯 Business Translation: “TV ads sell, radio works, and social media gives us likes but not customers.”


🧙 Generalized Linear Models (GLM)#

(The mysterious uncle with equations and wine)

When your dependent variable isn’t continuous (like 0/1, counts, or categories), you need a model that can flex — enter GLM.

GLM extends the linear model by adding:

  1. A link function (to transform predictions)

  2. A distribution for the target variable

Examples include:

  • Logistic Regression (for binary outcomes)

  • Poisson Regression (for count data)

  • Gamma Regression (for skewed continuous data)

📊 Example:

Predict whether a customer will buy (1) or not (0) based on ad exposure.

GLM says: \[ \text{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]

Or in business English:

“Let’s use math to convert a yes/no question into something linear enough to make our computer happy.”
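
If you want to see the uncle in action, here is a minimal sketch using scikit-learn’s LogisticRegression on made-up data (two ad-exposure features and a 0/1 “did they buy” target; every number is illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: (TV exposures, social exposures) and whether the customer bought (1) or not (0)
X = np.array([[1, 0], [2, 1], [3, 1], [4, 2], [5, 3], [6, 3], [7, 4], [8, 5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

glm = LogisticRegression().fit(X, y)

# Coefficients live on the log-odds scale: logit(p) = beta_0 + beta_1*x_1 + beta_2*x_2
print("Intercept (beta_0):", glm.intercept_)
print("Coefficients (beta_1, beta_2):", glm.coef_)
print("P(buy | 5 TV, 2 social exposures):", glm.predict_proba([[5, 2]])[0, 1])

scikit-learn also ships PoissonRegressor and GammaRegressor for the count and skewed-continuous cases listed above.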


🧠 Concept Map#

| Model Type | Equation | Target Type | Business Example |
| --- | --- | --- | --- |
| Simple Linear | \( y = \beta_0 + \beta_1 x \) | Continuous | Predict sales from one ad channel |
| Multiple Linear | \( y = \beta_0 + \sum \beta_i x_i \) | Continuous | Predict revenue from multiple ad channels |
| Logistic (GLM) | \( \text{logit}(p) = \beta_0 + \sum \beta_i x_i \) | Binary | Predict if customer churns |
| Poisson (GLM) | \( \log(\lambda) = \beta_0 + \sum \beta_i x_i \) | Count | Predict number of calls to customer support |


🎓 Business Analogy#

Think of regression models as chefs:

| Model | Chef Personality | Kitchen Style |
| --- | --- | --- |
| Simple Linear | Makes perfect grilled cheese | 1 ingredient, 1 rule |
| Multiple Linear | Manages a buffet | Handles many dishes (features) |
| GLM | Fusion chef | Adjusts recipes (distributions) depending on the dish |

“The recipe may change, but the secret sauce — linear thinking — stays the same.” 👨‍🍳


🧪 Practice Corner: “Predict the Profit” 💰#

You’re analyzing the following dataset:

| TV | Radio | Social | Sales |
| --- | --- | --- | --- |
| 100 | 50 | 20 | 15 |
| 200 | 60 | 25 | 25 |
| 300 | 80 | 30 | 35 |

Try this in your notebook:

import pandas as pd
from sklearn.linear_model import LinearRegression

# Ad spend per channel and the resulting sales (from the table above)
df = pd.DataFrame({
    'TV': [100, 200, 300],
    'Radio': [50, 60, 80],
    'Social': [20, 25, 30],
    'Sales': [15, 25, 35]
})

# Features (the three ad channels) and target (sales)
X = df[['TV', 'Radio', 'Social']]
y = df['Sales']

# Fit an ordinary least-squares regression and inspect the learned parameters
model = LinearRegression().fit(X, y)
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")

💬 Interpret it like a business pro: “For every $1 spent on TV ads, sales increase by $0.05 — but we might be overspending on radio.” 🎯
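
A small follow-up sketch (reusing model and X from the cell above) pairs each coefficient with its feature name so the business translation writes itself. With only three rows and three features the coefficients are not unique, so treat the exact numbers lightly:

# Pair each learned coefficient with its feature name
for feature, coef in zip(X.columns, model.coef_):
    print(f"{feature}: {coef:+.3f} sales per $1 spent")
print(f"Baseline sales with zero ad spend (intercept): {model.intercept_:.3f}")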


🧮 Math Snack: Correlation ≠ Causation#

Regression models capture relationships, not reasons. If ice cream sales correlate with shark attacks, it doesn’t mean sharks love dessert. 🦈🍦

Always mix regression with business logic — not blind trust.


🧭 Recap#

| Concept | Description |
| --- | --- |
| Simple Regression | One variable, one prediction |
| Multiple Regression | Many variables, one outcome |
| GLM | Regression for non-continuous targets |
| Coefficients | Measure the effect of each variable |
| Intercept | The baseline prediction |
| Assumptions | Linearity, independence, normal errors |


💬 Final Thought#

“Linear models are like spreadsheets — simple, powerful, and everywhere. You just need to know which cells to fill.” 📊


🔜 Next Up#

👉 Head to Mean Squared Error — where we learn how to measure prediction pain and teach our models to feel regret mathematically. 🧠💔

“Because every good model needs to know how wrong it was — politely, of course.” 😅


Recall that a linear model has the form

\begin{align*}
y & = \theta_0 + \theta_1 \cdot x_1 + \theta_2 \cdot x_2 + \dots + \theta_d \cdot x_d
\end{align*}

where \(x \in \mathbb{R}^d\) is a vector of features and \(y\) is the target. The \(\theta_j\) are the parameters of the model.

Linear regression finds the straight line that best fits the data provided. It does so by learning the slope of the line and the bias term (the y-intercept).

Given a table:

| Size of house in sq. ft (\(x\)) | Price in $1000 (\(y\)) |
| --- | --- |
| 450 | 100 |
| 324 | 78 |
| 844 | 123 |

Our hypothesis (prediction) is \(h_\theta(x) = \theta_0 + \theta_1 x\), which gives us an equation of a line that predicts the price. The above equation is nothing but the equation of a line. __When we say the machine learns, we are actually adjusting the parameters \(\theta_0\) and \(\theta_1\).__ So for a new \(x\) (size of house), we insert the value of \(x\) into the equation and produce a value \(\hat{y}\) (our prediction).

Below is a Python script that plots the equation \(y = mx + c\) using the provided data points and demonstrates how this equation relates to the linear model in the form of \(\theta\). The script first plots the data points and a best-fit line calculated using linear regression, then explains the connection between \(y = mx + c\) and the vectorized form \(h_\theta(x) = \theta^\top x\).

Explanation#

  • Plotting \(y = mx + c\): The script uses the data points from the table above (size of house \(x\), price \(y\)) to compute the slope \(m\) (equivalent to \(\theta_1\)) and intercept \(c\) (equivalent to \(\theta_0\)) via linear regression. It then plots these points and the line \(y = mx + c\) using Matplotlib.

  • Relation to \(\theta\):

    • The linear equation \(y = mx + c\) is a specific case of the linear model \(h_\theta(x) = \theta_0 + \theta_1 x\), where:

      • \(c = \theta_0\) (the y-intercept or bias term),

      • \(m = \theta_1\) (the slope or weight of the feature \(x\)).

    • By defining \(x_0 = 1\) as a constant feature, we can extend the input \(x\) to a vector \([1, x]\), and the parameters to a vector \(\theta = [\theta_0, \theta_1]\).

    • The model then becomes \(h_\theta(x) = \theta^\top x = \theta_0 \cdot 1 + \theta_1 \cdot x\), which is mathematically equivalent to \(y = mx + c\).

    • This vectorized form \(\theta^\top x\) is commonly used in machine learning to generalize the model to multiple features, e.g., \(h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_d x_d\) for \(d\) features.

  • Computation: The script calculates \(\theta\) using the normal equation, ensuring the line minimizes the Mean Squared Error (MSE) for the given data. The resulting \(\theta_0\) and \(\theta_1\) are printed and used to plot the line.

This demonstrates both the plotting of \(y = mx + c\) and its representation in the \(\theta\)-based notation of linear regression.

import numpy as np
import matplotlib.pyplot as plt

# Data points from the table
x = np.array([450, 324, 844])  # Size of house in sq. ft
y = np.array([100, 78, 123])   # Price in $1000

# Construct the design matrix X with bias term (x_0 = 1)
X = np.vstack([np.ones(len(x)), x]).T  # Shape: (3, 2), where each row is [1, x_i]

# Compute optimal parameters theta using the normal equation: theta = (X^T X)^(-1) X^T y
XtX = np.dot(X.T, X)
XtX_inv = np.linalg.inv(XtX)
Xty = np.dot(X.T, y)
theta = np.dot(XtX_inv, Xty)
theta_0, theta_1 = theta  # theta_0 is the intercept (c), theta_1 is the slope (m)

# Print the parameters
print(f"Parameters in theta form: theta_0 (intercept) = {theta_0:.2f}, theta_1 (slope) = {theta_1:.2f}")
print(f"In y = mx + c form: c = {theta_0:.2f}, m = {theta_1:.2f}")
print("The model can be written as:")
print(f"  h_theta(x) = {theta_0:.2f} + {theta_1:.2f} * x")
print("Or in vectorized form: h_theta(x) = theta^T x, where theta = [theta_0, theta_1] and x = [1, x]")

# Generate points for the best-fit line
x_line = np.linspace(300, 900, 100)  # Range covering the data points
y_line = theta_0 + theta_1 * x_line  # y = mx + c using computed theta_0 and theta_1

# Plot the data points and the best-fit line
plt.scatter(x, y, color='red', label='Data points')
plt.plot(x_line, y_line, color='blue', label=f'y = {theta_1:.2f}x + {theta_0:.2f}')
plt.xlabel('Size of house (sq. ft)')
plt.ylabel('Price in $1000')
plt.title('House Price vs Size with Best-Fit Line')
plt.legend()
plt.grid(True)
plt.show()
Parameters in theta form: theta_0 (intercept) = 57.29, theta_1 (slope) = 0.08
In y = mx + c form: c = 57.29, m = 0.08
The model can be written as:
  h_theta(x) = 57.29 + 0.08 * x
Or in vectorized form: h_theta(x) = theta^T x, where theta = [theta_0, theta_1] and x = [1, x]
(Figure: House Price vs Size with Best-Fit Line, showing the three data points and the fitted line.)
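
To close the loop on “insert a new \(x\), get a prediction \(\hat{y}\)”, here is a tiny follow-up sketch. It assumes theta from the cell above is still in memory; the 600 sq. ft house is just an example value:

# Predict the price of a new house with the learned parameters: h_theta(x) = theta^T x
new_size = 600                      # sq. ft (example value)
x_new = np.array([1, new_size])     # prepend the bias feature x_0 = 1
predicted_price = theta @ x_new     # result is in $1000s
print(f"Predicted price for a {new_size} sq. ft house: about ${predicted_price * 1000:,.0f}")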
# Your code here