
Meet the Family: Simple, Multiple & Generalized Regression

“Every family has that one member who thinks they can explain everything with a straight line.” — Anonymous Data Scientist 😅

Welcome to the Linear Model Family, where:

  • Everyone loves straight lines,

  • Each cousin adds more variables,

  • And the distant uncle GLM shows up talking about log-odds at dinner.


👪 Meet the Family

Let’s meet the key members of the Linear Model clan — one equation at a time.


🧍 Simple Linear Regression

(The minimalist sibling)

$$\hat{y} = \beta_0 + \beta_1 x$$

  • One input feature (x)

  • One output (y)

  • One slope, one intercept — simple and dramatic.

📊 Example: Predict sales from ad spend.

“Every extra dollar in advertising brings an extra $0.10 in sales.”

That’s it. Straightforward. No drama. Until marketing adds more variables…
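
Before marketing complicates things, here's what that one-liner looks like in code (a minimal sketch; the numbers are invented to match the $0.10 story):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: ad spend (in $) vs. sales (in $), rigged so the slope is ~0.10
ad_spend = np.array([[100], [200], [300], [400]])
sales = np.array([20, 30, 40, 50])

model = LinearRegression().fit(ad_spend, sales)
print(model.coef_[0])    # slope beta_1: ~0.10 extra dollars of sales per ad dollar
print(model.intercept_)  # intercept beta_0: baseline sales with zero ad spend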


👯‍♀️ Multiple Linear Regression

(The overachiever sibling)

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

When one variable isn’t enough, multiple regression joins the chat. Now we can model complex situations like:

📊 Example:

Predict sales using TV, radio, and social media ad spend.

| Feature      | Coefficient | Meaning                      |
| ------------ | ----------- | ---------------------------- |
| TV Spend     | 0.04        | +$0.04 in sales per $1 spent |
| Radio Spend  | 0.08        | +$0.08 in sales per $1 spent |
| Social Spend | 0.01        | “We’re trying…”              |

🎯 Business Translation: “TV ads sell, radio works, and social media gives us likes but not customers.”


🧙 Generalized Linear Models (GLM)

(The mysterious uncle with equations and wine)

When your dependent variable isn’t continuous (like 0/1, counts, or categories), you need a model that can flex — enter GLM.

GLM extends the linear model by adding:

  1. A link function (to transform predictions)

  2. A distribution for the target variable

Examples include:

  • Logistic Regression (for binary outcomes)

  • Poisson Regression (for count data)

  • Gamma Regression (for skewed continuous data)

📊 Example:

Predict whether a customer will buy (1) or not (0) based on ad exposure.

GLM says:

$$\text{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

Or in business English:

“Let’s use math to convert a yes/no question into something linear enough to make our computer happy.”
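
To see that in code, here's a minimal sketch using scikit-learn's LogisticRegression (the exposure counts and buy/no-buy outcomes below are invented for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: number of ad exposures per customer,
# and whether each customer bought (1) or not (0)
exposures = np.array([[0], [1], [2], [3], [4], [5], [6], [7]])
bought = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(exposures, bought)

print(clf.intercept_, clf.coef_)       # beta_0 and beta_1, on the log-odds scale
print(clf.predict_proba([[4]])[:, 1])  # estimated P(buy) after 4 exposures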


🧠 Concept Map

| Model Type      | Equation                                       | Target Type | Business Example                             |
| --------------- | ---------------------------------------------- | ----------- | -------------------------------------------- |
| Simple Linear   | $y = \beta_0 + \beta_1 x$                      | Continuous  | Predict sales from one ad channel            |
| Multiple Linear | $y = \beta_0 + \sum \beta_i x_i$               | Continuous  | Predict revenue from multiple ad channels    |
| Logistic (GLM)  | $\text{logit}(p) = \beta_0 + \sum \beta_i x_i$ | Binary      | Predict if customer churns                   |
| Poisson (GLM)   | $\log(\lambda) = \beta_0 + \sum \beta_i x_i$   | Count       | Predict number of calls to customer support  |
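
The Poisson row is worth a quick demo too. Here's a minimal sketch with scikit-learn's PoissonRegressor (the user counts and call volumes are made up):

import numpy as np
from sklearn.linear_model import PoissonRegressor

# Hypothetical data: active users (in thousands) vs. support calls per day
users = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
calls = np.array([2, 3, 6, 9, 15])

model = PoissonRegressor().fit(users, calls)

print(model.intercept_, model.coef_)  # fitted on the log(lambda) scale
print(model.predict([[6.0]]))         # expected daily calls at 6,000 users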

🎓 Business Analogy

Think of regression models as chefs:

| Model           | Chef Personality             | Kitchen Style                                    |
| --------------- | ---------------------------- | ------------------------------------------------ |
| Simple Linear   | Makes perfect grilled cheese | 1 ingredient, 1 rule                             |
| Multiple Linear | Manages a buffet             | Handles many dishes (features)                   |
| GLM             | Fusion chef                  | Adjusts recipes (distributions) depending on the dish |

“The recipe may change, but the secret sauce — linear thinking — stays the same.” 👨‍🍳


🧪 Practice Corner: “Predict the Profit” 💰

You’re analyzing the following dataset:

| TV  | Radio | Social | Sales |
| --- | ----- | ------ | ----- |
| 100 | 50    | 20     | 15    |
| 200 | 60    | 25     | 25    |
| 300 | 80    | 30     | 35    |

Try this in your notebook:

import pandas as pd
from sklearn.linear_model import LinearRegression

# Ad spend per channel and the resulting sales (same numbers as the table above)
df = pd.DataFrame({
    'TV': [100, 200, 300],
    'Radio': [50, 60, 80],
    'Social': [20, 25, 30],
    'Sales': [15, 25, 35]
})

X = df[['TV', 'Radio', 'Social']]  # features: one column per ad channel
y = df['Sales']                    # target: sales

# Fit a multiple linear regression: Sales ~ TV + Radio + Social
model = LinearRegression().fit(X, y)
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")

💬 Interpret it like a business pro: “For every $1 spent on TV ads, sales increase by $0.05 — but we might be overspending on radio.” 🎯
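
To put the fitted model to work, you can forecast sales for a new budget, continuing from the snippet above (the budget numbers here are hypothetical):

# Hypothetical new budget, in the same units as the training table
new_budget = pd.DataFrame({'TV': [150], 'Radio': [55], 'Social': [22]})
print(model.predict(new_budget))  # forecasted sales for that budget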


🧮 Math Snack: Correlation ≠ Causation

Regression models capture relationships, not reasons. If ice cream sales correlate with shark attacks, it doesn’t mean sharks love dessert. 🦈🍦

Always mix regression with business logic — not blind trust.


🧭 Recap

| Concept             | Description                             |
| ------------------- | --------------------------------------- |
| Simple Regression   | One variable, one prediction            |
| Multiple Regression | Many variables, one outcome             |
| GLM                 | Regression for non-continuous targets   |
| Coefficients        | Measure effect of each variable         |
| Intercept           | The baseline prediction                 |
| Assumptions         | Linearity, independence, normal errors  |

💬 Final Thought

“Linear models are like spreadsheets — simple, powerful, and everywhere. You just need to know which cells to fill.” 📊


🔜 Next Up

👉 Head to Mean Squared Error — where we learn how to measure prediction pain and teach our models to feel regret mathematically. 🧠💔

“Because every good model needs to know how wrong it was — politely, of course.” 😅


Recall that a linear model has the form

$$
y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_d x_d
$$

where $x \in \mathbb{R}^d$ is a vector of features and $y$ is the target. The $\theta_j$ are the parameters of the model.

Linear regression finds the straight line that best fits the provided data. It does so by learning the slope of the line and the bias term (the y-intercept).

Given a table:

| Size of house in sq. ft (x) | Price in $1000 (y) |
| --------------------------- | ------------------ |
| 450                         | 100                |
| 324                         | 78                 |
| 844                         | 123                |

Our hypothesis (prediction) is:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

This gives us the equation of a line that predicts the price; the hypothesis above is nothing but the equation of a line. When we say the machine “learns”, we actually mean it adjusts the parameters $\theta_0$ and $\theta_1$. So for a new $x$ (size of house), we insert the value of $x$ into the equation and produce $\hat{y}$ (our prediction).
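
As a quick worked example, using the parameters the script below ends up fitting ($\theta_0 \approx 57.29$, $\theta_1 \approx 0.08$), a new 600 sq. ft house would be predicted at:

$$\hat{y} = 57.29 + 0.08 \cdot 600 \approx 105.3$$

i.e., about $105,000.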

Below is a Python script that plots the equation $y = mx + c$ using the provided data points and demonstrates how this equation relates to the linear model in $\theta$ form. The script first plots the data points and a best-fit line calculated using linear regression, then explains the connection between $y = mx + c$ and the vectorized form $h_\theta(x) = \theta^\top x$.

Explanation

  • Plotting $y = mx + c$: The script uses the data points provided (size of house $x$ vs. price $y$) to compute the slope $m$ (equivalent to $\theta_1$) and intercept $c$ (equivalent to $\theta_0$) via linear regression. It then plots these points and the line $y = mx + c$ using Matplotlib.

  • Relation to $\theta$:

    • The linear equation $y = mx + c$ is a specific case of the linear model $h_\theta(x) = \theta_0 + \theta_1 x$, where:

      • $c = \theta_0$ (the y-intercept or bias term),

      • $m = \theta_1$ (the slope or weight of the feature $x$).

    • By defining $x_0 = 1$ as a constant feature, we can extend the input $x$ to a vector $[1, x]$, and the parameters to a vector $\theta = [\theta_0, \theta_1]$.

    • The model then becomes $h_\theta(x) = \theta^\top x = \theta_0 \cdot 1 + \theta_1 \cdot x$, which is mathematically equivalent to $y = mx + c$.

    • This vectorized form $\theta^\top x$ is commonly used in machine learning to generalize the model to multiple features, e.g., $h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_d x_d$ for $d$ features.

  • Computation: The script calculates $\theta$ using the normal equation (written out below), ensuring the line minimizes the Mean Squared Error (MSE) for the given data. The resulting $\theta_0$ and $\theta_1$ are printed and used to plot the line.

This demonstrates both the plotting of y=mx+cy = mx + c and its representation in the θ\theta-based notation of linear regression.
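
For reference, the closed-form solution the script implements is the normal equation:

$$\theta = (X^\top X)^{-1} X^\top y$$

where $X$ is the design matrix whose first column is all ones (the bias feature $x_0 = 1$) and $y$ is the vector of targets.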

import numpy as np
import matplotlib.pyplot as plt

# Data points from the table
x = np.array([450, 324, 844])  # Size of house in sq. ft
y = np.array([100, 78, 123])   # Price in $1000

# Construct the design matrix X with bias term (x_0 = 1)
X = np.vstack([np.ones(len(x)), x]).T  # Shape: (3, 2), where each row is [1, x_i]

# Compute optimal parameters theta using the normal equation: theta = (X^T X)^(-1) X^T y
XtX = np.dot(X.T, X)
XtX_inv = np.linalg.inv(XtX)
Xty = np.dot(X.T, y)
theta = np.dot(XtX_inv, Xty)
theta_0, theta_1 = theta  # theta_0 is the intercept (c), theta_1 is the slope (m)

# Print the parameters
print(f"Parameters in theta form: theta_0 (intercept) = {theta_0:.2f}, theta_1 (slope) = {theta_1:.2f}")
print(f"In y = mx + c form: c = {theta_0:.2f}, m = {theta_1:.2f}")
print("The model can be written as:")
print(f"  h_theta(x) = {theta_0:.2f} + {theta_1:.2f} * x")
print("Or in vectorized form: h_theta(x) = theta^T x, where theta = [theta_0, theta_1] and x = [1, x]")

# Generate points for the best-fit line
x_line = np.linspace(300, 900, 100)  # Range covering the data points
y_line = theta_0 + theta_1 * x_line  # y = mx + c using computed theta_0 and theta_1

# Plot the data points and the best-fit line
plt.scatter(x, y, color='red', label='Data points')
plt.plot(x_line, y_line, color='blue', label=f'y = {theta_1:.2f}x + {theta_0:.2f}')
plt.xlabel('Size of house (sq. ft)')
plt.ylabel('Price in $1000')
plt.title('House Price vs Size with Best-Fit Line')
plt.legend()
plt.grid(True)
plt.show()
Parameters in theta form: theta_0 (intercept) = 57.29, theta_1 (slope) = 0.08
In y = mx + c form: c = 57.29, m = 0.08
The model can be written as:
  h_theta(x) = 57.29 + 0.08 * x
Or in vectorized form: h_theta(x) = theta^T x, where theta = [theta_0, theta_1] and x = [1, x]
[Figure: House Price vs Size with Best-Fit Line]
# Your code here