Linear Model Family - Machine Learning for Business

Meet the Family: Simple, Multiple & Generalized Regression

“Every family has that one member who thinks they can explain everything with a straight line.” — Anonymous Data Scientist 😅

Welcome to the Linear Model Family, where:

Everyone loves straight lines,
Each cousin adds more variables,
And the distant uncle GLM shows up talking about log-odds at dinner.

👪 Meet the Family

Let’s meet the key members of the Linear Model clan — one equation at a time.

🧍 Simple Linear Regression

(The minimalist sibling)

$$ \hat{y} = \beta_0 + \beta_1 x $$

One input feature (x)
One output (y)
One slope, one intercept — simple and dramatic.

📊 Example: Predict sales from ad spend.

“Every extra dollar in advertising brings an extra $0.10 in sales.”

That’s it. Straightforward. No drama. Until marketing adds more variables…
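To make the slope and intercept less abstract, here is a minimal sketch that computes them with the closed-form least-squares formulas for a single feature. The ad-spend and sales figures are invented for illustration (they are not from a real dataset) and were picked so the slope lands on 0.10.

```python
# Hypothetical data: ad spend and sales, both in dollars (invented for illustration).
ad_spend = [100.0, 200.0, 300.0, 400.0]
sales = [60.0, 70.0, 80.0, 90.0]

n = len(ad_spend)
mean_x = sum(ad_spend) / n
mean_y = sum(sales) / n

# Least-squares slope: co-variation of x and y divided by the variation of x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, sales))
s_xx = sum((x - mean_x) ** 2 for x in ad_spend)
beta_1 = s_xy / s_xx

# The fitted line always passes through the point (mean_x, mean_y).
beta_0 = mean_y - beta_1 * mean_x

print(f"slope = {beta_1:.2f}, intercept = {beta_0:.2f}")
```

On this toy data the slope is 0.10 and the intercept is 50: each extra dollar of spend goes with ten extra cents of sales, on top of a baseline of $50.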

👯‍♀️ Multiple Linear Regression

(The overachiever sibling)

$$ \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n $$

When one variable isn’t enough, multiple regression joins the chat. Now we can model complex situations like:

📊 Example:

Predict sales using TV, radio, and social media ad spend.

Feature	Coefficient	Meaning
TV Spend	0.04	+$0.04 in sales per $1 spent
Radio Spend	0.08	+$0.08 in sales per $1 spent
Social Spend	0.01	“We’re trying…”

🎯 Business Translation: “Radio gives the best return per dollar, TV still sells, and social media gives us likes but not customers.”
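To see how those coefficients turn into a forecast, here is a small sketch that plugs them into the multiple regression equation. The intercept and the budget figures are hypothetical, since the table only gives the slopes.

```python
# Coefficients from the table above (sales dollars per ad dollar).
coefficients = {"TV": 0.04, "Radio": 0.08, "Social": 0.01}

# Assumption: the table gives no intercept, so this baseline is invented.
intercept = 5.0

# Hypothetical ad budget in dollars for one period.
budget = {"TV": 1000.0, "Radio": 500.0, "Social": 200.0}

# y_hat = beta_0 + beta_1 * x_1 + ... + beta_n * x_n
contributions = {ch: coefficients[ch] * budget[ch] for ch in coefficients}
predicted_sales = intercept + sum(contributions.values())

for channel, value in contributions.items():
    print(f"{channel}: +${value:.2f}")
print(f"Predicted sales: ${predicted_sales:.2f}")
```

Under these made-up numbers, radio contributes as much as TV on half the budget, which is exactly what a coefficient twice as large implies.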

🧙 Generalized Linear Models (GLM)

(The mysterious uncle with equations and wine)

When your dependent variable isn’t a continuous, roughly bell-shaped quantity (like 0/1 outcomes, counts, categories, or heavily skewed amounts), you need a model that can flex: enter GLM.

GLM extends the linear model by adding:

A link function (to connect the linear predictor to the expected value of the target)
A distribution for the target variable

Examples include:

Logistic Regression (for binary outcomes)
Poisson Regression (for count data)
Gamma Regression (for skewed continuous data)

📊 Example:

Predict whether a customer will buy (1) or not (0) based on ad exposure.

GLM says:

$$ \text{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 $$

where $p$ is the probability that the customer buys.

Or in business English:

“Let’s use math to convert a yes/no question into something linear enough to make our computer happy.”
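Here is a small sketch of what the logit link does in practice: the linear part produces log-odds, and the inverse of the logit (the sigmoid) turns them into a probability between 0 and 1. The coefficient values and the two features are hypothetical, chosen only to make the arithmetic visible.

```python
import math

def sigmoid(z: float) -> float:
    """Inverse of the logit: maps log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (not fitted to any real data).
beta_0, beta_1, beta_2 = -2.0, 0.5, 0.3

# Hypothetical customer: x1 = ads seen this week, x2 = emails opened.
x1, x2 = 3, 2

log_odds = beta_0 + beta_1 * x1 + beta_2 * x2   # this is logit(p)
p_buy = sigmoid(log_odds)

print(f"log-odds = {log_odds:.2f}, P(buy) = {p_buy:.3f}")
```

A log-odds of 0 corresponds to a 50% chance; positive values push the probability above 50% and negative values push it below.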

🧠 Concept Map

Model Type	Equation	Target Type	Business Example
Simple Linear	$\hat{y} = \beta_0 + \beta_1 x$	Continuous	Predict sales from one ad channel
Multiple Linear	$\hat{y} = \beta_0 + \sum_i \beta_i x_i$	Continuous	Predict revenue from multiple ad channels
Logistic (GLM)	$\text{logit}(p) = \beta_0 + \sum_i \beta_i x_i$	Binary	Predict if customer churns
Poisson (GLM)	$\log(\lambda) = \beta_0 + \sum_i \beta_i x_i$	Count	Predict number of calls to customer support
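For the Poisson row, the log link means each coefficient acts multiplicatively on the expected count. The sketch below uses hypothetical coefficients and an invented feature (open tickets) for a support-call model to show this.

```python
import math

# Hypothetical coefficients for log(lambda) = beta_0 + beta_1 * x.
beta_0 = 1.0   # log of the baseline expected number of calls
beta_1 = 0.2   # effect of each additional open ticket (invented feature)

def expected_calls(open_tickets: int) -> float:
    """Invert the log link: lambda = exp(beta_0 + beta_1 * x)."""
    return math.exp(beta_0 + beta_1 * open_tickets)

for tickets in (0, 1, 2):
    print(f"{tickets} open tickets -> {expected_calls(tickets):.2f} expected calls")

# Each extra unit of x multiplies the expected count by exp(beta_1).
ratio = expected_calls(1) / expected_calls(0)
print(f"multiplier per ticket = {ratio:.3f}")
```

This is the main difference in reading the two model types: linear regression coefficients add to the prediction, while Poisson coefficients scale it.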

🎓 Business Analogy

Think of regression models as chefs:

Model	Chef Personality	Kitchen Style
Simple Linear	Makes perfect grilled cheese	1 ingredient, 1 rule
Multiple Linear	Manages a buffet	Handles many dishes (features)
GLM	Fusion chef	Adjusts recipes (distributions) depending on the dish

“The recipe may change, but the secret sauce — linear thinking — stays the same.” 👨‍🍳

🧪 Practice Corner: “Predict the Profit” 💰

You’re analyzing the following dataset:

TV	Radio	Social	Sales
100	50	20	15
200	60	25	25
300	80	30	35

Try this in your notebook:

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'TV': [100, 200, 300],
    'Radio': [50, 60, 80],
    'Social': [20, 25, 30],
    'Sales': [15, 25, 35]
})

X = df[['TV', 'Radio', 'Social']]
y = df['Sales']

model = LinearRegression().fit(X, y)
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")

💬 Interpret it like a business pro, with a sentence of this shape: “For every $1 spent on TV ads, sales increase by $0.05, but we might be overspending on radio.” 🎯 Swap in the coefficients your model actually prints.
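Before reading too much into coefficients from a dataset this small, it can help to count how many parameters the model is estimating. Here is a minimal check on the same three rows, using NumPy's `matrix_rank`:

```python
import numpy as np

# Same three rows as the practice dataset: TV, Radio, Social.
X = np.array([
    [100, 50, 20],
    [200, 60, 25],
    [300, 80, 30],
], dtype=float)

# Add a column of ones so the intercept counts as a parameter too.
design = np.column_stack([np.ones(len(X)), X])

n_rows, n_params = design.shape
rank = np.linalg.matrix_rank(design)

print(f"rows = {n_rows}, parameters = {n_params}, rank = {rank}")
if rank < n_params:
    print("More parameters than independent rows: many coefficient sets fit equally well.")
```

With three rows and four parameters, the data cannot single out one set of coefficients, so this exercise is best read as practice with the API rather than as evidence about any ad channel.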

🧮 Math Snack: Correlation ≠ Causation

Regression models capture relationships, not reasons. If ice cream sales correlate with shark attacks, it doesn’t mean sharks love dessert. 🦈🍦

Always mix regression with business logic — not blind trust.
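A quick simulation can make the shark example concrete. In this sketch, all numbers are simulated (none are real statistics): temperature drives both ice cream sales and shark attacks, and the two end up correlated even though neither affects the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (not real): daily temperature drives both quantities.
n_days = 1000
temperature = rng.uniform(15, 35, size=n_days)
ice_cream_sales = 20 * temperature + rng.normal(0, 50, size=n_days)
shark_attacks = 0.3 * temperature + rng.normal(0, 1.0, size=n_days)

# The raw correlation makes it look like ice cream "explains" shark attacks.
raw_corr = np.corrcoef(ice_cream_sales, shark_attacks)[0, 1]

def residuals(target, feature):
    """Remove the straight-line effect of `feature` from `target`."""
    slope, intercept = np.polyfit(feature, target, deg=1)
    return target - (slope * feature + intercept)

# After removing temperature from both series, little correlation should remain.
adj_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(shark_attacks, temperature),
)[0, 1]

print(f"raw correlation: {raw_corr:.2f}")
print(f"after controlling for temperature: {adj_corr:.2f}")
```

The raw correlation is strong, while the adjusted one sits near zero. In real business data the hidden driver is rarely handed to you, which is where domain knowledge comes in.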

🧭 Recap

Concept	Description
Simple Regression	One variable, one prediction
Multiple Regression	Many variables, one outcome
GLM	Regression for binary, count, or skewed targets
Coefficients	Measure effect of each variable
Intercept	The baseline prediction
Assumptions	Linearity, independence, constant error variance, normal errors

💬 Final Thought

“Linear models are like spreadsheets — simple, powerful, and everywhere. You just need to know which cells to fill.” 📊

🔜 Next Up

👉 Head to Mean Squared Error — where we learn how to measure prediction pain and teach our models to feel regret mathematically. 🧠💔

“Because every good model needs to know how wrong it was — politely, of course.” 😅

Recall that a linear model has the form

\begin{align*} y & = \theta_0 + \theta_1 \cdot x_1 + \theta_2 \cdot x_2 + \dots + \theta_d \cdot x_d \tag{1} \end{align*}

where $x \in \mathbb{R}^d$ is a vector of features and $y$ is the target. The $\theta_j$ are the parameters of the model.

Linear regression finds the straight line that best fits the data provided. It does so by learning the slope of the line and the bias term (the y-intercept).

Given a table:

Size of house in sq. ft (x)	Price in $1000 (y)
450	100
324	78
844	123

Our hypothesis (prediction) is:

\begin{align*} h_\theta(x) & = \theta_0 + \theta_1 x \tag{2} \end{align*}

This gives us the equation of a line that predicts the price. When we say the machine learns, we mean that it adjusts the parameters $\theta_0$ and $\theta_1$. For a new $x$ (size of house), we insert its value into the equation above to produce $\hat{y}$, our prediction.
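As a quick sketch of that last step, the snippet below fits the line to the three houses in the table and predicts the price of a new one. It uses `np.polyfit` as one convenient way to get a least-squares line; the 600 sq. ft house is a made-up query, not part of the data.

```python
import numpy as np

# Data from the table above.
size_sqft = np.array([450, 324, 844], dtype=float)
price_k = np.array([100, 78, 123], dtype=float)   # price in $1000

# Least-squares fit of a degree-1 polynomial: returns [theta_1, theta_0].
theta_1, theta_0 = np.polyfit(size_sqft, price_k, deg=1)

def h(x: float) -> float:
    """Hypothesis h_theta(x) = theta_0 + theta_1 * x."""
    return theta_0 + theta_1 * x

new_size = 600.0   # hypothetical house, not in the table
y_hat = h(new_size)

print(f"theta_0 = {theta_0:.2f}, theta_1 = {theta_1:.4f}")
print(f"Predicted price for {new_size:.0f} sq. ft: ${y_hat:.1f}k")
```

With only three data points, the prediction (roughly $105k here) mostly illustrates the mechanics; it is not a reliable valuation.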

Below is a Python script that fits the line $y = mx + c$ to the data points above, plots it, and shows how this equation relates to the linear model written in terms of $\theta$. The script plots the data points together with a best-fit line calculated using linear regression; the explanation connects $y = mx + c$ to the vectorized form $h_\theta(x) = \theta^\top x$.

Explanation

Plotting $y = mx + c$: The script uses the data points from the table (size of house $x$, price $y$) to compute the slope $m$ (equivalent to $\theta_1$) and intercept $c$ (equivalent to $\theta_0$) via linear regression. It then plots these points and the line $y = mx + c$ using Matplotlib.
Relation to $\theta$:
- The linear equation $y = mx + c$ is the one-feature case of the linear model, $h_\theta(x) = \theta_0 + \theta_1 x$, where:
  - $c = \theta_0$ (the y-intercept or bias term),
  - $m = \theta_1$ (the slope or weight of the feature $x$).
- By defining $x_0 = 1$ as a constant feature, we can extend the input $x$ to a vector $[1, x]$, and the parameters to a vector $\theta = [\theta_0, \theta_1]$.
- The model then becomes $h_\theta(x) = \theta^\top x = \theta_0 \cdot 1 + \theta_1 \cdot x$, which is mathematically equivalent to $y = mx + c$.
- This vectorized form $\theta^\top x$ is commonly used in machine learning to generalize the model to multiple features, e.g., $h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_d x_d$ for $d$ features.
Computation: The script calculates $\theta$ using the normal equation $\theta = (X^\top X)^{-1} X^\top y$, which gives the line that minimizes the Mean Squared Error (MSE) for the given data. The resulting $\theta_0$ and $\theta_1$ are printed and used to plot the line.

This demonstrates both the plotting of $y = mx + c$ and its representation in the $\theta$-based notation of linear regression.

import numpy as np
import matplotlib.pyplot as plt

# Data points from the table
x = np.array([450, 324, 844])  # Size of house in sq. ft
y = np.array([100, 78, 123])   # Price in $1000

# Construct the design matrix X with bias term (x_0 = 1)
X = np.vstack([np.ones(len(x)), x]).T  # Shape: (3, 2), where each row is [1, x_i]

# Compute optimal parameters theta using the normal equation: theta = (X^T X)^(-1) X^T y
XtX = np.dot(X.T, X)
XtX_inv = np.linalg.inv(XtX)
Xty = np.dot(X.T, y)
theta = np.dot(XtX_inv, Xty)
theta_0, theta_1 = theta  # theta_0 is the intercept (c), theta_1 is the slope (m)

# Print the parameters
print(f"Parameters in theta form: theta_0 (intercept) = {theta_0:.2f}, theta_1 (slope) = {theta_1:.2f}")
print(f"In y = mx + c form: c = {theta_0:.2f}, m = {theta_1:.2f}")
print("The model can be written as:")
print(f"  h_theta(x) = {theta_0:.2f} + {theta_1:.2f} * x")
print("Or in vectorized form: h_theta(x) = theta^T x, where theta = [theta_0, theta_1] and x = [1, x]")

# Generate points for the best-fit line
x_line = np.linspace(300, 900, 100)  # Range covering the data points
y_line = theta_0 + theta_1 * x_line  # y = mx + c using computed theta_0 and theta_1

# Plot the data points and the best-fit line
plt.scatter(x, y, color='red', label='Data points')
plt.plot(x_line, y_line, color='blue', label=f'y = {theta_1:.2f}x + {theta_0:.2f}')
plt.xlabel('Size of house (sq. ft)')
plt.ylabel('Price in $1000')
plt.title('House Price vs Size with Best-Fit Line')
plt.legend()
plt.grid(True)
plt.show()

Parameters in theta form: theta_0 (intercept) = 57.29, theta_1 (slope) = 0.08
In y = mx + c form: c = 57.29, m = 0.08
The model can be written as:
  h_theta(x) = 57.29 + 0.08 * x
Or in vectorized form: h_theta(x) = theta^T x, where theta = [theta_0, theta_1] and x = [1, x]
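As a sanity check on that result, the sketch below recomputes $\theta$ two ways and confirms that moving away from it makes the MSE worse. It solves the normal equation as a linear system with `np.linalg.solve` rather than forming the inverse explicitly (a common numerical preference, not something the script above requires) and compares the answer with `np.linalg.lstsq`.

```python
import numpy as np

x = np.array([450, 324, 844], dtype=float)
y = np.array([100, 78, 123], dtype=float)
X = np.column_stack([np.ones(len(x)), x])   # rows are [1, x_i]

# Normal equation, solved as a linear system instead of forming an inverse.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Library least-squares solver for comparison.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

def mse(theta):
    """Mean squared error of the predictions X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

print("normal equation:", np.round(theta_normal, 4))
print("lstsq:          ", np.round(theta_lstsq, 4))
print(f"MSE at the fitted theta: {mse(theta_normal):.3f}")

# Nudging a parameter away from the fit should not lower the MSE.
nudged = theta_normal + np.array([1.0, 0.0])
print(f"MSE after shifting the intercept by 1: {mse(nudged):.3f}")
```

Both routes should agree on an intercept near 57.29 and a slope near 0.08, matching the output above.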
