Supervised Regression – Linear Models

Turning numbers into predictions — the foundation of quantitative business analysis


You already know how to work with vectors, matrices, derivatives, and expected values from the Math Foundations block. Now those tools become a real model: one that learns a relationship from data and makes predictions you can explain to a business stakeholder in a single sentence.

Why Regression Matters in Business

Regression answers the question every business leader asks: "If this input changes, what happens to the outcome?" It is interpretable, fast to train, and often accurate enough for a first production model.
| Business question | Regression answer |
| --- | --- |
| How much revenue does one extra dollar of ad spend generate? | Coefficient on ad_spend |
| What will next quarter's sales be? | Predicted $\hat{y}$ from a trained model |
| Which product features drive price? | Coefficients on standardised features, ranked by magnitude |
| How risky is this loan application? | Predicted expected loss |
| Will this marketing channel improve ROI? | Coefficient sign and confidence interval |

The Core Equation

A linear model combines features using a weighted sum:

$$\hat{y} \;=\; \color{#7c3aed}{\beta_0} + \color{#1d4ed8}{\beta_1}\,\color{#047857}{x_1} + \color{#1d4ed8}{\beta_2}\,\color{#047857}{x_2} + \cdots + \color{#1d4ed8}{\beta_n}\,\color{#047857}{x_n}$$

Color legend: $\color{#047857}{\text{green}}$ = input features | $\color{#1d4ed8}{\text{blue}}$ = learned coefficients | $\color{#7c3aed}{\text{purple}}$ = intercept

| Symbol | Name | Plain meaning |
| --- | --- | --- |
| $\hat{y}$ | Prediction | The number the model outputs |
| $x_i$ | Feature | One measurable input |
| $\beta_i$ | Coefficient | How much $\hat{y}$ changes per unit change in $x_i$ |
| $\beta_0$ | Intercept | Baseline prediction when all features are zero |
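
To make the weighted sum concrete, here is a minimal sketch that evaluates the equation for a single observation. The coefficient and feature values are hypothetical, chosen only so the arithmetic is easy to follow; they do not come from a fitted model.

```python
import numpy as np

# Hypothetical coefficients and feature values, chosen only for illustration.
beta_0 = 4.0                      # intercept: prediction when all features are zero
betas = np.array([0.08, 1.5])     # one coefficient per feature
x = np.array([300.0, 2.0])        # one observation with two feature values

# Weighted sum: y_hat = beta_0 + beta_1 * x_1 + beta_2 * x_2
y_hat = beta_0 + np.dot(betas, x)
print(f"Prediction: {y_hat:.2f}")

# Raising x_1 by one unit (x_2 held fixed) changes the prediction by beta_1.
x_plus_one = x + np.array([1.0, 0.0])
y_hat_plus_one = beta_0 + np.dot(betas, x_plus_one)
print(f"Change per unit of x_1: {y_hat_plus_one - y_hat:.2f}")
```

The second print illustrates the "per unit change" reading of a coefficient from the table: only the term for $x_1$ moves, so the difference equals $\beta_1$.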

In matrix form the same equation compresses to:

$$\hat{\mathbf{y}} \;=\; \mathbf{X}\,\boldsymbol{\beta}$$

where $\mathbf{X}$ is the feature matrix (one row per observation, one column per feature plus a bias column of ones) and $\boldsymbol{\beta}$ is the coefficient vector, with $\beta_0$ as the entry that multiplies the bias column.
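
The sketch below shows one way the matrix form could be computed with NumPy. The feature values and coefficients are hypothetical; the point is only that a single matrix-vector product gives the same numbers as applying the scalar equation row by row.

```python
import numpy as np

# Hypothetical raw features: 3 observations, 2 features each.
features = np.array([
    [300.0, 2.0],
    [450.0, 1.0],
    [120.0, 5.0],
])

# Prepend a bias column of ones so the intercept becomes part of beta.
X = np.column_stack([np.ones(len(features)), features])

# Coefficient vector [beta_0, beta_1, beta_2]; values are illustrative.
beta = np.array([4.0, 0.08, 1.5])

# One matrix-vector product yields all predictions at once.
y_hat = X @ beta
print(y_hat)

# The same predictions computed row by row with the scalar equation.
row_by_row = np.array([beta[0] + beta[1] * f[0] + beta[2] * f[1] for f in features])
print(np.allclose(y_hat, row_by_row))
```

When fitting with scikit-learn's `LinearRegression`, the bias column is not needed: with the default `fit_intercept=True` the intercept is estimated separately and stored in `intercept_`.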

Visual Intuition — How a Linear Model Works

The diagram below traces the lifecycle of a regression model from raw data to a business decision.

[Figure: Data enters as a matrix, the model minimises squared errors to learn coefficients, then those coefficients produce predictions that drive decisions.]

The cost the model minimises is Mean Squared Error (MSE):

$$\text{MSE} \;=\; \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2$$

A smaller MSE means predictions stay closer to the actual values. The MSE notebook covers this in depth.
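
As a small illustration of the formula, the following sketch computes MSE by hand for two sets of predictions. The numbers are made up for the example, and `mean_squared_error` here is a local helper written for this sketch, not the scikit-learn function of the same name.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical actual values and two competing sets of predictions.
y_actual = [10.0, 20.0, 30.0]
close_predictions = [11.0, 19.0, 31.0]   # each off by 1
far_predictions = [14.0, 16.0, 34.0]     # each off by 4

mse_close = mean_squared_error(y_actual, close_predictions)
mse_far = mean_squared_error(y_actual, far_predictions)
print(f"MSE (close): {mse_close:.2f}")
print(f"MSE (far):   {mse_far:.2f}")
```

Errors that are four times larger produce an MSE sixteen times larger, because each error is squared before averaging. This is why MSE penalises large misses more heavily than small ones.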

Worked Example — Ad Spend vs Sales

A marketing team collected weekly advertising spend and sales revenue. The question: does spending more on ads reliably lift sales, and by how much?

Ad_Spend: feature (£) | Sales: target (£ thousands) | model.coef_: rate of change | model.intercept_: baseline
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

data = pd.DataFrame({
    'Ad_Spend': [100, 200, 300, 400, 500, 600, 700],
    'Sales':    [12,  22,  26,  36,  44,  52,  60]
})

X = data[['Ad_Spend']]
y = data['Sales']

model = LinearRegression().fit(X, y)
print(f"Coefficient (slope):  {model.coef_[0]:.4f}  → each £1 extra spend → £{model.coef_[0]:.4f}k extra sales")
print(f"Intercept (baseline): {model.intercept_:.2f}")
print(f"R² on training data:  {model.score(X, y):.4f}")

# Plot
fig, ax = plt.subplots(figsize=(7, 4))
ax.scatter(data['Ad_Spend'], data['Sales'], color='#1d4ed8', s=60, label='Actual data', zorder=3)
ax.plot(data['Ad_Spend'], model.predict(X), color='#b91c1c', lw=2, label='Fitted line')
ax.set_xlabel('Advertising Spend (£)')
ax.set_ylabel('Sales (£ thousands)')
ax.set_title('Linear Regression: Ad Spend → Sales')
ax.legend()
plt.tight_layout()
plt.show()
Coefficient (slope):  0.0793  → each £1 extra spend → £0.0793k extra sales
Intercept (baseline): 4.29
R² on training data:  0.9956
<Figure size 700x400 with 1 Axes>
Interpret the result

A coefficient of roughly 0.08 means every additional £1 of advertising spend is associated with about £0.08k of additional sales. Because sales are recorded in £ thousands, that is roughly £80 of sales for each pound spent at this scale. The intercept of about 4 means the model predicts ~£4k of sales even with zero ad spend, which may reflect baseline organic demand. Zero spend lies outside the observed range of £100 to £700, so treat that figure as an extrapolation.

The $R^2$ close to 1.0 on this small dataset tells us the line fits the training data well, but we should always evaluate on held-out data before trusting the estimate.
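
With only seven observations a conventional train/test split leaves very little data on either side, so one possible way to get a held-out estimate is leave-one-out evaluation: fit on six weeks, predict the seventh, and repeat for every week. The sketch below assumes that approach and uses `np.polyfit` for the straight-line fit rather than scikit-learn; it is an illustration, not the evaluation procedure this chapter prescribes.

```python
import numpy as np

# Same weekly data as the worked example.
ad_spend = np.array([100, 200, 300, 400, 500, 600, 700], dtype=float)
sales = np.array([12, 22, 26, 36, 44, 52, 60], dtype=float)

held_out_predictions = np.empty_like(sales)
for i in range(len(sales)):
    # Fit on every week except week i ...
    train = np.arange(len(sales)) != i
    slope, intercept = np.polyfit(ad_spend[train], sales[train], deg=1)
    # ... then predict the week the model never saw.
    held_out_predictions[i] = intercept + slope * ad_spend[i]

held_out_mse = np.mean((sales - held_out_predictions) ** 2)

# Training MSE from a fit on all seven weeks, for comparison.
slope_all, intercept_all = np.polyfit(ad_spend, sales, deg=1)
train_mse = np.mean((sales - (intercept_all + slope_all * ad_spend)) ** 2)

print(f"Training MSE: {train_mse:.3f}")
print(f"Held-out MSE: {held_out_mse:.3f}")
```

The held-out error comes out larger than the training error, which is the usual pattern: a model is scored more generously on the data it was fitted to.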

Key Assumptions

Linear regression relies on four main assumptions. Violating them does not always break the model, but it does limit what you can safely conclude.

| Assumption | If violated | Practical check |
| --- | --- | --- |
| Linearity | Model underfits curved patterns | Residual vs fitted plot: look for curves |
| Independence | Standard errors are wrong | Time or group structure in residuals |
| Homoscedasticity | Confidence intervals are unreliable | Residual spread widens with fitted values |
| Low multicollinearity | Coefficients become unstable | Variance Inflation Factor (VIF) > 10 |
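
The VIF check in the last row can be sketched directly from its definition, $\text{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing feature $j$ on all the other features. The example below uses synthetic features generated for illustration (`x2` is deliberately built as a near copy of `x1`), and the helper names `r_squared` and `vif` are hypothetical, written for this sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic features: x2 is almost a copy of x1 (collinear),
# while x3 is generated independently of both.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
features = np.column_stack([x1, x2, x3])

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (intercept included)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - residuals.var() / y.var()

def vif(features, j):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j regresses feature j on the others."""
    others = np.delete(features, j, axis=1)
    return 1.0 / (1.0 - r_squared(others, features[:, j]))

vifs = [vif(features, j) for j in range(features.shape[1])]
for name, value in zip(["x1", "x2", "x3"], vifs):
    print(f"VIF {name}: {value:.1f}")
```

In this setup the two collinear features land well above the threshold of 10, while the independent feature stays close to 1. Libraries such as statsmodels provide a ready-made VIF function, but the manual version makes the definition visible.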

Interactive — Explore Slope and Intercept

Use the Pyodide cell below to change the coefficient (slope) and intercept values and observe how the predicted line shifts. This builds intuition before you move to fitting from real data.

What to notice
  • A higher slope makes the line steeper — predictions grow faster as spend increases.

  • A higher intercept shifts the whole line up — the baseline prediction increases.

  • MSE decreases as your hand-tuned values approach what LinearRegression().fit() finds automatically.

  • OLS (Ordinary Least Squares) finds the slope and intercept that minimise MSE exactly — covered in the OLS notebook.
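
For readers without access to the interactive cell, the same experiment can be approximated offline. The sketch below scores a few hand-picked lines on the ad spend data and compares them with a least-squares fit. The guesses are arbitrary illustrative values, and `np.polyfit` stands in here for `LinearRegression().fit()`; both solve the same least-squares problem for a single feature.

```python
import numpy as np

# Weekly data from the worked example.
ad_spend = np.array([100, 200, 300, 400, 500, 600, 700], dtype=float)
sales = np.array([12, 22, 26, 36, 44, 52, 60], dtype=float)

def mse_for_line(slope, intercept):
    """MSE of the line intercept + slope * ad_spend against actual sales."""
    predictions = intercept + slope * ad_spend
    return np.mean((sales - predictions) ** 2)

# Hand-picked guesses (arbitrary values, for illustration only).
guesses = [(0.05, 10.0), (0.07, 6.0), (0.08, 4.0)]
for slope, intercept in guesses:
    print(f"slope={slope:.2f}, intercept={intercept:.1f} -> MSE={mse_for_line(slope, intercept):.2f}")

# Least-squares fit: the slope and intercept that minimise MSE.
best_slope, best_intercept = np.polyfit(ad_spend, sales, deg=1)
best_mse = mse_for_line(best_slope, best_intercept)
print(f"least squares: slope={best_slope:.4f}, intercept={best_intercept:.2f} -> MSE={best_mse:.2f}")
```

Each guess moves closer to the fitted values and its MSE shrinks accordingly, but none of them beats the least-squares line.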

Chapter Map

This chapter is a parent section. Each child notebook covers one focused topic. Use the map below to navigate.

Suggested reading order: Linear Model Family → MSE → Metrics → Gradients → OLS → Polynomial → Regularization → Bias–Variance → Lab.

Guided Practice

What kind of output does a regression model produce?

  • A continuous numerical value: Correct. Regression predicts quantities such as revenue, price, demand, or loss, not category labels.

  • A class label like "churn" or "no churn": That is a classification task. Regression returns a number on a continuous scale.

  • A probability between 0 and 1 only: Linear regression is not bounded to [0, 1]. Logistic regression uses that range for classification.

  • A ranked list of items: Ranking is a different task. Regression returns a single scalar prediction per input.

A coefficient of 0.09 on an "ad_spend" feature means:

  • The model has 9 % accuracy: Accuracy is a classification metric. Coefficients describe feature relationships, not model performance.

  • Each extra unit of ad spend is associated with a 0.09 unit increase in the prediction: Correct. A coefficient represents the estimated change in the target for a one-unit increase in that feature, all else equal.

  • The model uses 9 % of the data for training: Data splits are a separate concept. Coefficients describe the learned relationship.

  • Ad spend is the ninth-most important feature: Feature importance requires comparing standardised coefficients or permutation importance; the raw coefficient value alone is not an importance rank.

Which assumption is most likely violated when a residual plot shows a funnel shape (small spread on the left, large spread on the right)?

  • Linearity: Linearity violations look like curves in the residual plot, not a widening cone.

  • Independence: Independence violations appear as patterns over time or groups, not a funnel shape.

  • Homoscedasticity (constant variance): Correct. A funnel shape means error variance grows with the fitted value, which is the definition of heteroscedasticity.

  • Low multicollinearity: Multicollinearity affects coefficient stability, not the pattern of residual spread across fitted values.

Exercises

Exercise 1 — Interpret a coefficient

A model trained on house price data returns:

  • Intercept: 50,000

  • Coefficient on floor_area_sqm: 2,500

  • Coefficient on distance_to_station_km: -8,000

Answer in plain language: (a) What does the model predict for a 100 sqm house 1 km from a station? (b) What does the negative coefficient on distance mean for a business selling property near commuter hubs?

Hint

Plug the values into $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ directly. For the business interpretation, think about what a negative coefficient on distance implies: closer is better (higher price), so the coefficient captures the premium for proximity.

Model answer

(a) $\hat{y} = 50{,}000 + 2{,}500 \times 100 + (-8{,}000) \times 1 = 50{,}000 + 250{,}000 - 8{,}000 = £292{,}000$

(b) Each extra kilometre from a station is associated with an £8,000 drop in predicted price. This tells a developer to pay a premium for sites within walking distance of stations, as the market prices in that convenience.
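
The arithmetic in part (a) can be checked with a few lines of code. This is a sketch only: `predict_price` is a hypothetical helper that hard-codes the coefficients given in the exercise, and the 2 km case is added purely to show the effect of the distance coefficient.

```python
# Coefficients as given in the exercise.
intercept = 50_000
coef_floor_area = 2_500       # £ per square metre
coef_distance = -8_000        # £ per kilometre from the station

def predict_price(floor_area_sqm, distance_to_station_km):
    """Apply the linear equation to one house."""
    return (intercept
            + coef_floor_area * floor_area_sqm
            + coef_distance * distance_to_station_km)

price_1km = predict_price(100, 1)
price_2km = predict_price(100, 2)
print(f"100 sqm, 1 km from station: £{price_1km:,}")
print(f"100 sqm, 2 km from station: £{price_2km:,}")
print(f"Difference for one extra km: {price_2km - price_1km:,}")
```

Holding floor area fixed and moving the house one kilometre further out lowers the prediction by exactly the distance coefficient, which is the "all else equal" reading used in part (b).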


Exercise 2 — Fit and plot from data

Use the dataset below. Fit a linear regression, print the coefficient and intercept, plot the fitted line, and calculate $R^2$.

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'store_size_sqm': [200, 350, 500, 650, 800, 950, 1100],
    'weekly_revenue': [8000, 13500, 18000, 23500, 28000, 33000, 38500]
})

# TODO: Fit LinearRegression, print coefficient, intercept, R²
# TODO: Plot scatter + fitted line
Model solution
import matplotlib.pyplot as plt

X = df[['store_size_sqm']]
y = df['weekly_revenue']
model = LinearRegression().fit(X, y)

print(f"Coefficient: {model.coef_[0]:.2f}")
print(f"Intercept:   {model.intercept_:.2f}")
print(f"R²:          {model.score(X, y):.4f}")

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df['store_size_sqm'], df['weekly_revenue'], color='#1d4ed8', label='Actual')
ax.plot(df['store_size_sqm'], model.predict(X), color='#b91c1c', lw=2, label='Fitted')
ax.set_xlabel('Store Size (sqm)')
ax.set_ylabel('Weekly Revenue (£)')
ax.legend()
plt.tight_layout()
plt.show()

Interpretation: A coefficient of about 33.5 means each extra square metre is associated with roughly £33.50 of additional weekly revenue. $R^2$ close to 1.0 on this synthetic dataset indicates that the line fits the training data very closely.

Summary

Key takeaways
  • Regression predicts continuous values — revenue, price, risk, demand — not categories.

  • Coefficients are the core insight: each one tells you how much the prediction changes per unit change in that feature.

  • The model minimises MSE to find the best-fitting coefficients.

  • Four assumptions underpin valid inference: linearity, independence, constant variance, and low multicollinearity. Always check residuals.

  • Matrix form $\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta}$ connects what you learned in linear algebra directly to the regression equation.


Next Steps

The next notebook — Linear Model Family — maps out the full family of linear models: simple regression with one feature, multiple regression with many features, interaction terms, and generalised linear models. Start there to build a complete picture before diving into the MSE objective, gradient derivation, or regularization.

Up next: Linear Model Family — simple, multiple, and generalised linear models side by side.