
Max-Margin Intuition¶
Learning objectives
By the end of this notebook you will be able to:
Explain the maximum-margin classification principle geometrically.
Define support vectors, margin, and the decision boundary hyperplane.
State the hard-margin SVM primal optimisation problem.
Derive why maximum margin corresponds to minimising .
Visualise the decision boundary and margin bands for a linear SVM.
Understand why larger margin generalises better (structural risk minimisation).
Train a linear SVM with sklearn and identify the support vectors.

Business hook — The safest boundary¶
Two loan officers each draw a line separating high-risk from low-risk applicants. Both lines correctly classify all training examples. But Officer A’s line runs through the middle with no gap; Officer B’s line keeps the maximum possible distance from both groups.
When a new ambiguous applicant arrives, Officer B’s model is more confident — because being far from both classes means being robust to measurement noise and slight distributional shift.
This is the core intuition of Support Vector Machines: not just any separating hyperplane, but the one with maximum margin.
1. Maximum Margin — The Geometry¶
A hyperplane in is defined by:
The signed distance from a point to the hyperplane is:
For a classifier to label correctly with confidence at least :
Setting (the canonical form), the margin is:
Hard-margin SVM primal problem:
Support vectors are the training points that lie exactly on the margin boundaries:
The decision boundary is completely determined by these support vectors — all other training points have no influence.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
# Original user code: linear SVM on simple data
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)
fig, ax = plt.subplots(figsize=(7, 5))
ax.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr', edgecolors='k', s=100, zorder=10)
xlim = [0, 10]
ylim = [0, 10]
xx = np.linspace(xlim[0], xlim[1], 50)
yy = np.linspace(ylim[0], ylim[1], 50)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1],
linestyles=['--', '-', '--'], linewidths=[1, 2, 1])
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
s=200, facecolors='none', edgecolors='black', linewidths=2,
label='Support vectors')
ax.set_xlim(xlim); ax.set_ylim(ylim)
ax.set_xlabel('Feature 1'); ax.set_ylabel('Feature 2')
ax.set_title('Linear SVM: Decision Boundary and Margin Bands')
ax.legend()
ax.grid(True, alpha=0.3)
# Annotate margin
w = clf.coef_[0]
margin = 2 / np.linalg.norm(w)
print(f'Support vectors:\n{clf.support_vectors_}')
print(f'Weight vector w: {w.round(4)}')
print(f'Bias b: {clf.intercept_[0]:.4f}')
print(f'Margin (2/||w||): {margin:.4f}')
plt.show()2. Why Maximum Margin Generalises Better¶
Structural Risk Minimisation (SRM) provides the theoretical justification. The generalisation error bound for an SVM is:
where is the VC dimension (measure of hypothesis complexity). For a linear classifier in the VC dimension is at most . But for an SVM with margin , the effective VC dimension is bounded by where is the data radius.
Larger margin → lower VC dimension → tighter generalisation bound.
This means SVMs with large margins are theoretically justified to generalise — not just as a heuristic, but as a principled complexity-control mechanism (analogous to regularisation in regression).
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_circles
# Original user code: preview of kernel trick
X, y = make_circles(100, factor=0.3, noise=0.1, random_state=42)
# Linear SVM fails
clf_lin = svm.SVC(kernel='linear', C=1.0)
clf_lin.fit(X, y)
# RBF SVM succeeds (preview)
clf_rbf = svm.SVC(kernel='rbf', gamma='scale', C=1.0)
clf_rbf.fit(X, y)
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, clf, title in [
(axes[0], clf_lin, f'Linear SVM (accuracy={clf_lin.score(X,y):.2f})'),
(axes[1], clf_rbf, f'RBF SVM (accuracy={clf_rbf.score(X,y):.2f})')
]:
ax.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr', edgecolors='k', s=50)
xlim = ax.get_xlim() if ax.get_xlim() != (0, 1) else (-1.5, 1.5)
ylim = ax.get_ylim() if ax.get_ylim() != (0, 1) else (-1.5, 1.5)
ax.set_xlim(-1.5, 1.5); ax.set_ylim(-1.5, 1.5)
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 100), np.linspace(-1.5, 1.5, 100))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
ax.contourf(xx, yy, Z, levels=20, cmap='bwr', alpha=0.2)
ax.contour(xx, yy, Z, colors='k', levels=[0], linewidths=2)
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
s=100, facecolors='none', edgecolors='k', linewidths=1.5)
ax.set_title(title)
ax.grid(True, alpha=0.3)
plt.suptitle('Linear SVM fails on non-linear data — RBF kernel solves it (preview)', fontsize=10)
plt.tight_layout()
plt.show()3. Support Vector Regression (SVR)¶
SVMs extend naturally to regression via the epsilon-insensitive tube:
subject to .
Points inside the tube incur no loss. Points outside contribute to the objective via slack variables .
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
# Original user code: SVR with epsilon tube
np.random.seed(42)
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
svr = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
svr.fit(X, y)
X_test = np.linspace(0, 5, 100)[:, np.newaxis]
y_pred = svr.predict(X_test)
plt.figure(figsize=(8, 4))
plt.scatter(X, y, color='blue', label='Data', s=40)
plt.plot(X_test, y_pred, color='red', linewidth=2, label='SVR Prediction')
plt.fill_between(X_test.ravel(), y_pred - svr.epsilon, y_pred + svr.epsilon,
color='gray', alpha=0.3, label=f'Epsilon tube (eps={svr.epsilon})')
plt.title('Support Vector Regression (SVR)')
plt.xlabel('X'); plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()4. Try It in the Browser¶
Compute the margin for a 2D linear SVM by hand.
import math
# Hard-margin SVM: w*x + b = 0 separates two classes
# w = [1, 1], b = -5 => line: x1 + x2 = 5
w = [1.0, 1.0]
b = -5.0
# Margin = 2 / ||w||
w_norm = math.sqrt(sum(wi**2 for wi in w))
margin = 2 / w_norm
print(f'Weight vector: {w}')
print(f'||w|| = {w_norm:.4f}')
print(f'Margin = 2/||w|| = {margin:.4f}')
# Check a positive and negative example
examples = [(4, 3, +1, 'positive'), (1, 1, -1, 'negative')]
print()
for x1, x2, label, name in examples:
score = w[0]*x1 + w[1]*x2 + b
decision = 'correct' if score * label >= 1 else 'VIOLATION'
print(f'{name} ({x1},{x2}): score={score:.1f}, y*score={label*score:.1f} [{decision}]')Knowledge Check¶
In the hard-margin SVM, what are support vectors?¶
Why does maximising the margin correspond to minimising ||w||²?¶
Exercises¶
Exercise 1 — Margin Sensitivity to C¶
Train linear SVMs with C = [0.001, 0.1, 1, 10, 1000] on a 2D synthetic dataset. For each, print the margin (2/||w||) and number of support vectors. What happens as C grows?
%matplotlib inline
# Exercise 1: margin sensitivity to C
# Your code hereCommon Pitfalls¶
Summary
SVM finds the hyperplane with maximum margin .
Hard-margin primal: subject to .
Support vectors are the points on the margin boundary — they fully determine the decision boundary.
Maximum margin reduces VC dimension, providing tighter generalisation bounds (SRM).
Always scale features before fitting an SVM.

What’s Next?¶
Linear SVMs are limited to linearly separable data. In svm_kernels.ipynb we introduce the kernel trick — mapping data to a high-dimensional feature space implicitly via a kernel function — allowing SVMs to learn non-linear boundaries without the computational cost of explicit feature expansion.
Kernel SVMs — the kernel trick, RBF and polynomial kernels
Soft Margin & Regularisation — the C parameter and hinge loss