Classification predicts categories and decision outcomes
This chapter focuses on models that answer yes or no, fraud or safe, churn or stay, positive or negative, and other label-based decisions used across business processes.
Why Classification Matters

Business Applications
| Area | Typical decision |
|---|---|
| Marketing | Will this customer respond to the offer? |
| Finance | Does this transaction look fraudulent? |
| Operations | Is this item likely to fail quality checks? |
| Customer success | Is this account at risk of churn? |
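Core Equation

$$
P(y = 1 \mid x) = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

Variable Guide
$x$: feature input
$y$: class label
$w$: learned weights
$b$: bias term
$\sigma$: sigmoid function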
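As a sketch of how the logistic equation P(y = 1 | x) = σ(wᵀx + b) maps onto scikit-learn, the snippet below fits a model on the same toy data as the Worked Example and recomputes the probability by hand from the fitted `coef_` (the weights w) and `intercept_` (the bias b). The helper name `sigmoid` is our own, not part of the library.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Same toy data as the worked example: one feature, binary labels
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

X_new = np.array([[2.5], [4.5]])
w = model.coef_.ravel()   # learned weights, shape (n_features,)
b = model.intercept_[0]   # bias term
z = X_new @ w + b         # linear score w^T x + b for each row
p_manual = sigmoid(z)     # P(y = 1 | x) from the core equation

p_sklearn = model.predict_proba(X_new)[:, 1]  # column 1 holds class 1
print(np.allclose(p_manual, p_sklearn))  # True
```

In the binary case the library's probability is exactly this sigmoid of the linear score, so the equation above is not just a conceptual summary but the actual computation.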
A classification model often produces a probability first, then converts that probability into a class label by comparing it with a threshold, commonly 0.5 for binary problems.
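A minimal sketch of the probability-to-label step, assuming NumPy and a hypothetical helper `apply_threshold`. The probabilities are made-up values for illustration; the point is that lowering the threshold flags more cases as positive, trading fewer missed positives for more false alarms. Treating p equal to the threshold as positive is a convention chosen here.

```python
import numpy as np


def apply_threshold(probabilities, threshold=0.5):
    """Convert P(y = 1 | x) values into 0/1 labels: 1 when p >= threshold."""
    return (np.asarray(probabilities) >= threshold).astype(int)


# Hypothetical predicted probabilities of the positive class (e.g. churn)
p = np.array([0.10, 0.35, 0.55, 0.80])

print(apply_threshold(p))                 # [0 0 1 1]
print(apply_threshold(p, threshold=0.3))  # [0 1 1 1]
```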
Worked Example
Code Variable Guide
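```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One numeric feature per row; labels switch from 0 to 1 as the feature grows
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Each output row is [P(y=0), P(y=1)] for one new input
print("Predicted probabilities:", model.predict_proba([[2.5], [4.5]]))
```

Each output row lists the probability of class 0 followed by the probability of class 1. These probabilities describe uncertainty directly, which is often more useful in business than a hard label by itself.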
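Because `predict_proba` returns one column per class, it helps to check `model.classes_` before picking a column rather than assuming a position. The sketch below refits the worked-example model and pairs each new input with its class-1 probability; the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

print(model.classes_)  # [0 1]: the column order used by predict_proba

X_new = np.array([[2.5], [4.5]])
proba = model.predict_proba(X_new)

# Locate the column for class 1 instead of hard-coding its index
pos_col = list(model.classes_).index(1)
p_positive = proba[:, pos_col]

for x, p in zip(X_new.ravel(), p_positive):
    print(f"x = {x}: P(y = 1) = {p:.3f}")
```

Looking the column up by label keeps the code correct if the labels are strings such as "churn" and "stay", whose sorted order is less obvious.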
Model Families in This Chapter
| Model family | Main idea | Typical strength |
|---|---|---|
| Logistic Regression | Linear decision boundary with probability output | Interpretable baseline classifier |
| Naive Bayes | Probabilistic classification with simplifying assumptions | Fast and effective on some sparse or text-like problems |
| Calibration and imbalance handling | Probability correction and class-prior awareness | Better threshold decisions and fairer evaluation |
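To make the three rows concrete, here is a rough sketch on a synthetic, imbalanced dataset from `make_classification` (about 90% negatives; no real data involved). It fits a logistic regression with `class_weight="balanced"` as one common imbalance adjustment, a Gaussian Naive Bayes model (a dense-feature variant, not the multinomial variant often used for text), and a sigmoid-calibrated Naive Bayes via `CalibratedClassifierCV`, then compares probability quality with the Brier score. All settings are assumptions chosen for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary problem with about 10% positives (illustrative only)
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    # Linear boundary; reweights classes inversely to their frequency
    "logistic (balanced)": LogisticRegression(class_weight="balanced", max_iter=1000),
    # Assumes features are conditionally independent given the class
    "gaussian naive bayes": GaussianNB(),
    # Re-maps Naive Bayes scores with a sigmoid fitted on internal CV folds
    "calibrated naive bayes": CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=3),
}

scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    p_test = clf.predict_proba(X_test)[:, 1]
    # Brier score: mean squared error of the probabilities (lower is better)
    scores[name] = brier_score_loss(y_test, p_test)
    print(f"{name}: Brier score = {scores[name]:.3f}")
```

Class weighting tends to push predicted probabilities upward for the minority class, so the balanced model may score worse on the Brier score even when it ranks cases well; that is one reason calibration and imbalance handling belong together.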
Practice Exercise
Focus Variables
1. Fit a basic logistic regression model for a churn problem.
2. Generate predicted labels and predicted probabilities.
3. Compare the outcomes using a confusion matrix and classification metrics.
4. Explain which kind of mistake (false positive or false negative) is more expensive in the business setting.
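One possible starting point for the exercise, assuming a synthetic stand-in for churn data (`make_classification` generates the features and labels; no real churn dataset is used) and made-up costs per error. It walks through fitting, predicting labels and probabilities, the confusion matrix and classification report, and a simple cost tally for the last step.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for churn data: label 1 = churned (about 20% of rows)
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3,
    weights=[0.8, 0.2], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Step 1: fit a baseline logistic regression
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 2: predicted labels (default threshold) and churn probabilities
y_pred = model.predict(X_test)
p_churn = model.predict_proba(X_test)[:, 1]

# Step 3: confusion matrix (rows = actual, columns = predicted) and metrics
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(classification_report(y_test, y_pred, digits=3))

# Step 4: hypothetical costs; a missed churner (FN) is assumed to cost more
# than an unnecessary retention offer (FP). Replace with real business figures.
COST_FP, COST_FN = 10.0, 100.0
total_cost = fp * COST_FP + fn * COST_FN
print(f"False positives: {fp}, false negatives: {fn}, total cost: {total_cost:.0f}")
```

With costs this lopsided, re-running step 4 at a lower threshold on `p_churn` is a natural follow-up, since catching more churners may reduce total cost even as false positives rise.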
Continue
Next, move to Logistic Regression to study the most common baseline classifier in detail.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Hard labels first, then the class probabilities behind them
print(model.predict([[2.5], [4.5]]))
print(model.predict_proba([[2.5], [4.5]]))
```

Output:

```
[0 1]
[[0.75594369 0.24405631]
 [0.27617405 0.72382595]]
```
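In the output above, `predict` returns 0 for x = 2.5 and 1 for x = 4.5, matching whichever class has the larger probability in each row. For binary logistic regression, scikit-learn's `predict` picks class 1 when the decision score wᵀx + b is positive, which is equivalent to P(y = 1 | x) exceeding 0.5. A quick check of that relationship, assuming the binary setting shown here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

X_new = np.array([[2.5], [4.5]])
labels = model.predict(X_new)
scores = model.decision_function(X_new)      # w^T x + b for each row
p_class1 = model.predict_proba(X_new)[:, 1]  # sigmoid of those scores

# All three views agree in the binary case
print(np.array_equal(labels, (scores > 0).astype(int)))      # True
print(np.array_equal(labels, (p_class1 > 0.5).astype(int)))  # True
```

Using `predict` is therefore the same as fixing the threshold at 0.5; choosing a different threshold means working from `predict_proba` directly.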