
Welcome to the forest of wisdom, where models split, branch, and leaf their way to glory. If linear models were accountants (neat, simple, no fun), then Decision Trees are your wild data detectives — asking “What if?” questions until they solve the mystery.


🧭 What You’ll Learn Here

  • How Decision Trees make decisions (and sometimes bad ones).

  • Why Bagging and Random Forests are like group projects that actually work.

  • How Boosting (hello, XGBoost 👋) builds a model army — one overachiever at a time.

  • And finally, how to interpret Feature Importance — so you know which variable is the real drama queen. 🌟


🎢 Why Trees Are Awesome (and Sometimes Dangerous)

Decision Trees are intuitive — they literally say things like:

“If Income > 60K and Age < 25 → probably buying iPhones on EMI.”

But they can also overfit faster than students cramming the night before exams. That’s why we prune them 🌿 and grow forests 🌲 to control their chaos.


🤹 The Ensemble Philosophy

“One model is okay. Many models together? Unstoppable.”

That’s the motto of ensemble learning — it takes many weak learners and turns them into one strong predictor. (Like turning an office full of interns into one genius PowerPoint presentation.)


🧠 Common Cast Members

| Model | Type | Personality |
| --- | --- | --- |
| Decision Tree | Base Learner | The solo artist who loves attention |
| Random Forest | Bagging Ensemble | The team player – multiple trees voting together |
| XGBoost | Boosting Ensemble | The caffeinated perfectionist who redoes everyone’s work better |

🪄 A Business Example

Imagine a fraud detection system: Each tree looks for suspicious patterns —

  • “Transaction at 3 AM?” 🌙

  • “Foreign location?” 🌍

  • “Sudden $5,000 purchase at a candle store?” 🕯️ (definitely sus)

The forest then votes — if too many trees say “yeah, that’s sketchy”, the model flags it before your CFO even wakes up.
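The voting idea can be seen in miniature with scikit-learn: each fitted tree inside a `RandomForestClassifier` casts its own prediction, and the forest combines them (strictly speaking, sklearn averages the trees’ predicted probabilities rather than counting hard votes). The fraud data below is synthetic and purely illustrative:

```python
# The "forest votes" idea in miniature: each fitted tree casts a
# prediction, and the forest combines them into one decision.
# Synthetic "fraud" data, just to make the voting visible.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, weights=[0.9],
                           random_state=7)  # ~10% positive ("fraud") cases
forest = RandomForestClassifier(n_estimators=25, random_state=7).fit(X, y)

# Ask every individual tree about the first transaction.
votes = np.array([tree.predict(X[:1]) for tree in forest.estimators_])
print("individual tree votes:", votes.ravel().astype(int))
print("forest decision:     ", int(forest.predict(X[:1])[0]))
```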


🐍 Python Heads-Up

You’ll be meeting: DecisionTreeClassifier, RandomForestClassifier, and XGBClassifier from sklearn and xgboost.
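For a first contact with these estimators, here is a minimal sketch on a synthetic dataset (the dataset and hyperparameters are illustrative, not from the lab):

```python
# Quick first look at the sklearn estimators named above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

for model in (DecisionTreeClassifier(random_state=42),
              RandomForestClassifier(n_estimators=100, random_state=42)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "accuracy:", round(model.score(X_te, y_te), 3))

# XGBClassifier (from the xgboost package) plugs into the same pattern:
#   from xgboost import XGBClassifier
#   XGBClassifier().fit(X_tr, y_tr).score(X_te, y_te)
```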

If Python’s still making you scratch your head, take a power detour here: 👉 Programming for Business


🧩 Coming Up Next

| Section | Description |
| --- | --- |
| Decision Trees | Learn how trees split data using information gain & Gini impurity |
| Bagging, RF, XGBoost | Ensemble methods that supercharge accuracy |
| Feature Importance | Find out which features matter most |
| Lab – Fraud Detection | Build a tree-based fraud detector that even your bank would envy 💳 |

🎬 In short: We’re about to grow forests, vote on predictions, and let a bunch of trees outsmart the smartest humans. So grab your pruning shears, and let’s branch out into the world of Tree-Based Models & Ensembles! 🌲🔥

Information Gain, Gini Impurity

Information Gain quantifies the reduction in entropy after splitting a dataset on a feature. Entropy measures the impurity of a dataset:

$$\text{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)$$

where $p_i$ is the proportion of class $i$ in dataset $S$, and $c$ is the number of classes. Information Gain is:

$$\text{IG}(A, S) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$

Gini Impurity measures the probability of misclassifying a random element:

$$\text{Gini}(S) = 1 - \sum_{i=1}^{c} p_i^2$$

Lower Gini values indicate purer nodes. CART algorithms typically use Gini for classification splits.
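These three formulas can be checked numerically in a few lines of plain Python (the helper names below are mine, not from any library):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum over classes of p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(S) = 1 - sum over classes of p_i^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    """IG = Entropy(parent) - weighted sum of subset entropies."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

# A 50/50 split is maximally impure: entropy = 1 bit, Gini = 0.5.
labels = ["fraud", "ok", "fraud", "ok"]
print(entropy(labels))   # 1.0
print(gini(labels))      # 0.5
# A split into two pure subsets recovers all the entropy as gain.
print(information_gain(labels, [["fraud", "fraud"], ["ok", "ok"]]))  # 1.0
```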

Recursive Partitioning and Pruning

Recursive Partitioning: Decision trees recursively split the dataset into subsets based on feature values, using a criterion like Information Gain (ID3) or Gini (CART) for classification, or variance reduction for regression. Splitting stops when conditions like maximum depth or minimum samples are met.

Pruning: Pruning mitigates overfitting by removing branches with low predictive power. Pre-pruning uses constraints (e.g., max depth), while post-pruning removes nodes post-construction if they don’t improve validation performance.
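Both pruning styles are available in scikit-learn; the sketch below compares a pre-pruned tree (`max_depth`) with a cost-complexity post-pruned one (`ccp_alpha`). The dataset and the particular alpha chosen are illustrative:

```python
# Pre-pruning (max_depth) vs. post-pruning (cost-complexity ccp_alpha).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, flip_y=0.2,
                           random_state=0)  # noisy labels encourage overgrowth
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: the tree simply never grows past depth 4.
pre = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow fully, then cut back the weakest branches.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # a mid-range alpha, for illustration
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
for name, m in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(name, "leaves:", m.get_n_leaves(),
          "test acc:", round(m.score(X_te, y_te), 3))
```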

Overfitting in Trees

Decision trees can overfit by growing too deep, capturing noise in the data. Mitigation strategies include:

  • Limiting tree depth (max_depth).

  • Setting a minimum number of samples per split (min_samples_split).

  • Pruning branches with minimal impact.

  • Using ensemble methods like Random Forests.
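The overfitting itself is easy to demonstrate: an unconstrained tree memorizes noisy training labels perfectly, while the constraints above shrink the train/test gap. A minimal sketch on synthetic data:

```python
# Watching a tree overfit: the unconstrained tree memorizes the noisy
# training set, while max_depth / min_samples_split rein it in.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, flip_y=0.3,
                           random_state=1)  # 30% label noise invites overfitting
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

deep = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=3, min_samples_split=20,
                                 random_state=1).fit(X_tr, y_tr)

for name, m in [("unconstrained", deep), ("constrained", shallow)]:
    print(f"{name}: train={m.score(X_tr, y_tr):.2f} test={m.score(X_te, y_te):.2f}")
```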

Python: ID3/CART from Scratch

Below is a Python implementation of a decision tree supporting both classification (ID3-like, using Information Gain) and regression (CART-like, using variance reduction).

# Your code here
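Since the code cell above is only a placeholder, here is one compact sketch of what such an implementation could look like; the class and helper names are my own. It uses binary threshold splits, information gain (entropy) for classification, and variance reduction for regression:

```python
import math
from collections import Counter

class Node:
    """A split node (feature/threshold) or a leaf (value)."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.value = left, right, value

def _entropy(y):
    n = len(y)
    return -sum(c / n * math.log2(c / n) for c in Counter(y).values())

def _variance(y):
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / len(y)

class DecisionTree:
    """ID3-like classification (information gain) or CART-like
    regression (variance reduction) with binary threshold splits."""
    def __init__(self, task="classification", max_depth=5, min_samples=2):
        self.task, self.max_depth, self.min_samples = task, max_depth, min_samples
        self._impurity = _entropy if task == "classification" else _variance

    def fit(self, X, y):
        self.root = self._grow(X, y, depth=0)
        return self

    def _leaf(self, y):
        if self.task == "classification":
            return Node(value=Counter(y).most_common(1)[0][0])  # majority class
        return Node(value=sum(y) / len(y))                      # mean target

    def _grow(self, X, y, depth):
        if depth >= self.max_depth or len(y) < self.min_samples or len(set(y)) == 1:
            return self._leaf(y)
        parent = self._impurity(y)
        best_gain, best = 0.0, None
        for f in range(len(X[0])):                 # try every feature...
            for t in sorted({row[f] for row in X}):  # ...and every threshold
                left = [i for i, row in enumerate(X) if row[f] <= t]
                right = [i for i, row in enumerate(X) if row[f] > t]
                if not left or not right:
                    continue
                yl, yr = [y[i] for i in left], [y[i] for i in right]
                gain = parent - (len(yl) * self._impurity(yl)
                                 + len(yr) * self._impurity(yr)) / len(y)
                if gain > best_gain:
                    best_gain, best = gain, (f, t, left, right)
        if best is None:                            # no split improves impurity
            return self._leaf(y)
        f, t, left, right = best
        return Node(f, t,
                    self._grow([X[i] for i in left], [y[i] for i in left], depth + 1),
                    self._grow([X[i] for i in right], [y[i] for i in right], depth + 1))

    def predict_one(self, x, node=None):
        node = node or self.root
        if node.value is not None:
            return node.value
        branch = node.left if x[node.feature] <= node.threshold else node.right
        return self.predict_one(x, branch)

# Classification: two well-separated groups along one feature.
tree = DecisionTree().fit([[2.0], [3.0], [10.0], [11.0]], [0, 0, 1, 1])
print([tree.predict_one([v]) for v in (2.0, 3.0, 10.0, 11.0)])  # [0, 0, 1, 1]
```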