“Because your data deserves a shopping spree.” 🛒


🎯 Objective

Build your own Market Basket Recommender — a system that tells customers:

“You bought X, you might also like Y (and probably don’t need Z… but who’s stopping you?)”

We’ll mix:

  • Collaborative Filtering (people like you bought...)

  • Content-Based Filtering (items similar to this...)

  • Association Rules (if-this-then-that magic...)


🧠 Setup

Fire up your Jupyter Notebook and import the usual suspects:

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from sklearn.metrics.pairwise import cosine_similarity

🧺 Step 1 – Create Your Mini Store

Let’s make a pretend e-commerce dataset.

data = {
    'CustomerID': [1, 1, 2, 2, 3, 3, 4, 5],
    'Item': [
        'Laptop', 'Mouse',
        'Phone', 'Headphones',
        'Milk', 'Bread',
        'Milk',
        'Laptop'
    ]
}

df = pd.DataFrame(data)
df
CustomerID  Item
1           Laptop
1           Mouse
2           Phone
2           Headphones
3           Milk
3           Bread
4           Milk
5           Laptop

Nice — a perfect mix of tech geeks and breakfast lovers.


🧮 Step 2 – Association Rules Magic

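# One row per customer, one boolean column per item (True = customer bought it)
basket = pd.get_dummies(df.set_index('CustomerID')['Item']).groupby(level=0).max().astype(bool)
# Keep itemsets that appear in at least 20% of baskets
frequent_items = apriori(basket, min_support=0.2, use_colnames=True)
# Keep rules with lift >= 1 (items co-occur at least as often as chance predicts)
rules = association_rules(frequent_items, metric='lift', min_threshold=1)
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]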

If your rule shows {Milk} → {Bread}, congratulations — your grocery recommender now understands carbohydrates 🍞🥛
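To see what the support, confidence, and lift columns actually mean, here is a minimal sketch that recomputes them by hand for {Milk} → {Bread} using plain pandas. The `support` helper is a hypothetical name introduced for illustration, not part of mlxtend.

```python
import pandas as pd

# Same toy transactions as in Step 1
df = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2, 3, 3, 4, 5],
    'Item': ['Laptop', 'Mouse', 'Phone', 'Headphones',
             'Milk', 'Bread', 'Milk', 'Laptop'],
})

# One set of items per customer (one "basket" each)
baskets = df.groupby('CustomerID')['Item'].apply(set)
n_baskets = len(baskets)

def support(items):
    """Hypothetical helper: fraction of baskets containing every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / n_baskets

# Rule under inspection: {Milk} -> {Bread}
support_rule = support(['Milk', 'Bread'])       # P(Milk and Bread)
confidence = support_rule / support(['Milk'])   # P(Bread | Milk)
lift = confidence / support(['Bread'])          # confidence relative to P(Bread)

print(f"support={support_rule:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 means Bread shows up in Milk baskets more often than it does in baskets overall. With only five customers, treat these numbers as a toy illustration rather than evidence.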


🤝 Step 3 – Collaborative Filtering (Mini Edition)

Let’s simulate user-item preferences:

import numpy as np

ratings = pd.DataFrame({
    'User1': [5, 4, np.nan, 3],
    'User2': [4, np.nan, 5, 2],
    'User3': [np.nan, 4, 4, np.nan]
}, index=['Laptop', 'Mouse', 'Phone', 'Headphones'])

similarity = pd.DataFrame(
    cosine_similarity(ratings.fillna(0)),
    index=ratings.index, columns=ratings.index
)

similarity

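See which products are “best buddies.” Each cell is the cosine similarity between two products' rating vectors, so this is item-based collaborative filtering: products that the same users rate alike score close to 1. Note that fillna(0) treats a missing rating as a 0, a simplification that is acceptable for a toy example. If Laptop and Mouse have a high similarity score → your recommender nods wisely. 🧠💻🐭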
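A similarity matrix is not yet a recommendation. One common way to get there, sketched below under stated assumptions, is to score each item a user has not rated by a similarity-weighted average of the ratings that user has already given. The `recommend_for` helper is a hypothetical name, and the cosine similarity is computed with NumPy here only to keep the sketch self-contained; it should match the scikit-learn result above.

```python
import numpy as np
import pandas as pd

# Same toy ratings as above: rows are items, columns are users
ratings = pd.DataFrame({
    'User1': [5, 4, np.nan, 3],
    'User2': [4, np.nan, 5, 2],
    'User3': [np.nan, 4, 4, np.nan],
}, index=['Laptop', 'Mouse', 'Phone', 'Headphones'])

# Item-item cosine similarity on the zero-filled matrix
filled = ratings.fillna(0).to_numpy(dtype=float)
unit = filled / np.linalg.norm(filled, axis=1, keepdims=True)
similarity = pd.DataFrame(unit @ unit.T, index=ratings.index, columns=ratings.index)

def recommend_for(user, top_n=2):
    """Hypothetical helper: rank a user's unrated items by predicted rating."""
    rated = ratings[user].dropna()                  # items the user has rated
    unrated = ratings.index[ratings[user].isna()]   # candidate items
    scores = {}
    for item in unrated:
        weights = similarity.loc[item, rated.index]
        if weights.sum() > 0:
            # Weighted average of the user's own ratings
            scores[item] = float((weights * rated).sum() / weights.sum())
    return pd.Series(scores, dtype=float).sort_values(ascending=False).head(top_n)

print(recommend_for('User1'))
```

User1 has not rated Phone, so it is the only candidate, and its predicted score lands between that user's lowest and highest existing ratings.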


🧩 Step 4 – Content-Based Filtering (Optional Spice)

You can also compare product features instead of ratings:

features = pd.DataFrame({
    'Item': ['Laptop', 'Mouse', 'Phone', 'Headphones'],
    'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics'],
    'Wireless': [0, 1, 1, 1]
})

Compute similarities between feature vectors to recommend similar items. Your model now says:

“If you like wireless headphones, you’ll love wireless regret when they run out of battery.” 🔋😅
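Here is a minimal sketch of that similarity computation on the features table. The categorical column is one-hot encoded first so every feature is numeric; `similar_items` is a hypothetical helper name. Because every item shares the Electronics category in this toy table, the Wireless flag is the only feature that tells items apart.

```python
import numpy as np
import pandas as pd

features = pd.DataFrame({
    'Item': ['Laptop', 'Mouse', 'Phone', 'Headphones'],
    'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics'],
    'Wireless': [0, 1, 1, 1],
})

# One-hot encode the categorical column so all features are numeric
vectors = pd.get_dummies(features.set_index('Item'), columns=['Category']).astype(float)

# Cosine similarity between item feature vectors
values = vectors.to_numpy()
unit = values / np.linalg.norm(values, axis=1, keepdims=True)
feature_similarity = pd.DataFrame(unit @ unit.T, index=vectors.index, columns=vectors.index)

def similar_items(item, top_n=2):
    """Hypothetical helper: the other items most similar to `item` by features."""
    return feature_similarity[item].drop(item).sort_values(ascending=False).head(top_n)

print(similar_items('Headphones'))
```

With richer features (price band, brand, category tree) the same code applies unchanged; only the columns of `features` grow.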


💡 Step 5 – Combine the Insights

Fuse the power of:

  • Collaborative Filtering: What similar users bought

  • Content-Based: What similar items exist

  • Association Rules: What items co-occur frequently

🎯 Business logic:

  1. Recommend from collaborative filtering first.

  2. Fill gaps using content-based similarity.

  3. Add “bonus” suggestions from association rules.

Boom — your Hybrid Recommender is alive! 🤖💞
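The three-step business logic above can be sketched as a simple priority merge. This is one possible way to combine the sources, not the only one; `hybrid_recommend` is a hypothetical helper, and the input lists are made up for illustration rather than computed from the earlier steps.

```python
def hybrid_recommend(collaborative, content_based, association, owned=(), k=5):
    """Hypothetical helper: merge three ranked lists in priority order.

    Each list holds item names, best first. Items the customer already owns
    and duplicates are skipped; at most k items are returned.
    """
    seen = set(owned)
    merged = []
    # Priority mirrors the business logic: CF first, then content, then rules
    for source in (collaborative, content_based, association):
        for item in source:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:k]

# Illustrative inputs (assumed, not derived from the data above)
recs = hybrid_recommend(
    collaborative=['Mouse', 'Headphones'],
    content_based=['Headphones', 'Phone'],
    association=['Mouse', 'Laptop Bag'],
    owned=['Laptop'],
    k=3,
)
print(recs)  # ['Mouse', 'Headphones', 'Phone']
```

A weighted score blend is a common alternative once each source produces comparable scores; the priority merge is simply easier to explain to a stakeholder.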


📊 Step 6 – Evaluate (a.k.a. “Does It Even Work?”)

You can track:

  • Precision@K – the share of the top K suggested items that the customer actually bought

  • Coverage – % of catalog items that appear in at least one recommendation list

  • Business KPI Impact – conversion rate, AOV (Average Order Value), or repeat purchase rate
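The first two metrics are easy to compute offline. Below is a minimal sketch in plain Python; the function names and the sample recommendations and purchases are assumptions made up for illustration.

```python
def precision_at_k(recommended, purchased, k):
    """Share of the top-k recommended items that were actually purchased."""
    if k <= 0:
        return 0.0
    hits = sum(item in purchased for item in recommended[:k])
    return hits / k

def catalog_coverage(all_recommendations, catalog):
    """Share of catalog items that appear in at least one recommendation list."""
    recommended = set()
    for recs in all_recommendations:
        recommended.update(recs)
    return len(recommended & set(catalog)) / len(catalog)

# Illustrative data (assumed): recommendations and later purchases per customer
catalog = ['Laptop', 'Mouse', 'Phone', 'Headphones', 'Milk', 'Bread']
recommendations = {1: ['Mouse', 'Headphones', 'Phone'], 3: ['Bread', 'Mouse']}
purchases = {1: {'Mouse'}, 3: {'Bread', 'Milk'}}

for customer, recs in recommendations.items():
    print(customer, precision_at_k(recs, purchases[customer], k=2))

print(round(catalog_coverage(recommendations.values(), catalog), 2))
```

The business KPIs are different: they need an A/B test or a before/after comparison on live traffic, so they cannot be computed from a static notebook alone.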


🏪 Business Scenario

Context: You’re an analyst for an online retailer. Your boss wants a dashboard that shows:

  • Top 10 frequent item pairs

  • Personalized recommendations per user

  • Average basket size increase post-recommendation

Deliver it with a smile (and maybe a PowerPoint). Suddenly you’re the office “AI wizard.” 🧙‍♂️
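Two of those dashboard numbers can be prototyped straight from the transaction table. The sketch below counts co-purchased item pairs and computes the average basket size on the toy data from Step 1; measuring the increase after recommendations would mean running the same calculation on a pre-launch and a post-launch period, which this toy dataset does not have.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Same toy transactions as in Step 1
df = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2, 3, 3, 4, 5],
    'Item': ['Laptop', 'Mouse', 'Phone', 'Headphones',
             'Milk', 'Bread', 'Milk', 'Laptop'],
})

# Sorted, de-duplicated item list per customer so each pair has one canonical order
baskets = df.groupby('CustomerID')['Item'].apply(lambda items: sorted(set(items)))

# Count every unordered pair of items that shares a basket
pair_counts = Counter()
for items in baskets:
    pair_counts.update(combinations(items, 2))

top_pairs = pair_counts.most_common(10)      # top 10 frequent item pairs
avg_basket_size = baskets.apply(len).mean()  # distinct items per customer

print(top_pairs)
print(avg_basket_size)
```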


🧍 Real-World Example

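| Platform | Technique Used                       | Example                      |
|----------|--------------------------------------|------------------------------|
| Amazon   | Hybrid (Collaborative + Association) | “Frequently Bought Together” |
| Netflix  | Collaborative Filtering              | “Because you watched…”       |
| Spotify  | Content-Based                        | “More songs like this”       |
| Walmart  | Association Rules                    | Diapers → Beer 🍼🍺          |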

🐍 Python Heads-Up

If you’re just getting started with pandas, data cleaning, or loops, warm up with 👉 Programming for Business

You’ll thank yourself when your code doesn’t scream KeyError: 'CustomerID'. 😭


🧠 TL;DR

  • Combine Collaborative, Content-Based, and Association Rules for a hybrid system.

  • Use mlxtend for mining frequent patterns.

  • Use sklearn.metrics.pairwise for similarities.

  • Think like a marketer, code like a data scientist.


🏁 Final Thought

A great recommender doesn’t just predict what customers want — it gently whispers:

“You deserve this… and also maybe two more.” 😏

Now go forth and make shopping addictive — ethically! 🛍️💡

# Your code here