Collaborative Filtering#

Welcome to Collaborative Filtering — the algorithmic version of “people like you also liked this.”

It’s how Netflix knows you’re probably about to rewatch The Office again, and how Amazon gently suggests that you might also want to buy batteries with your new gaming controller. 🎮🔋


🧠 The Big Idea#

Collaborative filtering (CF) relies on one simple (but powerful) assumption:

“If two users liked similar things in the past, they’ll probably like similar things in the future.”

No need to know what the items are — it’s pure behavioral vibes. Like gossip among data points.

There are two main kinds:

  1. User-based CF: “Find users like me.”

  2. Item-based CF: “Find items similar to what I like.”


🧍 User-Based Collaborative Filtering#

Here’s how it works:

  1. You (the user) have rated or interacted with some items.

  2. We find other users who behaved similarly.

  3. We recommend things those users liked that you haven’t tried yet.

Example:

You and Alice both loved “The Office” and “Parks and Rec.” Alice also watched “Brooklyn Nine-Nine.” Guess what shows up in your “Recommended for You”? 😏

Formula Time 🧮#

We calculate similarity between users using measures like:

  • Cosine similarity: the cosine of the angle between two rating vectors, cos(u, v) = (u · v) / (‖u‖ ‖v‖)

  • Pearson correlation: essentially cosine similarity after subtracting each user’s mean rating, which corrects for harsh vs. generous raters

```python
from sklearn.metrics.pairwise import cosine_similarity

# user_item_matrix: rows = users, columns = items
similarity = cosine_similarity(user_item_matrix)
```

Then, for a given user, we take a weighted average of other users’ ratings — weighted by how similar they are.

More similar users = louder votes. 🗳️
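Here is a minimal sketch of that weighted-average step (the toy matrix and the normalization choice are my own assumptions, not the only way to do it):

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy user-item matrix (rows = users, columns = items; 0 = unrated)
user_item_matrix = pd.DataFrame(
    [[5, 3, 0], [4, 0, 2], [0, 2, 5]],
    index=["A", "B", "C"],
    columns=["Item1", "Item2", "Item3"],
)

similarity = cosine_similarity(user_item_matrix)
np.fill_diagonal(similarity, 0)  # don't let a user "vote" for themselves

# Weighted average of other users' ratings: more similar = louder vote
weights = similarity / similarity.sum(axis=1, keepdims=True)
predicted = weights @ user_item_matrix.values
```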


📦 Item-Based Collaborative Filtering#

Now flip the logic:

“If you liked Product A, and Product A is similar to Product B, then we’ll recommend Product B.”

It’s faster and more stable when user bases are large: item-to-item similarities change far less often than individual users’ tastes do, so they can be precomputed and cached.


🧩 Business Example#

Scenario: You run an online bookstore 📚

  • User 1 buys “Data Science for Dummies” and “Python for Business.”

  • User 2 buys “Machine Learning for Business” and “Python for Business.”

The algorithm thinks:

“Aha! ‘Machine Learning for Business’ and ‘Python for Business’ seem best friends.”

So next time someone buys one, we recommend the other. 💸
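To make that concrete, here’s a hedged sketch using a made-up purchase matrix (1 = bought, 0 = not); item-item similarity is just cosine similarity with books as rows:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical purchase history for the two users above
purchases = pd.DataFrame(
    {
        "Data Science for Dummies":      [1, 0],
        "Python for Business":           [1, 1],
        "Machine Learning for Business": [0, 1],
    },
    index=["User 1", "User 2"],
)

# Transpose so each row is a book's buyer vector, then compare books
book_sim = pd.DataFrame(
    cosine_similarity(purchases.T),
    index=purchases.columns,
    columns=purchases.columns,
)
```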


🧪 Practice: Quick Lab#

Try this simple simulation 👇

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.DataFrame({
    "User": ["A", "A", "B", "B", "C", "C"],
    "Item": ["Item1", "Item2", "Item1", "Item3", "Item2", "Item3"],
    "Rating": [5, 3, 4, 2, 2, 5]
})

user_item = ratings.pivot(index="User", columns="Item", values="Rating").fillna(0)
similarity = cosine_similarity(user_item)
pd.DataFrame(similarity, index=user_item.index, columns=user_item.index)
```

💡 Try predicting what user “C” might like next.
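One possible solution sketch (it repeats the lab setup so it runs on its own, and applies the weighted-average rule from the user-based section):

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.DataFrame({
    "User": ["A", "A", "B", "B", "C", "C"],
    "Item": ["Item1", "Item2", "Item1", "Item3", "Item2", "Item3"],
    "Rating": [5, 3, 4, 2, 2, 5]
})
user_item = ratings.pivot(index="User", columns="Item", values="Rating").fillna(0)
similarity = pd.DataFrame(cosine_similarity(user_item),
                          index=user_item.index, columns=user_item.index)

# User C has never rated Item1 -- predict it from A's and B's ratings,
# weighted by how similar each of them is to C
sims = similarity.loc["C"].drop("C")        # similarity of A and B to C
item1 = user_item.loc[sims.index, "Item1"]  # their Item1 ratings
predicted_c_item1 = (sims * item1).sum() / sims.sum()
```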


⚖️ Strengths & Weaknesses#

| Pros | Cons |
| --- | --- |
| Works without item metadata | Cold start for new users/items |
| Learns from user behavior directly | Needs lots of data |
| Easy to interpret | Can overfit popular items |


💼 Business Impact#

  • E-commerce: “People who bought this also bought…”

  • Streaming: “Viewers like you enjoyed…”

  • Retail: Personalized offers, higher engagement

  • Finance: Recommended credit cards based on similar customers

Collaborative filtering = personalized marketing on autopilot. 🚀


🧙‍♂️ Pro Tip#

If you’re just getting started:

  • Use Matrix Factorization (e.g., TruncatedSVD) for scalability.

  • Use implicit feedback (views, clicks) when explicit ratings are rare.
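A minimal matrix-factorization sketch with TruncatedSVD (the toy matrix and the choice of 2 latent factors are assumptions for illustration):

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD

user_item = pd.DataFrame(
    [[5, 3, 0], [4, 0, 2], [0, 2, 5]],
    index=["A", "B", "C"],
    columns=["Item1", "Item2", "Item3"],
)

# Compress users into 2 latent "taste" factors, then reconstruct
# the full matrix to get scores for unrated items
svd = TruncatedSVD(n_components=2, random_state=42)
user_factors = svd.fit_transform(user_item)     # shape: (users, factors)
reconstructed = user_factors @ svd.components_  # approximate ratings
```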


🐍 Python Heads-Up#

You’ll be playing with the surprise, scikit-learn (sklearn), and implicit libraries for recommender systems.

If Python feels a bit sleepy, warm up first with 👉 Programming for Business


🧠 TL;DR#

Collaborative Filtering:

  • Doesn’t care what the product is.

  • Just cares who liked it.

  • And uses that gossip to make you shop more. 😎


Next up: Let’s meet the Content-Based Recommender — the algorithm that says, “I don’t need friends. I just compare item features.” 💁‍♂️
