Lab – Sentiment Classification with SVM
Welcome to the SVM Sentiment Showdown! 🎭
You’re about to use Support Vector Machines to read people’s moods — a valuable skill for both business analytics and avoiding awkward meetings. 😅
🧠 Goal
You’ll train an SVM model to classify customer reviews as positive 😊 or negative 😡.
By the end of this lab, you’ll:
Clean and vectorize text data 🧹
Train linear and kernel SVMs ⚙️
Evaluate accuracy, precision, and recall 📊
Visualize misclassifications 👀
Understand why SVMs rock for text classification
💼 Business Context
Imagine you’re the Data Scientist at a coffee chain ☕. The marketing team wants to monitor customer feedback from social media and reviews.
They ask:
“Can we automatically detect unhappy customers before they go viral on Twitter?” 😬
Your answer:
“Hold my latte. I’ll train an SVM.” ☕🤓
🧰 Setup
Let’s grab the tools first.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
📥 Step 1: Load the Data
If you don’t have a dataset, here’s a quick synthetic one for practice:
data = {
    "review": [
        "This coffee is amazing!",
        "Worst experience ever.",
        "I love the new latte flavor!",
        "The service was terrible.",
        "Pretty good, would buy again.",
        "I hated it.",
        "Best barista in town!",
        "Cold coffee, rude staff.",
    ],
    "sentiment": [
        "positive", "negative", "positive", "negative",
        "positive", "negative", "positive", "negative",
    ],
}
df = pd.DataFrame(data)
df.head()
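If you do have your own reviews, loading them might look roughly like the sketch below. It reads an in-memory CSV that stands in for a hypothetical `reviews.csv`; the column names `review` and `sentiment` are assumptions chosen to match the synthetic data above, and the three rows are made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for a hypothetical reviews.csv (rows are invented)
csv_text = """review,sentiment
"Great espresso, friendly staff",positive
"Waited 20 minutes for a cold latte",Negative
"Love the seasonal menu",positive
"""

df = pd.read_csv(io.StringIO(csv_text))

# Drop incomplete rows and normalize label spelling before training
df = df.dropna(subset=["review", "sentiment"])
df["sentiment"] = df["sentiment"].str.strip().str.lower()

# A quick class-balance check helps spot a lopsided dataset early
label_counts = df["sentiment"].value_counts()
print(label_counts)
```

With a real file you would pass its path to `pd.read_csv` instead of the `StringIO` object.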
🧹 Step 2: Split and Vectorize
Convert text into numbers using TF–IDF (because SVMs can’t read English, only math).
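# Hold out 30% of the reviews for testing (3 of the 8 synthetic rows)
X_train, X_test, y_train, y_test = train_test_split(
    df['review'], df['sentiment'], test_size=0.3, random_state=42
)
# Fit TF-IDF on the training text only, then reuse that vocabulary for the test text
vectorizer = TfidfVectorizer(stop_words='english')
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)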
⚙️ Step 3: Train the SVM Model
We’ll start simple — a Linear SVM.
model = LinearSVC(C=1.0)
model.fit(X_train_vec, y_train)
y_pred = model.predict(X_test_vec)
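One optional convenience, not part of the lab's own steps: wrapping the vectorizer and the classifier in a scikit-learn pipeline so raw text goes in and labels come out. The sketch below trains on all eight synthetic reviews purely for illustration, and the two new reviews are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = [
    "This coffee is amazing!", "Worst experience ever.",
    "I love the new latte flavor!", "The service was terrible.",
    "Pretty good, would buy again.", "I hated it.",
    "Best barista in town!", "Cold coffee, rude staff.",
]
labels = ["positive", "negative"] * 4

# The pipeline applies TF-IDF and then the linear SVM in one call
clf = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC(C=1.0))
clf.fit(reviews, labels)

# Invented examples that reuse words seen during training
new_reviews = ["The latte was amazing", "Terrible service and rude staff"]
predictions = clf.predict(new_reviews)
for text, label in zip(new_reviews, predictions):
    print(f"{label:>8}: {text}")
```

Keeping both steps in one object also avoids accidentally fitting the vectorizer on test data.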
📊 Step 4: Evaluate Performance
Let’s see how well it understands human emotions.
print(classification_report(y_test, y_pred))
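Sample output (illustrative; the exact numbers depend on your data and split, and with the 8-review synthetic set and test_size=0.3 the report will cover 3 test reviews):
              precision    recall  f1-score   support

    negative       1.00      1.00      1.00         1
    positive       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2

SVM: “I can sense your mood with 100% confidence.” 😎 (Just don’t show it internet sarcasm yet.) A perfect score on a test set this tiny mostly reflects luck, so treat it as a smoke test rather than evidence of real accuracy.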
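If the report feels like a black box, the individual metrics can be computed directly. The labels and predictions below are hand-written, hypothetical values (not output from the lab's model), chosen so the three numbers differ. Treating "negative" as the class of interest is an assumption that matches the business goal of catching unhappy customers.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical ground truth and predictions for eight reviews
y_true = ["negative", "negative", "negative", "positive",
          "positive", "positive", "positive", "negative"]
y_pred = ["negative", "negative", "positive", "positive",
          "positive", "positive", "negative", "positive"]

accuracy = accuracy_score(y_true, y_pred)
# Of the reviews flagged negative, how many really were negative?
precision = precision_score(y_true, y_pred, pos_label="negative")
# Of the truly negative reviews, how many did we catch?
recall = recall_score(y_true, y_pred, pos_label="negative")

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.625 precision=0.667 recall=0.500
```

For an early-warning system, low recall on the negative class means angry customers are slipping through, even when accuracy looks acceptable.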
📉 Step 5: Visualize Confusion Matrix
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=model.classes_, yticklabels=model.classes_)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("SVM Confusion Matrix – Feelings, Quantified 💔💖")
plt.show()
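The heatmap tells you how many mistakes there were, but not which reviews caused them. A rough sketch of listing the misclassified reviews follows; it repeats the earlier steps so it runs on its own, and the `results` and `errors` names are illustrative.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

df = pd.DataFrame({
    "review": [
        "This coffee is amazing!", "Worst experience ever.",
        "I love the new latte flavor!", "The service was terrible.",
        "Pretty good, would buy again.", "I hated it.",
        "Best barista in town!", "Cold coffee, rude staff.",
    ],
    "sentiment": ["positive", "negative"] * 4,
})

X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["sentiment"], test_size=0.3, random_state=42
)
vectorizer = TfidfVectorizer(stop_words="english")
model = LinearSVC(C=1.0).fit(vectorizer.fit_transform(X_train), y_train)
y_pred = model.predict(vectorizer.transform(X_test))

# Put text, true label, and prediction side by side
results = pd.DataFrame({
    "review": X_test.values,
    "actual": y_test.values,
    "predicted": y_pred,
})
# Keep only the rows the model got wrong
errors = results[results["actual"] != results["predicted"]]
print(errors if not errors.empty else "No misclassified reviews in this split.")
```

On real data, reading through these rows is often the fastest way to spot sarcasm, typos, or labeling mistakes.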
🔬 Step 6: Experiment with Kernels
Let’s try a non-linear kernel — maybe the data has emotional complexity. 💅
from sklearn.svm import SVC
model_rbf = SVC(kernel='rbf', gamma=0.7, C=1.0)
model_rbf.fit(X_train_vec, y_train)
y_pred_rbf = model_rbf.predict(X_test_vec)
print(classification_report(y_test, y_pred_rbf))
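You’ll likely get similar performance on small text datasets. On larger real-world data (like thousands of tweets), an RBF kernel can capture non-linear interactions between words, though a linear SVM is often just as accurate on high-dimensional TF–IDF features and trains much faster. 💬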
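A single train/test split makes kernel comparisons noisy, so one possible approach is cross-validation. The sketch below compares the two models from this lab with 4-fold stratified cross-validation; the fold count and the dictionary layout are assumptions, and on eight reviews the scores are only a demonstration of the mechanics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

reviews = [
    "This coffee is amazing!", "Worst experience ever.",
    "I love the new latte flavor!", "The service was terrible.",
    "Pretty good, would buy again.", "I hated it.",
    "Best barista in town!", "Cold coffee, rude staff.",
]
labels = ["positive", "negative"] * 4

# Each fold keeps one positive and one negative review for testing
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

candidates = {
    "linear": make_pipeline(TfidfVectorizer(stop_words="english"),
                            LinearSVC(C=1.0)),
    "rbf": make_pipeline(TfidfVectorizer(stop_words="english"),
                         SVC(kernel="rbf", gamma=0.7, C=1.0)),
}

# Mean accuracy across folds for each candidate model
mean_scores = {
    name: cross_val_score(pipe, reviews, labels, cv=cv).mean()
    for name, pipe in candidates.items()
}
for name, score in mean_scores.items():
    print(f"{name}: mean accuracy = {score:.2f}")
```

Because the vectorizer lives inside each pipeline, it is refit on the training folds only, which keeps the comparison fair.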
🧠 Optional Challenge: Use a Real Dataset
Try with:
Then:
Clean the text (re, nltk, or spaCy)
Vectorize (TfidfVectorizer)
Train multiple SVMs with different kernels
Compare performance
Make a dashboard that highlights “Top 10 Angry Words” 😤
💼 Business Insight
Sentiment analysis isn’t just for fun — it’s used in:
Customer experience tracking
Brand reputation monitoring
Product feedback prioritization
Stock market sentiment prediction 📈
With SVMs, you can scale these insights across thousands of reviews and alert management before the next PR crisis hits. 🚨
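The alerting idea could be sketched as follows, assuming a simple rule: flag any incoming review whose SVM decision score falls below a threshold. `ALERT_THRESHOLD` and the incoming reviews are hypothetical, and a real system would tune the threshold on held-out data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = [
    "This coffee is amazing!", "Worst experience ever.",
    "I love the new latte flavor!", "The service was terrible.",
    "Pretty good, would buy again.", "I hated it.",
    "Best barista in town!", "Cold coffee, rude staff.",
]
labels = ["positive", "negative"] * 4

clf = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC(C=1.0))
clf.fit(reviews, labels)

# Hypothetical cutoff: scores below it lean toward the "negative" class
ALERT_THRESHOLD = 0.0

# Invented incoming feedback
incoming = ["Rude staff and terrible service", "Best latte in town"]

# Signed distance from the decision boundary; positive means "positive"
scores = clf.decision_function(incoming)
alerts = [text for text, s in zip(incoming, scores) if s < ALERT_THRESHOLD]

for text, s in zip(incoming, scores):
    print(f"{s:+.2f}  {text}")
print("Escalate to management:", alerts)
```

Sorting by score instead of applying a hard cutoff would let a support team work through the angriest reviews first.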
💬 TL;DR
| Step | What You Did | Why It’s Cool |
|---|---|---|
| 1 | Loaded text data | Coffee reviews are data too ☕ |
| 2 | Vectorized using TF–IDF | Turned words into math |
| 3 | Trained Linear SVM | Found the “mood boundary” |
| 4 | Evaluated results | Quantified emotions |
| 5 | Visualized confusion matrix | Feelings meet charts |
| 6 | Tried kernels | Got fancy and flexible |
💡 SVMs may not have feelings, but they’re really good at detecting yours. 💔🤖❤️
🔗 Next Chapter: Ensemble Methods & Tree-Based Models
Because sometimes it takes a forest 🌲 to make the right decision.
# Your code here