Lab – Sentiment Classification with SVM

Welcome to the SVM Sentiment Showdown! 🎭

You’re about to use Support Vector Machines to read people’s moods — a valuable skill for both business analytics and avoiding awkward meetings. 😅


🧠 Goal

You’ll train an SVM model to classify customer reviews as positive 😊 or negative 😡.

By the end of this lab, you’ll:

  • Clean and vectorize text data 🧹

  • Train linear and kernel SVMs ⚙️

  • Evaluate accuracy, precision, and recall 📊

  • Visualize misclassifications 👀

  • Understand why SVMs rock for text classification


💼 Business Context

Imagine you’re the Data Scientist at a coffee chain ☕. The marketing team wants to monitor customer feedback from social media and reviews.

They ask:

“Can we automatically detect unhappy customers before they go viral on Twitter?” 😬

Your answer:

“Hold my latte. I’ll train an SVM.” ☕🤓


🧰 Setup

Let’s grab the tools first.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

📥 Step 1: Load the Data

If you don’t have a dataset, here’s a quick synthetic one for practice:

data = {
    "review": [
        "This coffee is amazing!",
        "Worst experience ever.",
        "I love the new latte flavor!",
        "The service was terrible.",
        "Pretty good, would buy again.",
        "I hated it.",
        "Best barista in town!",
        "Cold coffee, rude staff."
    ],
    "sentiment": ["positive", "negative", "positive", "negative", "positive", "negative", "positive", "negative"]
}

df = pd.DataFrame(data)
df.head()

🧹 Step 2: Split and Vectorize

Convert text into numbers using TF–IDF (because SVMs can’t read English, only math).

# stratify keeps the positive/negative balance similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    df['review'], df['sentiment'], test_size=0.3, random_state=42, stratify=df['sentiment']
)

vectorizer = TfidfVectorizer(stop_words='english')
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
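
To make the "words into math" step less of a black box, here is a minimal pure-Python sketch of the weighting that TfidfVectorizer applies with its default settings (smoothed IDF, then L2-normalized rows). The tokenizer is a simplification (lowercase words of two or more letters, no stop-word removal), and the helper names tokenize and tfidf_rows are made up for this illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase words of 2+ letters.
    return re.findall(r"[a-z]{2,}", text.lower())

def tfidf_rows(docs):
    """Return (vocabulary, rows) where each row is a dict of term -> weight."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df_counts = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df_counts.items()}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        raw = {t: counts[t] * idf[t] for t in counts}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        # L2-normalize so every non-empty review has length 1
        rows.append({t: w / norm for t, w in raw.items()} if norm else {})
    return sorted(idf), rows

docs = ["Cold coffee, rude staff.", "This coffee is amazing!", "I hated it."]
vocab, rows = tfidf_rows(docs)
for doc, row in zip(docs, rows):
    print(doc, {t: round(w, 3) for t, w in row.items()})
```

In the first review, "coffee" appears in two of the three documents, so it ends up with a lower weight than "cold", "rude", or "staff", which each appear in only one.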

⚙️ Step 3: Train the SVM Model

We’ll start simple — a Linear SVM.

model = LinearSVC(C=1.0)
model.fit(X_train_vec, y_train)

y_pred = model.predict(X_test_vec)

📊 Step 4: Evaluate Performance

Let’s see how well it understands human emotions.

print(classification_report(y_test, y_pred))

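Sample output (illustrative of the report format only; with 8 reviews and test_size=0.3 your test set has 3 rows, so your numbers will differ):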

              precision    recall  f1-score   support

    negative       1.00      1.00      1.00         1
    positive       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2

SVM: “I can sense your mood with 100% confidence.” 😎 (Just don’t show it internet sarcasm yet. A perfect score on a handful of reviews says more about the tiny test set than about the model.)
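
The precision, recall, and f1-score columns follow directly from counts of correct and incorrect predictions. This minimal pure-Python sketch shows the arithmetic for one class. The labels are made up for illustration (they are not output from the lab's model), and the function name prf is hypothetical.

```python
def prf(y_true, y_pred, target):
    """Precision, recall, and F1 for the class `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)  # correctly flagged
    fp = sum(t != target and p == target for t, p in pairs)  # flagged by mistake
    fn = sum(t == target and p != target for t, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical labels for illustration only.
y_true_demo = ["negative", "negative", "negative", "positive", "positive"]
y_pred_demo = ["negative", "negative", "positive", "positive", "negative"]

precision, recall, f1 = prf(y_true_demo, y_pred_demo, "negative")
print(f"negative: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For the coffee chain, recall on the negative class is the number to watch: it is the share of unhappy customers the model actually catches.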


📉 Step 5: Visualize Confusion Matrix

cm = confusion_matrix(y_test, y_pred, labels=model.classes_)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=model.classes_, yticklabels=model.classes_)
plt.xlabel("Predicted")
plt.ylabel("Actual")
# Emoji left out of the title: matplotlib's default font has no glyphs for them
plt.title("SVM Confusion Matrix – Feelings, Quantified")
plt.show()
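
The heatmap shows how many reviews were misclassified, but not which ones. A small sketch for listing them is below. The helper name misclassified and the demo lists are hypothetical stand-ins; in the lab you would pass X_test, y_test, and y_pred instead.

```python
def misclassified(texts, y_true, y_pred):
    """Return (text, actual, predicted) for every wrong prediction."""
    return [(t, a, p) for t, a, p in zip(texts, y_true, y_pred) if a != p]

# Hypothetical stand-ins for X_test, y_test, and y_pred.
demo_texts = ["Cold coffee, rude staff.", "Pretty good, would buy again.", "I hated it."]
demo_true = ["negative", "positive", "negative"]
demo_pred = ["negative", "negative", "negative"]

for text, actual, predicted in misclassified(demo_texts, demo_true, demo_pred):
    print(f"{actual} -> {predicted}: {text}")
```

Reading the actual text of the errors is often the fastest way to spot what the vectorizer is missing, such as negation or sarcasm.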

🔬 Step 6: Experiment with Kernels

Let’s try a non-linear kernel — maybe the data has emotional complexity. 💅

from sklearn.svm import SVC

model_rbf = SVC(kernel='rbf', gamma=0.7, C=1.0)
model_rbf.fit(X_train_vec, y_train)
y_pred_rbf = model_rbf.predict(X_test_vec)

print(classification_report(y_test, y_pred_rbf))

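On a dataset this small, you’ll likely get similar performance. On real data (like thousands of tweets), compare both: linear SVMs are often hard to beat on sparse, high-dimensional TF–IDF features and train much faster, while kernel SVMs can capture non-linear patterns at a higher training cost. 💬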
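
To see what gamma controls, here is a minimal pure-Python sketch of the RBF kernel, K(x, z) = exp(-gamma * ||x - z||^2), which is the similarity an RBF SVC computes between pairs of reviews. The function name rbf_kernel and the two toy vectors are illustrative.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF similarity exp(-gamma * ||x - z||^2) between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Two unit-length vectors with no shared nonzero entries, like two
# L2-normalized TF-IDF rows for reviews that share no words.
a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]

for gamma in (0.1, 0.7, 5.0):
    same = rbf_kernel(a, a, gamma)
    different = rbf_kernel(a, b, gamma)
    print(f"gamma={gamma}: same={same:.4f} different={different:.4f}")
```

A vector is always fully similar to itself (1.0), while larger gamma values make non-overlapping reviews look less and less alike, so the model leans more heavily on near-identical training examples.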


🧠 Optional Challenge: Use a Real Dataset

Try with a real review dataset of your choice, then:

  1. Clean the text (re, nltk, or spaCy)

  2. Vectorize (TfidfVectorizer)

  3. Train multiple SVMs with different kernels

  4. Compare performance

  5. Make a dashboard that highlights “Top 10 Angry Words” 😤
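
For the cleaning step, a minimal sketch using only the standard-library re module is shown below. The function name clean_text and its rules (dropping URLs and @mentions, keeping only letters and apostrophes) are assumptions suited to tweet-like text; adjust them to your dataset.

```python
import re

def clean_text(text):
    """Lowercase, drop URLs and @mentions, keep letters and apostrophes."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove @mentions
    text = re.sub(r"[^a-z'\s]", " ", text)      # drop digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

print(clean_text("@CoffeeCo the latte was COLD!!! 😡 see https://example.com/r/1"))
```

One way to plug this in is TfidfVectorizer(preprocessor=clean_text). Note that supplying your own preprocessor replaces the vectorizer's built-in lowercasing, which is why the function lowercases the text itself.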


💼 Business Insight

Sentiment analysis isn’t just for fun — it’s used in:

  • Customer experience tracking

  • Brand reputation monitoring

  • Product feedback prioritization

  • Stock market sentiment prediction 📈

With SVMs, you can scale these insights across thousands of reviews and alert management before the next PR crisis hits. 🚨


💬 TL;DR

| Step | What You Did | Why It’s Cool |
|------|--------------|---------------|
| 1 | Loaded text data | Coffee reviews are data too ☕ |
| 2 | Vectorized using TF–IDF | Turned words into math |
| 3 | Trained Linear SVM | Found the “mood boundary” |
| 4 | Evaluated results | Quantified emotions |
| 5 | Visualized confusion matrix | Feelings meet charts |
| 6 | Tried kernels | Got fancy and flexible |


💡 SVMs may not have feelings, but they’re really good at detecting yours. 💔🤖❤️


🔗 Next Chapter: Ensemble Methods & Tree-Based Models. Because sometimes it takes a forest 🌲 to make the right decision.