Welcome to the SVM Sentiment Showdown! 🎭

You’re about to use Support Vector Machines to read people’s moods — a valuable skill for both business analytics and avoiding awkward meetings. 😅


🧠 Goal

You’ll train an SVM model to classify customer reviews as positive 😊 or negative 😡.

By the end of this lab, you’ll:

  • Clean and vectorize text data 🧹

  • Train linear and kernel SVMs ⚙️

  • Evaluate accuracy, precision, recall 📊

  • Visualize misclassifications 👀

  • Understand why SVMs rock for text classification


💼 Business Context

Imagine you’re the Data Scientist at a coffee chain ☕. The marketing team wants to monitor customer feedback from social media and reviews.

They ask:

“Can we automatically detect unhappy customers before they go viral on Twitter?” 😬

Your answer:

“Hold my latte. I’ll train an SVM.” ☕🤓


🧰 Setup

Let’s grab the tools first.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

📥 Step 1: Load the Data

If you don’t have a dataset, here’s a quick synthetic one for practice:

data = {
    "review": [
        "This coffee is amazing!",
        "Worst experience ever.",
        "I love the new latte flavor!",
        "The service was terrible.",
        "Pretty good, would buy again.",
        "I hated it.",
        "Best barista in town!",
        "Cold coffee, rude staff."
    ],
    "sentiment": ["positive", "negative", "positive", "negative", "positive", "negative", "positive", "negative"]
}

df = pd.DataFrame(data)
df.head()

🧹 Step 2: Split and Vectorize

Convert text into numbers using TF–IDF (because SVMs can’t read English, only math).

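# test_size=0.25 holds out 2 of the 8 reviews; stratify keeps one review per class in the test set
X_train, X_test, y_train, y_test = train_test_split(
    df['review'], df['sentiment'],
    test_size=0.25, random_state=42, stratify=df['sentiment']
)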

vectorizer = TfidfVectorizer(stop_words='english')
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

⚙️ Step 3: Train the SVM Model

We’ll start simple — a Linear SVM.

model = LinearSVC(C=1.0)
model.fit(X_train_vec, y_train)

y_pred = model.predict(X_test_vec)
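A linear SVM predicts by checking which side of a separating hyperplane a review falls on. The sketch below, which refits a model on all eight sample reviews purely for illustration (the names `vec`, `clf`, and `scores` are assumptions, not lab variables), prints the signed score behind each prediction.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

reviews = [
    "This coffee is amazing!",
    "Worst experience ever.",
    "I love the new latte flavor!",
    "The service was terrible.",
    "Pretty good, would buy again.",
    "I hated it.",
    "Best barista in town!",
    "Cold coffee, rude staff.",
]
labels = ["positive", "negative"] * 4

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(reviews)
clf = LinearSVC(C=1.0).fit(X, labels)

# classes_ is sorted alphabetically: ['negative', 'positive'].
# In the binary case a score above 0 means classes_[1], otherwise classes_[0].
scores = clf.decision_function(X)
for text, score in zip(reviews, scores):
    side = clf.classes_[1] if score > 0 else clf.classes_[0]
    print(f"{score:+.2f}  {side:<8}  {text}")
```

The magnitude of the score is a rough measure of how far the review sits from the boundary. It is not a probability.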

📊 Step 4: Evaluate Performance

Let’s see how well it understands human emotions.

print(classification_report(y_test, y_pred))

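Sample output (illustrative; with only two test reviews, a single mistake would swing every score, so don’t read much into a perfect 1.00):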

              precision    recall  f1-score   support

    negative       1.00      1.00      1.00         1
    positive       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2

SVM: “I can sense your mood with 100% confidence.” 😎 (Just don’t show it internet sarcasm yet.)
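To make precision and recall less mysterious, here is a small worked example that computes them by hand from a confusion matrix and checks the result against scikit-learn. The ten labels are hypothetical and are not output from the model above; "negative" is treated as the class of interest because unhappy customers are what the marketing team wants to catch.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical ground truth and predictions for ten reviews
y_true = ["negative"] * 5 + ["positive"] * 5
y_hat = ["negative", "negative", "negative", "positive", "positive",
         "positive", "positive", "positive", "positive", "negative"]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat, labels=["negative", "positive"])
print(cm)  # [[3 2]
           #  [1 4]]

caught = cm[0, 0]               # unhappy reviews correctly flagged
precision = caught / cm[:, 0].sum()  # of everything flagged negative, how much was right: 3/4
recall = caught / cm[0, :].sum()     # of all truly negative reviews, how many were caught: 3/5
accuracy = cm.trace() / cm.sum()     # 7/10

print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")

# Cross-check against scikit-learn
assert abs(precision - precision_score(y_true, y_hat, pos_label="negative")) < 1e-9
assert abs(recall - recall_score(y_true, y_hat, pos_label="negative")) < 1e-9
```

Low recall here would mean angry customers slipping through unnoticed, while low precision would mean false alarms for the marketing team.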


📉 Step 5: Visualize Confusion Matrix

cm = confusion_matrix(y_test, y_pred, labels=model.classes_)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=model.classes_, yticklabels=model.classes_)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("SVM Confusion Matrix: Feelings, Quantified")  # emoji omitted: default Matplotlib fonts usually lack these glyphs
plt.show()
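The heatmap shows how many reviews were misclassified, but not which ones. One possible way to list them is sketched below; it rebuilds the sample data and model so it runs on its own, and the split settings (`test_size=0.25` with `stratify`) and the names `results` and `mistakes` are assumptions made for this illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

df = pd.DataFrame({
    "review": [
        "This coffee is amazing!", "Worst experience ever.",
        "I love the new latte flavor!", "The service was terrible.",
        "Pretty good, would buy again.", "I hated it.",
        "Best barista in town!", "Cold coffee, rude staff.",
    ],
    "sentiment": ["positive", "negative"] * 4,
})

X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["sentiment"],
    test_size=0.25, random_state=42, stratify=df["sentiment"],
)

vec = TfidfVectorizer(stop_words="english")
clf = LinearSVC(C=1.0).fit(vec.fit_transform(X_train), y_train)

# Put each test review next to its true label, prediction, and margin score
X_test_vec = vec.transform(X_test)
results = pd.DataFrame({
    "review": X_test.values,
    "actual": y_test.values,
    "predicted": clf.predict(X_test_vec),
    "score": clf.decision_function(X_test_vec),
})

mistakes = results[results["actual"] != results["predicted"]]
print(mistakes if not mistakes.empty else "No misclassified reviews in this split.")
```

On a dataset this small, a test review often shares no words with the training set, in which case its score comes from the model's intercept alone. Reading the mistakes is usually the fastest way to spot that kind of problem.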

🔬 Step 6: Experiment with Kernels

Let’s try a non-linear kernel — maybe the data has emotional complexity. 💅

from sklearn.svm import SVC

model_rbf = SVC(kernel='rbf', gamma=0.7, C=1.0)
model_rbf.fit(X_train_vec, y_train)
y_pred_rbf = model_rbf.predict(X_test_vec)

print(classification_report(y_test, y_pred_rbf))

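On a dataset this small you’ll likely see similar performance. On real data (like thousands of tweets), a kernel SVM can capture non-linear patterns, but TF–IDF features are already high-dimensional and sparse, so a linear SVM is often just as accurate and much faster to train. Compare both before committing to the fancier one. 💬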
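A single train/test split on eight reviews cannot tell you whether a kernel helps. A fairer comparison is cross-validation, sketched here with a Pipeline so the vectorizer is refit inside each fold. The candidate settings (`gamma="scale"`, `degree=2`, four folds) are illustrative assumptions, not tuned values.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

reviews = [
    "This coffee is amazing!", "Worst experience ever.",
    "I love the new latte flavor!", "The service was terrible.",
    "Pretty good, would buy again.", "I hated it.",
    "Best barista in town!", "Cold coffee, rude staff.",
]
labels = ["positive", "negative"] * 4

candidates = {
    "linear": LinearSVC(C=1.0),
    "rbf": SVC(kernel="rbf", gamma="scale", C=1.0),
    "poly": SVC(kernel="poly", degree=2, C=1.0),
}

# 4 stratified folds: each fold tests on one positive and one negative review
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

mean_scores = {}
for name, svm in candidates.items():
    pipe = make_pipeline(TfidfVectorizer(stop_words="english"), svm)
    fold_scores = cross_val_score(pipe, reviews, labels, cv=cv, scoring="accuracy")
    mean_scores[name] = fold_scores.mean()
    print(f"{name:>6}: mean accuracy {mean_scores[name]:.2f}")
```

With so few reviews the numbers are noisy, so treat this as a template to rerun on a real dataset rather than a verdict on any kernel.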


🧠 Optional Challenge: Use a Real Dataset

Pick a real dataset of customer reviews or tweets, then:

  1. Clean the text (re, nltk, or spaCy)

  2. Vectorize (TfidfVectorizer)

  3. Train multiple SVMs with different kernels

  4. Compare performance

  5. Make a dashboard that highlights “Top 10 Angry Words” 😤


💼 Business Insight

Sentiment analysis isn’t just for fun — it’s used in:

  • Customer experience tracking

  • Brand reputation monitoring

  • Product feedback prioritization

  • Stock market sentiment prediction 📈

With SVMs, you can scale these insights across thousands of reviews and alert management before the next PR crisis hits. 🚨
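A rough sketch of what such an alert could look like: bundle the vectorizer and classifier into one Pipeline, score a batch of incoming reviews, and surface the ones that land on the negative side. The `flag_unhappy` helper, its `threshold` parameter, and the `incoming` reviews are all hypothetical and invented for this illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_reviews = [
    "This coffee is amazing!", "Worst experience ever.",
    "I love the new latte flavor!", "The service was terrible.",
    "Pretty good, would buy again.", "I hated it.",
    "Best barista in town!", "Cold coffee, rude staff.",
]
train_labels = ["positive", "negative"] * 4

# One object that takes raw text in and gives predictions out
pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC(C=1.0))
pipeline.fit(train_reviews, train_labels)


def flag_unhappy(reviews, threshold=0.0):
    """Return (review, score) pairs scoring below threshold, most negative first."""
    scores = pipeline.decision_function(reviews)
    flagged = [(r, float(s)) for r, s in zip(reviews, scores) if s < threshold]
    return sorted(flagged, key=lambda pair: pair[1])


# Hypothetical batch of new reviews
incoming = [
    "The latte was cold and the staff was rude.",
    "Amazing coffee, best barista!",
    "Terrible service, worst latte ever.",
]

for review, score in flag_unhappy(incoming):
    print(f"ALERT ({score:+.2f}): {review}")
```

Lowering `threshold` below zero would flag only the reviews the model is most sure about, trading recall for fewer false alarms.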


💬 TL;DR

| Step | What You Did | Why It’s Cool |
|------|--------------|---------------|
| 1 | Loaded text data | Coffee reviews are data too ☕ |
| 2 | Vectorized using TF–IDF | Turned words into math |
| 3 | Trained Linear SVM | Found the “mood boundary” |
| 4 | Evaluated results | Quantified emotions |
| 5 | Visualized confusion matrix | Feelings meet charts |
| 6 | Tried kernels | Got fancy and flexible |

💡 SVMs may not have feelings, but they’re really good at detecting yours. 💔🤖❤️


🔗 Next Chapter: Ensemble Methods & Tree-Based Models

Because sometimes it takes a forest 🌲 to make the right decision.
