Data Visualisation#
Making Numbers Speak Louder Than Words
“If data were a novel, visualization would be the movie adaptation everyone actually watches.”
Welcome, Data Picasso! 🧑‍🎨 You’ve cleaned, wrangled, and explored your data. Now it’s time to make it shine: not with glitter, but with charts that make business people go “Ooooh!”
🧠 Why Visualization Matters#
In business, a great chart can:
Save you 10 slides of explanation 📊
Convince your boss faster than a 50-page report 💼
Expose hidden insights (or at least hide your messy code 😉)
💬 “Never underestimate a well-placed bar chart — it can justify your annual budget.”
🧰 Prerequisite#
If you’re new to Python plotting libraries, check out my other book: 👉 📘 Programming for Business
🖼️ The Visualization Toolkit#
Let’s meet your artistic weapons of choice:
| Library | Best For | Why You’ll Love It |
|---|---|---|
| Matplotlib | Custom static plots | Like Excel, but on steroids |
| Seaborn | Quick, pretty statistical charts | Beautiful defaults, less crying |
| Plotly | Interactive dashboards | Drag, zoom, impress investors |
| Altair | Declarative visualization | Concise, elegant, and logic-based |
🎬 Step 1. Setting the Stage#
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv("sales_data_clean.csv")
sns.set(style="whitegrid", palette="Set2")
💬 “Always set a nice style — even your plots deserve good fashion.”
📈 Step 2. Bar Charts — The Office Favorite#
When in doubt, use a bar chart. They’re simple, effective, and universally loved by PowerPoint warriors.
sns.barplot(x='region', y='sales_amount', data=df, estimator='mean')
plt.title("Average Sales by Region")
plt.xlabel("Region")
plt.ylabel("Sales ($)")
plt.show()
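To sanity-check the bar heights, the same averages can be computed directly with pandas. The following is a minimal sketch, assuming a tiny made-up DataFrame that reuses the `region` and `sales_amount` column names (the values are illustrative and do not come from `sales_data_clean.csv`):

```python
import pandas as pd

# Hypothetical stand-in for sales_data_clean.csv (values are made up)
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "East"],
    "sales_amount": [100.0, 300.0, 150.0, 250.0, 400.0],
})

# Mean sales per region: the number each bar in the barplot represents
region_means = (
    sales.groupby("region")["sales_amount"]
    .mean()
    .sort_values(ascending=False)
)
print(region_means)
```

If a bar looks surprising, comparing it against this table is usually the quickest way to tell a data problem from a plotting problem.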
🗣️ “This chart says more about your market than 5 strategy meetings.”
📊 Step 3. Line Charts — Time’s Storytellers#
Use line plots to show trends, growth, and seasonal effects.
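# Convert each date to a month-start timestamp (Period values do not always plot cleanly)
df['month'] = pd.to_datetime(df['date']).dt.to_period('M').dt.to_timestamp()
monthly_sales = df.groupby('month')['sales_amount'].sum().reset_index()
sns.lineplot(x='month', y='sales_amount', data=monthly_sales, marker='o')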
plt.title("Monthly Sales Trend")
plt.show()
💡 “A line going up? You’re a genius. A line going down? You’re an analyst who ‘found optimization opportunities.’”
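The monthly aggregation behind a trend line can be inspected on its own before plotting. This is a small sketch on made-up daily records (the `daily` frame and its values are assumptions for illustration); it groups by calendar month and then converts each month to a plain timestamp so that any plotting library can place it on a time axis:

```python
import pandas as pd

# Hypothetical daily records (made-up values) with the same column names as above
daily = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-28", "2024-03-15"],
    "sales_amount": [100, 200, 150, 50, 300],
})
daily["date"] = pd.to_datetime(daily["date"])

# Group by calendar month and total the sales in each month
monthly = (
    daily.groupby(daily["date"].dt.to_period("M"))["sales_amount"]
    .sum()
    .rename_axis("month")
    .reset_index()
)

# Replace each monthly period with the timestamp of its first day
monthly["month"] = monthly["month"].dt.to_timestamp()
print(monthly)
```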
🍩 Step 4. Pie Charts — Only for Special Occasions#
Use sparingly. One pie chart per quarter is the professional limit.
region_sales = df.groupby('region')['sales_amount'].sum()
region_sales.plot(kind='pie', autopct='%1.1f%%', startangle=90)
plt.title("Sales Share by Region")
plt.ylabel("")
plt.show()
💬 “Pie charts are like desserts — lovely in small doses, disastrous if overused.”
🔁 Step 5. Scatter Plots — The Relationship Therapist#
Show how two variables interact — like marketing spend and sales.
sns.scatterplot(x='marketing_spend', y='sales_amount', data=df)
plt.title("Marketing Spend vs Sales")
plt.show()
💬 “If there’s an upward trend — congrats, your marketing is actually doing something!”
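An "upward trend" can also be put into a number. As a rough sketch, assuming made-up figures under the same column names, the Pearson correlation summarises how closely the points follow a straight line:

```python
import pandas as pd

# Hypothetical figures (made up) using the same column names as the scatter plot
spend_sales = pd.DataFrame({
    "marketing_spend": [10, 20, 30, 40, 50],
    "sales_amount": [120, 190, 310, 390, 520],
})

# Pearson correlation: +1 is a perfect upward line, 0 means no linear relationship
r = spend_sales["marketing_spend"].corr(spend_sales["sales_amount"])
print(f"Pearson r = {r:.2f}")
```

A high value supports the visual impression, but it describes association only; it does not show that the spend caused the sales.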
🧩 Step 6. Heatmaps — The Data Spa#
Relax your brain and let colors show correlations.
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()
💡 “Red means love, blue means distance — just like business meetings.”
🧠 Step 7. Pairplots — The Multiverse of Data#
A full relationship matrix — because who doesn’t love chaos?
sns.pairplot(df[['sales_amount', 'marketing_spend', 'profit']])
plt.show()
💬 “This is the Tinder of plots — every variable gets to date every other one.”
📦 Step 8. Business Dashboard Preview#
Turn your visuals into interactive gold:
import plotly.express as px
fig = px.bar(df, x='region', y='sales_amount', color='product_category', title="Sales by Region and Category")
fig.show()
💡 “Plotly charts are like Instagram filters — they make your data instantly more likable.”
🧪 Practice Lab — “Visualize or Vanish!”#
Use the dataset company_sales.csv and visualize:
Sales by product category
Sales trend over time
Profit vs marketing spend
Region-wise revenue contribution
A correlation heatmap between numerical features
🎯 Bonus: Create one chart that could appear in your company’s annual report (without getting you fired).
🧭 Recap#
| Visualization | Use Case | Library |
|---|---|---|
| Bar Chart | Compare categories | Seaborn / Matplotlib |
| Line Chart | Show trends over time | Seaborn |
| Scatter Plot | Relationships | Seaborn |
| Heatmap | Correlation matrix | Seaborn |
| Pie Chart | Share distribution | Matplotlib |
| Interactive Dashboard | Exploration | Plotly |
🎁 Quick Pro Tips#
Don’t overload charts — your data deserves breathing space 🧘
Color with purpose — not all rainbows are business-friendly 🌈
Always add titles and labels 📋
Save your plots as .png for reports 📸
plt.savefig("monthly_sales.png", dpi=300, bbox_inches='tight')
💬 “Because nothing says ‘professional’ like a crisp 300 DPI chart.”
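Here is a self-contained sketch of saving a chart, using Matplotlib's figure/axes interface and made-up monthly totals (the data, the temporary output folder, and the `Agg` backend line are assumptions for illustration). In a plain script it is usually safest to save before showing or closing the figure, since saving afterwards can produce an empty image:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display
import matplotlib.pyplot as plt

# Made-up monthly totals, purely for illustration
months = ["Jan", "Feb", "Mar", "Apr"]
totals = [120, 135, 128, 150]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, totals, marker="o")
ax.set_title("Monthly Sales Trend")
ax.set_xlabel("Month")
ax.set_ylabel("Sales ($)")

# Save first, then close; a temp directory keeps the example self-contained
out_path = os.path.join(tempfile.mkdtemp(), "monthly_sales.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)
print("Saved to", out_path)
```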
🔜 Next Stop#
👉 Business Dashboards
You’ll learn how to stitch your visual masterpieces together, making dashboards so sleek that executives might finally stop asking for Excel sheets.
Histograms, Box Plots, Scatter Plots, Pair Plots, Heatmaps (with customization)#
🎯 Learning Objectives#
By the end of this section, students should be able to:
Create basic visualizations using matplotlib and seaborn.
Understand the purpose and interpretation of each plot type.
Customize visual elements — titles, labels, legends, colors, and layout.
Use visualizations to gain insights into data distribution and relationships.
🧩 1. Setup and Dummy Dataset#
We’ll use a small artificial dataset that mimics a business scenario (sales, customers, and marketing).
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# For consistent plot style
sns.set(style="whitegrid")
# Dummy dataset
np.random.seed(42)
df = pd.DataFrame({
"Sales": np.random.normal(50000, 12000, 200),
"Profit": np.random.normal(8000, 3000, 200),
"Marketing_Spend": np.random.normal(10000, 2500, 200),
"Customer_Rating": np.random.uniform(1, 5, 200),
"Region": np.random.choice(["North", "South", "East", "West"], 200)
})
df.head()
📊 2. Histogram — Understanding Data Distribution#
Histograms help visualize the distribution (spread and skewness) of numerical variables.
plt.figure(figsize=(8, 5))
plt.hist(df["Sales"], bins=20, color='skyblue', edgecolor='black')
plt.title("Distribution of Sales", fontsize=14, fontweight='bold')
plt.xlabel("Sales Amount")
plt.ylabel("Frequency")
plt.grid(axis='y', alpha=0.75)
plt.show()
✅ Interpretation:
Check if sales are normally distributed or skewed.
Identify outliers or irregular peaks.
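A histogram's shape can be backed up with a skewness statistic. The sketch below uses two synthetic samples (both are assumptions generated only for illustration) to show what a roughly symmetric and a right-skewed variable look like numerically:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Two synthetic samples: one roughly symmetric, one with a long right tail
symmetric = pd.Series(rng.normal(50000, 12000, 2000))
right_skewed = pd.Series(rng.lognormal(mean=10, sigma=0.8, size=2000))

# Skewness near 0 suggests symmetry; clearly positive values mean a long right tail
print("symmetric:", round(symmetric.skew(), 2))
print("right-skewed:", round(right_skewed.skew(), 2))
```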
📦 3. Box Plot — Spotting Outliers#
Box plots show median, quartiles, and outliers clearly.
plt.figure(figsize=(8, 5))
sns.boxplot(x="Region", y="Profit", data=df, palette="Set2")
plt.title("Profit Distribution by Region", fontsize=14, fontweight='bold')
plt.xlabel("Region")
plt.ylabel("Profit")
plt.show()
✅ Interpretation:
Regions with higher medians = better profit performance.
Dots outside whiskers = outliers.
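The dots in a box plot follow a simple rule that can be reproduced by hand. As a sketch on made-up profit values (one of them deliberately extreme), the usual 1.5 × IQR fences, which are the default in both Matplotlib and seaborn, look like this:

```python
import pandas as pd

# Made-up profit values; 30000 is deliberately extreme
profit = pd.Series([7000, 7500, 8000, 8200, 8500, 9000, 9500, 30000], name="Profit")

q1, q3 = profit.quantile(0.25), profit.quantile(0.75)
iqr = q3 - q1

# Whiskers reach at most 1.5 * IQR beyond the quartiles;
# values outside these fences are drawn as individual dots
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = profit[(profit < lower_fence) | (profit > upper_fence)]
print(outliers.tolist())
```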
⚪ 4. Scatter Plot — Relationship Between Two Variables#
Useful to analyze correlation and patterns between numeric variables.
plt.figure(figsize=(8, 6))
sns.scatterplot(x="Marketing_Spend", y="Sales", hue="Region", data=df, s=70, alpha=0.8)
plt.title("Sales vs Marketing Spend by Region", fontsize=14, fontweight='bold')
plt.xlabel("Marketing Spend")
plt.ylabel("Sales")
plt.legend(title="Region", loc="upper left")
plt.show()
✅ Interpretation:
Observe whether higher marketing spend is associated with higher sales (a scatter plot shows correlation, not causation).
Differentiate regional trends.
🧩 5. Pair Plot — Multiple Variable Relationships#
Pair plots show scatter plots between every numeric pair and histograms on the diagonal.
sns.pairplot(df, vars=["Sales", "Profit", "Marketing_Spend"], hue="Region", palette="husl")
plt.suptitle("Pairwise Relationships among Key Variables", y=1.02, fontsize=14, fontweight='bold')
plt.show()
✅ Interpretation:
Detect linear/nonlinear relationships.
Check for correlations or clusters by region.
🔥 6. Heatmap — Correlation Between Variables#
Heatmaps visualize how strongly numerical variables are correlated.
plt.figure(figsize=(7, 5))
corr = df[["Sales", "Profit", "Marketing_Spend", "Customer_Rating"]].corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap", fontsize=14, fontweight='bold')
plt.show()
✅ Interpretation:
Values near +1 → strong positive correlation.
Values near -1 → strong negative correlation.
Helps detect multicollinearity.
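With many variables, reading strongly correlated pairs off a heatmap becomes tedious, so they can be listed programmatically instead. This is a minimal sketch on synthetic data in which `Profit` is deliberately built from `Sales`; the `threshold` of 0.8 is a hypothetical cut-off, not a rule:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300

# Synthetic data: Profit depends on Sales, Customer_Rating is independent noise
sales = rng.normal(50000, 12000, n)
demo = pd.DataFrame({
    "Sales": sales,
    "Profit": 0.15 * sales + rng.normal(0, 500, n),
    "Customer_Rating": rng.uniform(1, 5, n),
})

corr = demo.corr()

# Keep only the upper triangle so each pair appears once, then flatten
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()

threshold = 0.8  # hypothetical cut-off; choose one that suits your analysis
strong = pairs[pairs.abs() > threshold]
print(strong)
```

Pairs that show up here are candidates for multicollinearity checks before fitting a regression model.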
🎨 7. Customizing Plots (Titles, Labels, Legends, Styles)#
You can globally customize appearance using plt.rcParams or sns.set_theme().
Example:
sns.set_theme(style="whitegrid", context="talk", palette="muted")
plt.figure(figsize=(8, 5))
sns.histplot(df["Profit"], bins=15, kde=True, color="teal")
plt.title("Customized Profit Distribution", fontsize=16, fontweight='bold')
plt.xlabel("Profit (in $)")
plt.ylabel("Number of Transactions")
plt.show()
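The `plt.rcParams` route mentioned at the start of this section can be sketched as follows. The specific values are arbitrary choices for illustration, and the `Agg` backend line is only there so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Global defaults: every figure created afterwards picks these up
plt.rcParams.update({
    "figure.figsize": (8, 5),
    "axes.titlesize": 16,
    "axes.titleweight": "bold",
    "axes.labelsize": 12,
})

# Temporary override: settings inside the block revert when it exits
with plt.rc_context({"axes.titlesize": 20}):
    fig, ax = plt.subplots()
    ax.set_title("Larger title, only inside this block")
    plt.close(fig)

print(plt.rcParams["axes.titlesize"])
```

Global settings suit a whole report with one house style, while `rc_context` is handy when a single chart needs different sizing.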
Customization Options:#
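| Element | Function |
|---|---|
| `plt.title()` | Adds plot title |
| `plt.xlabel()`, `plt.ylabel()` | Adds axis labels |
| `plt.legend()` | Shows/hides legend |
| `plt.grid(True)` | Enables grid lines |
| `plt.style.use()` | Changes overall style (e.g., “ggplot”, “seaborn-v0_8”) |
| `sns.set_palette()` | Sets color scheme in seaborn plots |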
🧠 8. Key Takeaways#
| Plot Type | Purpose | Example Use Case |
|---|---|---|
| Histogram | Distribution of single variable | Sales amount spread |
| Box Plot | Median, spread, and outliers | Profit by region |
| Scatter Plot | Relationship between two variables | Sales vs marketing spend |
| Pair Plot | All pairwise relationships | Sales, profit, and spend |
| Heatmap | Correlation matrix visualization | Detect related metrics |
✅ 9. Practical Exercise#
Load any CSV dataset (e.g., from Kaggle or your own sales data).
Create at least one of each type of plot.
Add proper titles, labels, and legends.
Write one-line insights below each plot.
Using Plotly for interactive charts#
Combining multiple visuals into simple dashboards#
🎯 Learning Objectives#
By the end of this section, students will be able to:
Create interactive visualizations using Plotly.
Understand how to use hover, zoom, and filter features.
Combine multiple Plotly charts into a simple interactive dashboard layout.
Present data insights dynamically — ideal for business analytics and storytelling.
⚙️ 1. Setup#
We’ll use the plotly library (specifically plotly.express and plotly.graph_objects).
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# Dummy dataset
np.random.seed(42)
df = pd.DataFrame({
# Month-start dates ("MS"); the older "M" alias is deprecated in recent pandas versions
"Month": pd.date_range("2024-01-01", periods=12, freq="MS"),
"Sales": np.random.randint(30000, 80000, 12),
"Profit": np.random.randint(5000, 15000, 12),
"Region": np.random.choice(["North", "South", "East", "West"], 12)
})
df.head()
📈 2. Interactive Line Chart (Sales Trend Over Time)#
fig = px.line(df, x="Month", y="Sales", title="Monthly Sales Trend",
markers=True, color_discrete_sequence=["#1f77b4"])
fig.update_layout(
xaxis_title="Month",
yaxis_title="Sales (₹)",
hovermode="x unified"
)
fig.show()
✅ Highlights:
Hover over data points to see exact values.
Zoom in/out with drag.
Pan across time series easily.
Clean, publication-ready visuals with one line of code.
📊 3. Interactive Bar Chart (Profit by Region)#
fig = px.bar(df, x="Region", y="Profit", color="Region",
title="Profit by Region", text_auto=True,
color_discrete_sequence=px.colors.qualitative.Vivid)
fig.update_layout(
xaxis_title="Region",
yaxis_title="Profit (₹)",
showlegend=False
)
fig.show()
✅ Highlights:
Hover shows data for each region.
Bars are color-coded automatically.
Perfect for category comparisons.
⚪ 4. Interactive Scatter Plot (Sales vs Profit)#
fig = px.scatter(df, x="Sales", y="Profit", color="Region", size="Profit",
hover_name="Month", title="Sales vs Profit by Region",
color_discrete_sequence=px.colors.qualitative.Set2)
fig.update_layout(
xaxis_title="Sales (₹)",
yaxis_title="Profit (₹)",
legend_title="Region"
)
fig.show()
✅ Highlights:
Hover reveals Month, Sales, Profit, Region.
Circle size encodes Profit value.
Visually identifies strong-performing regions.
🧱 6. Combining Multiple Visuals into a Dashboard#
You can combine multiple Plotly figures into one interactive dashboard layout using make_subplots() from plotly.subplots.
from plotly.subplots import make_subplots
# Create subplot layout; the pie chart needs a "domain" cell instead of the default x/y axes
fig = make_subplots(rows=2, cols=2,
                    specs=[[{"type": "xy"}, {"type": "xy"}],
                           [{"type": "xy"}, {"type": "domain"}]],
                    subplot_titles=("Sales Trend", "Profit by Region", "Sales vs Profit", "Sales Share"))
# Add charts
fig.add_trace(go.Scatter(x=df["Month"], y=df["Sales"], name="Sales", mode="lines+markers"), row=1, col=1)
fig.add_trace(go.Bar(x=df["Region"], y=df["Profit"], name="Profit by Region"), row=1, col=2)
fig.add_trace(go.Scatter(x=df["Sales"], y=df["Profit"], mode="markers", name="Sales vs Profit",
                         marker=dict(size=10, color='teal', opacity=0.6)), row=2, col=1)
fig.add_trace(go.Pie(labels=df["Region"], values=df["Sales"], name="Sales Share"), row=2, col=2)
# Layout customization
fig.update_layout(
    height=800, width=1000, title_text="Interactive Business Dashboard",
    showlegend=False, template="plotly_white"
)
fig.show()
✅ Dashboard Features:
All visuals are interactive — zoom, hover, click.
Unified look with a single white theme.
Ideal for JupyterBook, JupyterLab, or Streamlit integration.
⚡ 7. Business Insights from Dashboard#
Sales Trend: Observe which months have high or low sales.
Profit by Region: Identify the most profitable regions.
Sales vs Profit: See whether sales strongly relate to profits.
Sales Share: Understand contribution of each region to total sales.
🎨 8. Enhancements (Optional)#
You can further:
Add dropdown filters using plotly.graph_objects and updatemenus.
Export as HTML using fig.write_html("dashboard.html").
Embed into websites or JupyterBook using:
```{raw} html
<iframe src="dashboard.html" width="100%" height="800"></iframe>
```
🧠 9. Key Takeaways#
| Concept | Description |
|---|---|
| Plotly Express | Easiest way to create interactive charts |
| Hover & Zoom | Built-in, no extra coding |
| Subplots | Combine visuals into dashboards |
| Export | Save and embed as standalone HTML |
| Ideal For | Business dashboards, storytelling, data apps |
✅ 10. Practical Exercises#
Create an interactive line chart showing monthly revenue for multiple regions.
Add a dropdown menu to switch between “Sales” and “Profit”.
Combine 3 Plotly charts into one dashboard.
Save your dashboard as business_dashboard.html.
Embed it in your JupyterBook under a section called Interactive Insights.
# Your code here