Data Visualisation

Making Numbers Speak Louder Than Words

“If data were a novel, visualization would be the movie adaptation everyone actually watches.”

Welcome, Data Picasso! 🧑‍🎨 You’ve cleaned, wrangled, and explored your data. Now it’s time to make it shine — not with glitter, but with charts that make business people go “Ooooh!”


🧠 Why Visualization Matters

In business, a great chart can:

  • Save you 10 slides of explanation 📊

  • Convince your boss faster than a 50-page report 💼

  • Expose hidden insights (or at least hide your messy code 😉)

💬 “Never underestimate a well-placed bar chart — it can justify your annual budget.”


🧰 Prerequisite

If you’re new to Python plotting libraries, check out my other book: 👉 📘 Programming for Business


🖼️ The Visualization Toolkit

Let’s meet your artistic weapons of choice:

| Library | Best For | Why You’ll Love It |
|---|---|---|
| Matplotlib | Custom static plots | Like Excel, but on steroids |
| Seaborn | Quick, pretty statistical charts | Beautiful defaults, less crying |
| Plotly | Interactive dashboards | Drag, zoom, impress investors |
| Altair | Declarative visualization | Concise, elegant grammar-of-graphics syntax |


🎬 Step 1. Setting the Stage

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_csv("sales_data_clean.csv")

sns.set_theme(style="whitegrid", palette="Set2")  # sns.set() is an older alias for set_theme()

💬 “Always set a nice style — even your plots deserve good fashion.”
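
No sales_data_clean.csv on hand? A synthetic stand-in lets you run every step in this chapter. This is a sketch under assumptions: the column names (date, region, product_category, sales_amount, marketing_spend, profit) are inferred from the code that follows, the category labels are hypothetical, and every value is randomly generated rather than real sales data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the fake data is reproducible
n = 365

# Hypothetical columns, named to match the examples in this chapter
marketing = rng.normal(10_000, 2_500, n)
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=n, freq="D"),
    "region": rng.choice(["North", "South", "East", "West"], n),
    "product_category": rng.choice(["Hardware", "Software", "Services"], n),
    "marketing_spend": marketing,
})
# Sales partly driven by marketing spend, so later scatter plots show a trend
df["sales_amount"] = 20_000 + 3 * marketing + rng.normal(0, 5_000, n)
# Profit as a noisy share of sales
df["profit"] = 0.15 * df["sales_amount"] + rng.normal(0, 1_500, n)
```

Use this df in place of the pd.read_csv() line; the styling call still applies.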


📈 Step 2. Bar Charts — The Office Favorite

When in doubt, use a bar chart. They’re simple, effective, and universally loved by PowerPoint warriors.

sns.barplot(x='region', y='sales_amount', data=df, estimator='mean')
plt.title("Average Sales by Region")
plt.xlabel("Region")
plt.ylabel("Sales ($)")
plt.show()

🗣️ “This chart says more about your market than 5 strategy meetings.”
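
Every bar is an aggregate, so it pays to look at the numbers behind it. A minimal sketch, using a tiny made-up DataFrame with the same region and sales_amount columns: groupby().mean() returns the per-region means that sns.barplot draws by default, and Matplotlib's Axes.bar plots them directly, sorted so the ranking reads left to right.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up example rows; swap in your own df
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "East"],
    "sales_amount": [100, 300, 250, 150, 400],
})

# Mean sales per region: the values behind each bar, largest first
region_means = df.groupby("region")["sales_amount"].mean().sort_values(ascending=False)

fig, ax = plt.subplots()
ax.bar(region_means.index, region_means.values)
ax.set_title("Average Sales by Region")
ax.set_xlabel("Region")
ax.set_ylabel("Sales ($)")
plt.show()
```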


📊 Step 3. Line Charts — Time’s Storytellers

Use line plots to show trends, growth, and seasonal effects.

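# Convert each date to the first day of its month; a plain Period column does not plot reliably
df['month'] = pd.to_datetime(df['date']).dt.to_period('M').dt.to_timestamp()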
monthly_sales = df.groupby('month')['sales_amount'].sum().reset_index()

sns.lineplot(x='month', y='sales_amount', data=monthly_sales, marker='o')
plt.title("Monthly Sales Trend")
plt.show()

💡 “A line going up? You’re a genius. A line going down? You’re an analyst who ‘found optimization opportunities.’”
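
Monthly totals can be noisy, and a rolling average is one common way to make the underlying trend easier to see. A sketch with invented monthly figures; the 3-month window is an arbitrary choice, and the same .rolling() call works on the sales_amount column of your own monthly_sales.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented monthly totals standing in for monthly_sales
monthly_sales = pd.DataFrame({
    "month": pd.date_range("2024-01-01", periods=6, freq="MS"),
    "sales_amount": [100, 120, 90, 130, 150, 140],
})

# 3-month rolling mean; the first two months lack a full window, so they are NaN
monthly_sales["rolling_3m"] = monthly_sales["sales_amount"].rolling(window=3).mean()

fig, ax = plt.subplots()
ax.plot(monthly_sales["month"], monthly_sales["sales_amount"], marker="o", label="Monthly sales")
ax.plot(monthly_sales["month"], monthly_sales["rolling_3m"], linestyle="--", label="3-month average")
ax.set_title("Monthly Sales Trend")
ax.legend()
plt.show()
```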


🍩 Step 4. Pie Charts — Only for Special Occasions

Use sparingly. One pie chart per quarter is the professional limit.

region_sales = df.groupby('region')['sales_amount'].sum()
region_sales.plot(kind='pie', autopct='%1.1f%%', startangle=90)
plt.title("Sales Share by Region")
plt.ylabel("")
plt.show()

💬 “Pie charts are like desserts — lovely in small doses, disastrous if overused.”
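
autopct labels each slice with its share of the total. Computing those shares yourself is handy for quoting them in a report, or for noticing that several tiny slices would read better as a bar chart. A sketch with made-up regional totals:

```python
import pandas as pd

# Made-up regional totals standing in for region_sales
region_sales = pd.Series({"North": 500, "South": 300, "East": 150, "West": 50})

# Share of the total as a percentage: what autopct='%1.1f%%' prints on each slice
shares = region_sales / region_sales.sum() * 100
print(shares.round(1))
```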


🔁 Step 5. Scatter Plots — The Relationship Therapist

Show how two variables interact — like marketing spend and sales.

sns.scatterplot(x='marketing_spend', y='sales_amount', data=df)
plt.title("Marketing Spend vs Sales")
plt.show()

💬 “If there’s an upward trend — congrats, your marketing is actually doing something!”
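
An upward-looking cloud is a hint, not proof. One way to put a number on it is the Pearson correlation plus a least-squares line. The sketch below uses synthetic data with a relationship deliberately built in, and np.polyfit for a degree-1 fit; seaborn's sns.regplot draws a similar fitted line for you.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Synthetic data: sales rise with marketing spend, plus noise
spend = rng.uniform(1_000, 20_000, 100)
df = pd.DataFrame({"marketing_spend": spend,
                   "sales_amount": 5_000 + 2.5 * spend + rng.normal(0, 4_000, 100)})

# Pearson correlation: +1 is a perfect upward line, 0 is no linear relationship
r = df["marketing_spend"].corr(df["sales_amount"])

# Least-squares straight line: sales ≈ slope * spend + intercept
slope, intercept = np.polyfit(df["marketing_spend"], df["sales_amount"], deg=1)

fig, ax = plt.subplots()
ax.scatter(df["marketing_spend"], df["sales_amount"], alpha=0.6)
xs = np.linspace(spend.min(), spend.max(), 50)
ax.plot(xs, slope * xs + intercept, color="red", label=f"fit (r = {r:.2f})")
ax.set_title("Marketing Spend vs Sales")
ax.legend()
plt.show()
```

Keep in mind that correlation measures association only; it cannot tell you that the marketing spend caused the sales.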


🧩 Step 6. Heatmaps — The Data Spa

Relax your brain and let colors show correlations.

corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()

💡 “Red means love, blue means distance — just like business meetings.”
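
A correlation matrix is symmetric, so half of the heatmap repeats the other half. A common refinement is to hide the upper triangle with the mask argument of sns.heatmap, and to pin the color scale to -1..1 with vmin, vmax, and center so colors mean the same thing across charts. A sketch on random columns (the data, and the hypothetical units column, are made up):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(50, 4)),
                  columns=["sales_amount", "marketing_spend", "profit", "units"])
corr = df.corr(numeric_only=True)

# True on and above the diagonal; mask=True cells are not drawn
mask = np.triu(np.ones_like(corr, dtype=bool))

sns.heatmap(corr, mask=mask, annot=True, cmap="coolwarm", fmt=".2f",
            vmin=-1, vmax=1, center=0)
plt.title("Correlation Heatmap (lower triangle)")
plt.show()
```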


🧠 Step 7. Pairplots — The Multiverse of Data

A full relationship matrix — because who doesn’t love chaos?

sns.pairplot(df[['sales_amount', 'marketing_spend', 'profit']])
plt.show()

💬 “This is the Tinder of plots — every variable gets to date every other one.”


📦 Step 8. Business Dashboard Preview

Turn your visuals into interactive gold:

import plotly.express as px

fig = px.bar(df, x='region', y='sales_amount', color='product_category', title="Sales by Region and Category")
fig.show()

💡 “Plotly charts are like Instagram filters — they make your data instantly more likable.”


🧪 Practice Lab — “Visualize or Vanish!”

Use the dataset company_sales.csv and visualize:

  1. Sales by product category

  2. Sales trend over time

  3. Profit vs marketing spend

  4. Region-wise revenue contribution

  5. A correlation heatmap between numerical features

🎯 Bonus: Create one chart that could appear in your company’s annual report (without getting you fired).


🧭 Recap

| Visualization | Use Case | Library |
|---|---|---|
| Bar Chart | Compare categories | Seaborn / Matplotlib |
| Line Chart | Show trends over time | Seaborn |
| Scatter Plot | Relationships between two variables | Seaborn |
| Heatmap | Correlation matrix | Seaborn |
| Pie Chart | Share of a total | Matplotlib (via pandas .plot) |
| Interactive Dashboard | Exploration | Plotly |


🎁 Quick Pro Tips

  • Don’t overload charts — your data deserves breathing space 🧘

  • Color with purpose — not all rainbows are business-friendly 🌈

  • Always add titles and labels 📋

  • Save your plots as .png for reports, calling plt.savefig() before plt.show() 📸

plt.savefig("monthly_sales.png", dpi=300, bbox_inches='tight')

💬 “Because nothing says ‘professional’ like a crisp 300 DPI chart.”
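
Order matters here: once plt.show() has run, there may be no current figure left, so a plt.savefig() placed after it can write a blank image. Holding on to the Figure object and saving before showing avoids that. A minimal sketch; the data and file name are placeholders.

```python
from pathlib import Path
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]  # placeholder values

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(months, sales, marker="o")
ax.set_title("Monthly Sales Trend")
ax.set_ylabel("Sales ($)")

out = Path("monthly_sales.png")
# Save from the figure object first, then display
fig.savefig(out, dpi=300, bbox_inches="tight")
plt.show()
```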


🔜 Next Stop

👉 Business Dashboards: you’ll learn how to stitch your visual masterpieces together to make dashboards so sleek that executives might finally stop asking for Excel sheets.


Histograms, Box Plots, Scatter Plots, Pair Plots, Heatmaps (with customization)


🎯 Learning Objectives

By the end of this section, students should be able to:

  1. Create basic visualizations using matplotlib and seaborn.

  2. Understand the purpose and interpretation of each plot type.

  3. Customize visual elements — titles, labels, legends, colors, and layout.

  4. Use visualizations to gain insights into data distribution and relationships.


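🧩 1. Setup and Dummy Dataset

We’ll use a small artificial dataset that mimics a business scenario (sales, customers, and marketing). Each column is drawn independently at random, so expect weak or no relationships between variables in the plots below; the focus here is on how each plot is built and read.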

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# For consistent plot style
sns.set_theme(style="whitegrid")

# Dummy dataset
np.random.seed(42)
df = pd.DataFrame({
    "Sales": np.random.normal(50000, 12000, 200),
    "Profit": np.random.normal(8000, 3000, 200),
    "Marketing_Spend": np.random.normal(10000, 2500, 200),
    "Customer_Rating": np.random.uniform(1, 5, 200),
    "Region": np.random.choice(["North", "South", "East", "West"], 200)
})

df.head()
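
The columns above are drawn independently, so the scatter plots and heatmap later in this section will show almost no relationships. If you’d rather practise on data with some structure, one option is to build Sales from Marketing_Spend and Profit from Sales. The coefficients below are arbitrary assumptions, not estimates from any real business.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
n = 200
marketing = np.random.normal(10000, 2500, n)

df_linked = pd.DataFrame({
    "Marketing_Spend": marketing,
    # Sales: baseline + 2.5 per unit of spend + noise (arbitrary coefficients)
    "Sales": 25000 + 2.5 * marketing + np.random.normal(0, 6000, n),
    "Customer_Rating": np.random.uniform(1, 5, n),
    "Region": np.random.choice(["North", "South", "East", "West"], n),
})
# Profit as a noisy fraction of sales, so it correlates with Sales too
df_linked["Profit"] = 0.16 * df_linked["Sales"] + np.random.normal(0, 1500, n)

print(df_linked[["Sales", "Profit", "Marketing_Spend"]].corr().round(2))
```

Assign df = df_linked if you want the plots below to use it.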

📊 2. Histogram — Understanding Data Distribution

Histograms help visualize the distribution (spread and skewness) of numerical variables.

plt.figure(figsize=(8, 5))
plt.hist(df["Sales"], bins=20, color='skyblue', edgecolor='black')
plt.title("Distribution of Sales", fontsize=14, fontweight='bold')
plt.xlabel("Sales Amount")
plt.ylabel("Frequency")
plt.grid(axis='y', alpha=0.75)
plt.show()

Interpretation:

  • Check if sales are normally distributed or skewed.

  • Identify outliers or irregular peaks.
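
To back up what the histogram suggests with numbers, pandas can summarise the same column. Sample skewness near 0 points to a roughly symmetric distribution, and a positive value means a longer right tail. A sketch that regenerates the Sales column exactly as in the setup above (same seed, same first draw):

```python
import numpy as np
import pandas as pd

np.random.seed(42)
sales = pd.Series(np.random.normal(50000, 12000, 200), name="Sales")

print(sales.describe().round(0))             # count, mean, std, quartiles, min, max
print("skewness:", round(sales.skew(), 2))   # close to 0 for symmetric data
```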


📦 3. Box Plot — Spotting Outliers

Box plots show median, quartiles, and outliers clearly.

plt.figure(figsize=(8, 5))
sns.boxplot(x="Region", y="Profit", data=df, palette="Set2")
plt.title("Profit Distribution by Region", fontsize=14, fontweight='bold')
plt.xlabel("Region")
plt.ylabel("Profit")
plt.show()

Interpretation:

  • Regions with higher medians = better profit performance.

  • Dots beyond the whiskers = outliers (by default, points more than 1.5 × IQR beyond the box edges).
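
The whisker rule can be reproduced directly, which is handy when you want a list of outliers rather than dots on a chart. A sketch of the 1.5 × IQR rule applied per region; the helper name iqr_outliers and the toy numbers are our own, for illustration.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]

toy = pd.DataFrame({
    "Region": ["North"] * 6 + ["South"] * 6,
    "Profit": [10, 11, 12, 13, 14, 40, 20, 21, 22, 23, 24, 25],
})

# One list of outliers per region, mirroring one box per region
outliers = {region: iqr_outliers(group["Profit"]).tolist()
            for region, group in toy.groupby("Region")}
print(outliers)
```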


4. Scatter Plot — Relationship Between Two Variables

Useful for analyzing correlation and patterns between numeric variables.

plt.figure(figsize=(8, 6))
sns.scatterplot(x="Marketing_Spend", y="Sales", hue="Region", data=df, s=70, alpha=0.8)
plt.title("Sales vs Marketing Spend by Region", fontsize=14, fontweight='bold')
plt.xlabel("Marketing Spend")
plt.ylabel("Sales")
plt.legend(title="Region", loc="upper left")
plt.show()

Interpretation:

  • Observe if higher marketing spend leads to higher sales.

  • Differentiate regional trends.
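
To make regional differences concrete, you can compute the spend/sales correlation separately for each region. A sketch that regenerates the setup dataset (same seed and draw order); because those columns are independent random numbers, expect values near zero here.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({
    "Sales": np.random.normal(50000, 12000, 200),
    "Profit": np.random.normal(8000, 3000, 200),
    "Marketing_Spend": np.random.normal(10000, 2500, 200),
    "Customer_Rating": np.random.uniform(1, 5, 200),
    "Region": np.random.choice(["North", "South", "East", "West"], 200),
})

# Pearson correlation of spend vs sales within each region
corr_by_region = pd.Series(
    {region: g["Marketing_Spend"].corr(g["Sales"]) for region, g in df.groupby("Region")},
    name="corr_spend_sales",
)
print(corr_by_region.round(2))
```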


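🧩 5. Pair Plot — Multiple Variable Relationships

Pair plots show a scatter plot for every pair of numeric variables, with each variable’s distribution on the diagonal (histograms by default, KDE curves when hue is set, as in the example below).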

sns.pairplot(df, vars=["Sales", "Profit", "Marketing_Spend"], hue="Region", palette="husl")
plt.suptitle("Pairwise Relationships among Key Variables", y=1.02, fontsize=14, fontweight='bold')
plt.show()

Interpretation:

  • Detect linear/nonlinear relationships.

  • Check for correlations or clusters by region.


🔥 6. Heatmap — Correlation Between Variables

Heatmaps visualize how strongly numerical variables are correlated.

plt.figure(figsize=(7, 5))
corr = df[["Sales", "Profit", "Marketing_Spend", "Customer_Rating"]].corr()

sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap", fontsize=14, fontweight='bold')
plt.show()

Interpretation:

  • Values near +1 → strong positive correlation.

  • Values near -1 → strong negative correlation.

  • Values near 0 → little or no linear relationship.

  • Helps detect multicollinearity (strongly correlated predictors).
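
With more than a handful of columns, reading every cell gets tedious. One approach is to list each distinct pair once and rank pairs by absolute correlation. A sketch; the helper name top_pairs is our own, and the small demo frame is made up with one relationship built in.

```python
import numpy as np
import pandas as pd

def top_pairs(corr: pd.DataFrame, n: int = 5) -> pd.Series:
    """Rank distinct variable pairs by absolute correlation."""
    cols = list(corr.columns)
    # Each unordered pair once: (a, b) with a before b; self-pairs skipped
    pairs = pd.Series({(a, b): corr.loc[a, b]
                       for i, a in enumerate(cols) for b in cols[i + 1:]})
    # Strongest first, whether positive or negative
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

rng = np.random.default_rng(3)
x = rng.normal(size=300)
demo = pd.DataFrame({
    "Sales": x,
    "Profit": 0.8 * x + rng.normal(scale=0.3, size=300),  # strongly tied to Sales
    "Customer_Rating": rng.normal(size=300),               # unrelated noise
})
result = top_pairs(demo.corr())
print(result.round(2))
```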


🎨 7. Customizing Plots (Titles, Labels, Legends, Styles)

You can globally customize appearance using plt.rcParams or sns.set_theme().

Example:

sns.set_theme(style="whitegrid", context="talk", palette="muted")

plt.figure(figsize=(8, 5))
sns.histplot(df["Profit"], bins=15, kde=True, color="teal")
plt.title("Customized Profit Distribution", fontsize=16, fontweight='bold')
plt.xlabel("Profit (in $)")
plt.ylabel("Number of Transactions")
plt.show()

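Customization Options:

| Function or parameter | Purpose |
|---|---|
| plt.title() | Adds a plot title |
| plt.xlabel(), plt.ylabel() | Add axis labels |
| plt.legend() | Adds a legend (set its title and position) |
| plt.grid() | Toggles grid lines (plt.grid(True) / plt.grid(False) to set explicitly) |
| plt.style.use() | Changes the overall style (e.g., “ggplot”, or “seaborn-v0_8” on Matplotlib 3.6 and later) |
| palette | Sets the color scheme in seaborn plots |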
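
The functions in the table style one chart at a time. For defaults you want on every chart, Matplotlib reads settings from plt.rcParams, and plt.rc_context() applies them only inside a with block. A sketch; the particular sizes are example values, and plt.style.available lists the style names your installation accepts.

```python
import matplotlib.pyplot as plt

# plt.style.available lists every style name plt.style.use() accepts
print("ggplot" in plt.style.available)

# Temporary defaults: only figures created inside the block use them
with plt.rc_context({"axes.titlesize": 16, "axes.titleweight": "bold",
                     "axes.grid": True, "figure.figsize": (8, 5)}):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3])
    ax.set_title("Styled with rc_context")
    inside_size = plt.rcParams["axes.titlesize"]

plt.show()
```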


🧠 8. Key Takeaways

| Plot Type | Purpose | Example Use Case |
|---|---|---|
| Histogram | Distribution of a single variable | Sales amount spread |
| Box Plot | Median, spread, and outliers | Profit by region |
| Scatter Plot | Relationship between two variables | Sales vs marketing spend |
| Pair Plot | All pairwise relationships | Sales, profit, and spend |
| Heatmap | Correlation matrix visualization | Detect related metrics |


9. Practical Exercise

  1. Load any CSV dataset (e.g., from Kaggle or your own sales data).

  2. Create at least one of each type of plot.

  3. Add proper titles, labels, and legends.

  4. Write one-line insights below each plot.


Using Plotly for interactive charts

Combining multiple visuals into simple dashboards


🎯 Learning Objectives

By the end of this section, students will be able to:

  1. Create interactive visualizations using Plotly.

  2. Understand how to use hover, zoom, and filter features.

  3. Combine multiple Plotly charts into a simple interactive dashboard layout.

  4. Present data insights dynamically — ideal for business analytics and storytelling.


⚙️ 1. Setup

We’ll use the plotly library (specifically plotly.express and plotly.graph_objects).

import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Dummy dataset
np.random.seed(42)
df = pd.DataFrame({
    "Month": pd.date_range("2024-01-01", periods=12, freq="MS"),  # month starts; "M" is deprecated in pandas 2.2+
    "Sales": np.random.randint(30000, 80000, 12),
    "Profit": np.random.randint(5000, 15000, 12),
    "Region": np.random.choice(["North", "South", "East", "West"], 12)
})

df.head()

📈 2. Interactive Line Chart (Sales Trend Over Time)

fig = px.line(df, x="Month", y="Sales", title="Monthly Sales Trend",
              markers=True, color_discrete_sequence=["#1f77b4"])

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales (₹)",
    hovermode="x unified"
)

fig.show()

Highlights:

  • Hover over data points to see exact values.

  • Zoom by dragging a box over the chart; double-click to reset.

  • Pan across the time series with the pan tool in the toolbar.

  • Clean, publication-ready visuals from a single px.line() call.


📊 3. Interactive Bar Chart (Profit by Region)

fig = px.bar(df, x="Region", y="Profit", color="Region",
             title="Profit by Region", text_auto=True,
             color_discrete_sequence=px.colors.qualitative.Vivid)

fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Profit (₹)",
    showlegend=False
)

fig.show()

Highlights:

  • Hover shows data for each region.

  • Bars are color-coded automatically.

  • Perfect for category comparisons.


4. Interactive Scatter Plot (Sales vs Profit)

fig = px.scatter(df, x="Sales", y="Profit", color="Region", size="Profit",
                 hover_name="Month", title="Sales vs Profit by Region",
                 color_discrete_sequence=px.colors.qualitative.Set2)

fig.update_layout(
    xaxis_title="Sales (₹)",
    yaxis_title="Profit (₹)",
    legend_title="Region"
)

fig.show()

Highlights:

  • Hover reveals Month, Sales, Profit, Region.

  • Circle size encodes Profit value.

  • Visually identifies strong-performing regions.


🔥 5. Interactive Pie Chart (Sales Share by Region)

fig = px.pie(df, values="Sales", names="Region", title="Sales Share by Region",
             color_discrete_sequence=px.colors.qualitative.Pastel)

fig.update_traces(textinfo="percent+label")
fig.show()

Highlights:

  • Hover shows values and percentage.

  • Click a legend entry to hide that region’s slice; double-click to isolate it.

  • Great for business summaries.


🧱 6. Combining Multiple Visuals into a Dashboard

You can combine several Plotly charts into one interactive dashboard layout using make_subplots() from plotly.subplots, adding each chart as a trace in its own grid cell.

from plotly.subplots import make_subplots

# Create the 2x2 layout; the pie chart needs a "domain" cell, the others use x/y axes
fig = make_subplots(rows=2, cols=2,
                    specs=[[{"type": "xy"}, {"type": "xy"}],
                           [{"type": "xy"}, {"type": "domain"}]],
                    subplot_titles=("Sales Trend", "Profit by Region", "Sales vs Profit", "Sales Share"))

# Total profit per region, so each region gets exactly one bar
region_profit = df.groupby("Region", as_index=False)["Profit"].sum()

# Add charts
fig.add_trace(go.Scatter(x=df["Month"], y=df["Sales"], name="Sales", mode="lines+markers"), row=1, col=1)
fig.add_trace(go.Bar(x=region_profit["Region"], y=region_profit["Profit"], name="Profit by Region"), row=1, col=2)
fig.add_trace(go.Scatter(x=df["Sales"], y=df["Profit"], mode="markers", name="Sales vs Profit",
                         marker=dict(size=10, color='teal', opacity=0.6)), row=2, col=1)
fig.add_trace(go.Pie(labels=df["Region"], values=df["Sales"], name="Sales Share"), row=2, col=2)

# Layout customization
fig.update_layout(
    height=800, width=1000, title_text="Interactive Business Dashboard",
    showlegend=False, template="plotly_white"
)

fig.show()

Dashboard Features:

  • All visuals are interactive — zoom, hover, click.

  • Unified look with a single white theme.

  • Ideal for Jupyter Book, JupyterLab, or Streamlit integration.


7. Business Insights from Dashboard

  • Sales Trend: Observe which months have high or low sales.

  • Profit by Region: Identify the most profitable regions.

  • Sales vs Profit: See whether sales strongly relate to profits.

  • Sales Share: Understand contribution of each region to total sales.


🎨 8. Enhancements (Optional)

You can further:

  • Add dropdown filters using plotly.graph_objects and updatemenus.

  • Export as HTML using fig.write_html("dashboard.html").

  • Embed into websites or Jupyter Book using:

    ```{raw} html
    <iframe src="dashboard.html" width="100%" height="800"></iframe>
    ```

🧠 9. Key Takeaways

| Concept | Description |
|---|---|
| Plotly Express | Easiest way to create interactive charts |
| Hover & Zoom | Built-in, no extra coding |
| Subplots | Combine visuals into dashboards |
| Export | Save and embed as standalone HTML |
| Ideal For | Business dashboards, storytelling, data apps |


10. Practical Exercises

  1. Create an interactive line chart showing monthly revenue for multiple regions.

  2. Add a dropdown menu to switch between “Sales” and “Profit”.

  3. Combine 3 Plotly charts into one dashboard.

  4. Save your dashboard as business_dashboard.html.

  5. Embed it in your Jupyter Book under a section called Interactive Insights.


# Your code here