# Data Visualisation


*Making Numbers Speak Louder Than Words*

> “If data were a novel, visualization would be the movie adaptation everyone actually watches.”

Welcome, Data Picasso! 🧑‍🎨
You’ve cleaned, wrangled, and explored your data.
Now it’s time to make it shine, not with glitter, but with **charts that make business people go “Ooooh!”**

---

## 🧠 Why Visualization Matters

In business, a great chart can:

* Save you 10 slides of explanation 📊
* Convince your boss faster than a 50-page report 💼
* Expose hidden insights (or at least hide your messy code 😉)

> 💬 “Never underestimate a well-placed bar chart: it can justify your annual budget.”

---

## 🧰 Prerequisite

If you’re new to Python plotting libraries, check out my other book:
👉 **[📘 Programming for Business](https://chandraveshchaudhari.github.io/Programming_for_Business/intro.html#)**

---

## 🖼️ The Visualization Toolkit

Let’s meet your artistic weapons of choice:

| Library        | Best For                         | Why You’ll Love It                |
| -------------- | -------------------------------- | --------------------------------- |
| **Matplotlib** | Custom static plots              | Like Excel, but on steroids       |
| **Seaborn**    | Quick, pretty statistical charts | Beautiful defaults, less crying   |
| **Plotly**     | Interactive dashboards           | Drag, zoom, impress investors     |
| **Altair**     | Declarative visualization        | Concise, elegant, and logic-based |

---

## 🎬 Step 1. Setting the Stage

```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_csv("sales_data_clean.csv")

sns.set(style="whitegrid", palette="Set2")
```
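
If `sales_data_clean.csv` is not at hand, the examples in this chapter can still be tried on a synthetic stand-in. The sketch below is only an assumption about what such a file might contain: it uses the column names that appear in the code of this chapter (`date`, `region`, `product_category`, `marketing_spend`, `sales_amount`, `profit`), and every value is randomly generated.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for sales_data_clean.csv (all values are made up).
rng = np.random.default_rng(seed=0)
n_rows = 365

dates = pd.date_range("2024-01-01", periods=n_rows, freq="D")
marketing_spend = rng.uniform(500, 5000, n_rows).round(2)

# Sales loosely follow marketing spend plus noise (an arbitrary assumption).
sales_amount = (3 * marketing_spend + rng.normal(0, 2000, n_rows)).clip(min=0).round(2)

df = pd.DataFrame({
    "date": dates,
    "region": rng.choice(["North", "South", "East", "West"], n_rows),
    "product_category": rng.choice(["Hardware", "Software", "Services"], n_rows),
    "marketing_spend": marketing_spend,
    "sales_amount": sales_amount,
})

# Profit as a noisy share of sales (again, purely illustrative).
df["profit"] = (df["sales_amount"] * rng.uniform(0.05, 0.25, n_rows)).round(2)

print(df.head())
```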

> 💬 “Always set a nice style. Even your plots deserve good fashion.”

---

## 📈 Step 2. Bar Charts: The Office Favorite

When in doubt, use a bar chart.
They’re simple, effective, and universally loved by PowerPoint warriors.

```python
sns.barplot(x='region', y='sales_amount', data=df, estimator='mean')
plt.title("Average Sales by Region")
plt.xlabel("Region")
plt.ylabel("Sales ($)")
plt.show()
```

> 🗣️ “This chart says more about your market than 5 strategy meetings.”

---

## 📊 Step 3. Line Charts: Time’s Storytellers

Use line plots to show trends, growth, and seasonal effects.

```python
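# Bucket dates by month, then convert back to timestamps:
# seaborn can fail on pandas Period values
df['month'] = pd.to_datetime(df['date']).dt.to_period('M').dt.to_timestamp()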
monthly_sales = df.groupby('month')['sales_amount'].sum().reset_index()

sns.lineplot(x='month', y='sales_amount', data=monthly_sales, marker='o')
plt.title("Monthly Sales Trend")
plt.show()
```

> 💡 “A line going up? You’re a genius.
> A line going down? You’re an analyst who ‘found optimization opportunities.’”

---

## 🍩 Step 4. Pie Charts: Only for Special Occasions

Use sparingly. One pie chart per quarter is the professional limit.

```python
region_sales = df.groupby('region')['sales_amount'].sum()
region_sales.plot(kind='pie', autopct='%1.1f%%', startangle=90)
plt.title("Sales Share by Region")
plt.ylabel("")
plt.show()
```

> 💬 “Pie charts are like desserts: lovely in small doses, disastrous if overused.”

---

## 🔁 Step 5. Scatter Plots: The Relationship Therapist

Show how two variables interact, like marketing spend and sales.

```python
sns.scatterplot(x='marketing_spend', y='sales_amount', data=df)
plt.title("Marketing Spend vs Sales")
plt.show()
```
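
A scatter plot shows a trend by eye; a rough numeric check can back it up. The sketch below uses hypothetical data (sales deliberately built to rise with spend) and two standard tools: the Pearson correlation from pandas and a least-squares line from `np.polyfit`. The frame name `demo` and all values are illustrative assumptions, and a positive slope describes association, not proof that marketing caused the sales.

```python
import numpy as np
import pandas as pd

# Hypothetical data: sales built to rise with marketing spend, plus noise.
rng = np.random.default_rng(seed=1)
spend = rng.uniform(1_000, 10_000, 200)
sales = 4.0 * spend + rng.normal(0, 5_000, 200)
demo = pd.DataFrame({"marketing_spend": spend, "sales_amount": sales})

# Pearson correlation: the sign gives direction, the magnitude gives strength.
r = demo["marketing_spend"].corr(demo["sales_amount"])

# Least-squares straight line: the slope is extra sales per extra unit of spend.
slope, intercept = np.polyfit(demo["marketing_spend"], demo["sales_amount"], deg=1)

print(f"Pearson r = {r:.2f}, slope = {slope:.2f}")
```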

> 💬 “If there’s an upward trend, congrats: your marketing is actually doing something!”

---

## 🧩 Step 6. Heatmaps: The Data Spa

Relax your brain and let colors show correlations.

```python
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()
```

> 💡 “Red means love, blue means distance, just like business meetings.”

---

## 🧠 Step 7. Pairplots: The Multiverse of Data

A full relationship matrix, because who doesn’t love chaos?

```python
sns.pairplot(df[['sales_amount', 'marketing_spend', 'profit']])
plt.show()
```

> 💬 “This is the Tinder of plots: every variable gets to date every other one.”

---

## 📦 Step 8. Business Dashboard Preview

Turn your visuals into interactive gold:

```python
import plotly.express as px

fig = px.bar(df, x='region', y='sales_amount', color='product_category', title="Sales by Region and Category")
fig.show()
```

> 💡 “Plotly charts are like Instagram filters: they make your data instantly more likable.”

---

## 🧪 Practice Lab: “Visualize or Vanish!”

Use the dataset `company_sales.csv` and visualize:

1. Sales by product category
2. Sales trend over time
3. Profit vs marketing spend
4. Region-wise revenue contribution
5. A correlation heatmap between numerical features

🎯 **Bonus:** Create one chart that could appear in your company’s annual report (without getting you fired).

---

## 🧭 Recap

| Visualization    | Use Case              | Library              |
| ---------------- | --------------------- | -------------------- |
| Bar Chart        | Compare categories    | Seaborn / Matplotlib |
| Line Chart       | Show trends over time | Seaborn              |
| Scatter Plot     | Relationships         | Seaborn              |
| Heatmap          | Correlation matrix    | Seaborn              |
| Pie Chart        | Share distribution    | Matplotlib           |
| Interactive Dash | Exploration           | Plotly               |

---

## 🎁 Quick Pro Tips

* Don’t overload charts. Your data deserves breathing space 🧘
* Color with purpose. Not all rainbows are business-friendly 🌈
* Always add titles and labels 📋
* Save your plots as `.png` for reports 📸

```python
plt.savefig("monthly_sales.png", dpi=300, bbox_inches='tight')
```

> 💬 “Because nothing says ‘professional’ like a crisp 300 DPI chart.”

---

## 🔜 Next Stop

👉 **[Business Dashboards](business_dashboards)**
You’ll learn how to stitch your visual masterpieces together
to make dashboards so sleek that executives might finally stop asking for Excel sheets.

---



### *Histograms, Box Plots, Scatter Plots, Pair Plots, Heatmaps (with customization)*

---

## 🎯 **Learning Objectives**

By the end of this section, students should be able to:

1. Create basic visualizations using **matplotlib** and **seaborn**.
2. Understand the purpose and interpretation of each plot type.
3. Customize visual elements: **titles, labels, legends, colors, and layout**.
4. Use visualizations to gain **insights** into data distribution and relationships.

---

## 🧩 **1. Setup and Dummy Dataset**

We’ll use a small artificial dataset that mimics a business scenario (sales, customers, and marketing).

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# For consistent plot style
sns.set(style="whitegrid")

# Dummy dataset
np.random.seed(42)
df = pd.DataFrame({
    "Sales": np.random.normal(50000, 12000, 200),
    "Profit": np.random.normal(8000, 3000, 200),
    "Marketing_Spend": np.random.normal(10000, 2500, 200),
    "Customer_Rating": np.random.uniform(1, 5, 200),
    "Region": np.random.choice(["North", "South", "East", "West"], 200)
})

df.head()
```

---

## 📊 **2. Histogram: Understanding Data Distribution**

Histograms help visualize the distribution (spread and skewness) of numerical variables.

```python
plt.figure(figsize=(8, 5))
plt.hist(df["Sales"], bins=20, color='skyblue', edgecolor='black')
plt.title("Distribution of Sales", fontsize=14, fontweight='bold')
plt.xlabel("Sales Amount")
plt.ylabel("Frequency")
plt.grid(axis='y', alpha=0.75)
plt.show()
```

✅ **Interpretation:**

* Check if sales are normally distributed or skewed.
* Identify outliers or irregular peaks.
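
The visual check for skew can be paired with a number. The sketch below rebuilds a `Sales` column with the same recipe as the dummy dataset and uses two simple indicators: pandas' sample skewness (`Series.skew()`) and the gap between mean and median. As a common rule of thumb (not a strict test), an absolute skewness below about 0.5 is usually read as roughly symmetric.

```python
import numpy as np
import pandas as pd

# Same recipe as the dummy Sales column in the setup above.
np.random.seed(42)
sales = pd.Series(np.random.normal(50000, 12000, 200), name="Sales")

# Skewness is 0 for a perfectly symmetric distribution;
# positive values indicate a longer right tail, negative a longer left tail.
skewness = sales.skew()

# A mean well above the median is another hint of a right tail.
gap = sales.mean() - sales.median()

print(f"skewness = {skewness:.2f}, mean - median = {gap:,.0f}")
```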

---

## 📦 **3. Box Plot: Spotting Outliers**

Box plots show **median**, **quartiles**, and **outliers** clearly.

```python
plt.figure(figsize=(8, 5))
sns.boxplot(x="Region", y="Profit", data=df, palette="Set2")
plt.title("Profit Distribution by Region", fontsize=14, fontweight='bold')
plt.xlabel("Region")
plt.ylabel("Profit")
plt.show()
```

✅ **Interpretation:**

* Regions with higher medians = better profit performance.
* Dots outside whiskers = outliers.
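
The dots outside the whiskers follow a simple rule: with the default settings, a point is drawn as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. A minimal sketch of that rule in pandas is shown below; the helper name `iqr_outliers` and the sample profits are hypothetical.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying more than k * IQR beyond the quartiles."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical profits with one obviously extreme value.
profits = pd.Series([7800, 8100, 7900, 8300, 8000, 8200, 25000])
print(iqr_outliers(profits))  # only the 25000 entry is flagged
```

To check each region separately, the same helper could be applied per group, for example `df.groupby("Region")["Profit"].apply(iqr_outliers)`.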

---

## ⚪ **4. Scatter Plot: Relationship Between Two Variables**

Useful to analyze correlation and patterns between numeric variables.

```python
plt.figure(figsize=(8, 6))
sns.scatterplot(x="Marketing_Spend", y="Sales", hue="Region", data=df, s=70, alpha=0.8)
plt.title("Sales vs Marketing Spend by Region", fontsize=14, fontweight='bold')
plt.xlabel("Marketing Spend")
plt.ylabel("Sales")
plt.legend(title="Region", loc="upper left")
plt.show()
```

✅ **Interpretation:**

* Observe whether higher marketing spend is associated with higher sales (the dummy columns are generated independently, so expect no clear pattern here).
* Differentiate regional trends.

---

## 🧩 **5. Pair Plot: Multiple Variable Relationships**

Pair plots show a scatter plot for every pair of numeric variables and each variable's own distribution on the diagonal (histograms by default, density curves when `hue` is set).

```python
sns.pairplot(df, vars=["Sales", "Profit", "Marketing_Spend"], hue="Region", palette="husl")
plt.suptitle("Pairwise Relationships among Key Variables", y=1.02, fontsize=14, fontweight='bold')
plt.show()
```

✅ **Interpretation:**

* Detect linear/nonlinear relationships.
* Check for correlations or clusters by region.

---

## 🔥 **6. Heatmap: Correlation Between Variables**

Heatmaps visualize how strongly numerical variables are correlated.

```python
plt.figure(figsize=(7, 5))
corr = df[["Sales", "Profit", "Marketing_Spend", "Customer_Rating"]].corr()

sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap", fontsize=14, fontweight='bold')
plt.show()
```

✅ **Interpretation:**

* Values near +1 → strong positive correlation.
* Values near -1 → strong negative correlation.
* Helps detect multicollinearity.
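
For more than a handful of variables, reading the heatmap cell by cell gets tedious. One possible approach, sketched below, is to list the variable pairs ranked by absolute correlation. The `Revenue` column, the frame name `metrics`, and the 0.8 cut-off are illustrative assumptions, not part of the dataset above.

```python
import numpy as np
import pandas as pd

# Hypothetical metrics: "Revenue" is built to track "Sales" closely.
rng = np.random.default_rng(seed=2)
sales = rng.normal(50_000, 12_000, 200)
metrics = pd.DataFrame({
    "Sales": sales,
    "Revenue": sales * 1.1 + rng.normal(0, 1_000, 200),
    "Customer_Rating": rng.uniform(1, 5, 200),
})

corr = metrics.corr()

# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().dropna().sort_values(key=lambda s: s.abs(), ascending=False)

threshold = 0.8  # illustrative cut-off, not a universal rule
print(pairs[pairs.abs() > threshold])
```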

---

## 🎨 **7. Customizing Plots (Titles, Labels, Legends, Styles)**

You can globally customize appearance using `plt.rcParams` or `sns.set_theme()`.

Example:

```python
sns.set_theme(style="whitegrid", context="talk", palette="muted")

plt.figure(figsize=(8, 5))
sns.histplot(df["Profit"], bins=15, kde=True, color="teal")
plt.title("Customized Profit Distribution", fontsize=16, fontweight='bold')
plt.xlabel("Profit (in $)")
plt.ylabel("Number of Transactions")
plt.show()
```
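
The example above goes through `sns.set_theme()`. For the `plt.rcParams` route, a minimal sketch could look like the following. The dictionary name `report_style` and its values are assumptions for illustration; the keys themselves are standard matplotlib settings. `plt.rc_context` limits the changes to one block, whereas `plt.rcParams.update(report_style)` would apply them for the rest of the session.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical house style; the keys are standard matplotlib rcParams.
report_style = {
    "figure.figsize": (8, 5),
    "axes.titlesize": 16,
    "axes.titleweight": "bold",
    "axes.labelsize": 12,
    "axes.grid": True,
    "grid.alpha": 0.4,
}

np.random.seed(42)
profit = np.random.normal(8000, 3000, 200)

# Settings apply only to figures created inside the with-block.
with plt.rc_context(report_style):
    fig, ax = plt.subplots()
    ax.hist(profit, bins=15, color="teal", edgecolor="black")
    ax.set_title("Profit Distribution")
    ax.set_xlabel("Profit (in $)")
    ax.set_ylabel("Number of Transactions")

# In a script, call plt.show() here; notebooks render the figure automatically.
```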

### Customization Options:

| Element                        | Function                                          |
| ------------------------------ | ------------------------------------------------- |
| `plt.title()`                  | Adds plot title                                   |
| `plt.xlabel()`, `plt.ylabel()` | Adds axis labels                                  |
| `plt.legend()`                 | Shows/hides legend                                |
| `plt.grid()`                   | Enables grid lines                                |
| `plt.style.use()`              | Changes overall style (e.g., “ggplot”, or “seaborn-v0_8” in Matplotlib 3.6+) |
| `palette`                      | Sets color scheme in seaborn plots                |

---

## 🧠 **8. Key Takeaways**

| Plot Type        | Purpose                            | Example Use Case         |
| ---------------- | ---------------------------------- | ------------------------ |
| **Histogram**    | Distribution of single variable    | Sales amount spread      |
| **Box Plot**     | Median, spread, and outliers       | Profit by region         |
| **Scatter Plot** | Relationship between two variables | Sales vs marketing spend |
| **Pair Plot**    | All pairwise relationships         | Sales, profit, and spend |
| **Heatmap**      | Correlation matrix visualization   | Detect related metrics   |

---

## ✅ **9. Practical Exercise**

1. Load any CSV dataset (e.g., from Kaggle or your own sales data).
2. Create at least one of each type of plot.
3. Add proper titles, labels, and legends.
4. Write one-line insights below each plot.

---



### Using Plotly for interactive charts

### Combining multiple visuals into simple dashboards

---

## 🎯 **Learning Objectives**

By the end of this section, students will be able to:

1. Create **interactive visualizations** using Plotly.
2. Understand how to use **hover**, **zoom**, and **filter** features.
3. Combine multiple Plotly charts into a **simple interactive dashboard** layout.
4. Present data insights dynamically, which is ideal for business analytics and storytelling.

---

## ⚙️ **1. Setup**

We’ll use the `plotly` library (specifically `plotly.express` and `plotly.graph_objects`).

```python
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Dummy dataset
np.random.seed(42)
df = pd.DataFrame({
    # "MS" = month start; the older "M" alias is deprecated in pandas 2.2+
    "Month": pd.date_range("2024-01-01", periods=12, freq="MS"),
    "Sales": np.random.randint(30000, 80000, 12),
    "Profit": np.random.randint(5000, 15000, 12),
    "Region": np.random.choice(["North", "South", "East", "West"], 12)
})

df.head()
```

---

## 📈 **2. Interactive Line Chart (Sales Trend Over Time)**

```python
fig = px.line(df, x="Month", y="Sales", title="Monthly Sales Trend",
              markers=True, color_discrete_sequence=["#1f77b4"])

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales (₹)",
    hovermode="x unified"
)

fig.show()
```

✅ **Highlights:**

* Hover over data points to see exact values.
* Drag to zoom in; double-click to reset the view.
* Pan across time series easily.
* Clean, publication-ready visuals with one line of code.

---

## 📊 **3. Interactive Bar Chart (Profit by Region)**

```python
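# Sum profit per region first: the dummy data has several rows per region
region_profit = df.groupby("Region", as_index=False)["Profit"].sum()

fig = px.bar(region_profit, x="Region", y="Profit", color="Region",
             title="Profit by Region", text_auto=True,
             color_discrete_sequence=px.colors.qualitative.Vivid)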

fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Profit (₹)",
    showlegend=False
)

fig.show()
```

✅ **Highlights:**

* Hover shows data for each region.
* Bars are color-coded automatically.
* Perfect for category comparisons.

---

## ⚪ **4. Interactive Scatter Plot (Sales vs Profit)**

```python
fig = px.scatter(df, x="Sales", y="Profit", color="Region", size="Profit",
                 hover_name="Month", title="Sales vs Profit by Region",
                 color_discrete_sequence=px.colors.qualitative.Set2)

fig.update_layout(
    xaxis_title="Sales (₹)",
    yaxis_title="Profit (₹)",
    legend_title="Region"
)

fig.show()
```

✅ **Highlights:**

* Hover reveals Month, Sales, Profit, Region.
* Circle size encodes Profit value.
* Visually identifies strong-performing regions.

---

## 🔥 **5. Interactive Pie Chart (Sales Share by Region)**

```python
fig = px.pie(df, values="Sales", names="Region", title="Sales Share by Region",
             color_discrete_sequence=px.colors.qualitative.Pastel)

fig.update_traces(textinfo="percent+label")
fig.show()
```

✅ **Highlights:**

* Hover shows values and percentage.
* Click a legend entry to hide or show a specific region.
* Great for business summaries.

---

## 🧱 **6. Combining Multiple Visuals into a Dashboard**

You can combine multiple Plotly figures into one **interactive dashboard layout** using `make_subplots()` from `plotly.subplots`.

```python
from plotly.subplots import make_subplots

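# Aggregate per region so the bar chart shows one bar per region
by_region = df.groupby("Region", as_index=False)[["Sales", "Profit"]].sum()

# Create subplot layout; the pie chart needs a "domain" cell rather than x/y axes
fig = make_subplots(rows=2, cols=2,
                    specs=[[{"type": "xy"}, {"type": "xy"}],
                           [{"type": "xy"}, {"type": "domain"}]],
                    subplot_titles=("Sales Trend", "Profit by Region", "Sales vs Profit", "Sales Share"))

# Add charts
fig.add_trace(go.Scatter(x=df["Month"], y=df["Sales"], name="Sales", mode="lines+markers"), row=1, col=1)
fig.add_trace(go.Bar(x=by_region["Region"], y=by_region["Profit"], name="Profit by Region"), row=1, col=2)
fig.add_trace(go.Scatter(x=df["Sales"], y=df["Profit"], mode="markers", name="Sales vs Profit",
                         marker=dict(size=10, color='teal', opacity=0.6)), row=2, col=1)
fig.add_trace(go.Pie(labels=by_region["Region"], values=by_region["Sales"], name="Sales Share"), row=2, col=2)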

# Layout customization
fig.update_layout(
    height=800, width=1000, title_text="Interactive Business Dashboard",
    showlegend=False, template="plotly_white"
)

fig.show()
```

✅ **Dashboard Features:**

* All visuals are interactive: zoom, hover, click.
* Unified look with a single white theme.
* Ideal for Jupyter Book, JupyterLab, or Streamlit integration.

---

## ⚡ **7. Business Insights from Dashboard**

* **Sales Trend:** Observe which months have high or low sales.
* **Profit by Region:** Identify the most profitable regions.
* **Sales vs Profit:** See whether sales strongly relate to profits.
* **Sales Share:** Understand contribution of each region to total sales.
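
The four readings above can also be checked numerically with plain pandas. The sketch below mirrors this section's dummy dataset (random values, month-start dates assumed) and computes one number or table per bullet; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Mirror of this section's dummy dataset (values are random).
np.random.seed(42)
df = pd.DataFrame({
    "Month": pd.date_range("2024-01-01", periods=12, freq="MS"),
    "Sales": np.random.randint(30000, 80000, 12),
    "Profit": np.random.randint(5000, 15000, 12),
    "Region": np.random.choice(["North", "South", "East", "West"], 12),
})

# Sales trend: months with the highest and lowest sales.
best_month = df.loc[df["Sales"].idxmax(), "Month"]
worst_month = df.loc[df["Sales"].idxmin(), "Month"]

# Profit by region: total profit per region, largest first.
profit_by_region = df.groupby("Region")["Profit"].sum().sort_values(ascending=False)

# Sales vs profit: linear correlation between the two columns.
sales_profit_corr = df["Sales"].corr(df["Profit"])

# Sales share: each region's fraction of total sales.
sales_share = df.groupby("Region")["Sales"].sum() / df["Sales"].sum()

print(f"Best month: {best_month:%b %Y}, worst month: {worst_month:%b %Y}")
print(profit_by_region)
print(f"Sales-profit correlation: {sales_profit_corr:.2f}")
print(sales_share.round(3))
```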

---

## 🎨 **8. Enhancements (Optional)**

You can further:

* Add dropdown filters using `plotly.graph_objects` and `updatemenus`.
* Export as HTML using `fig.write_html("dashboard.html")`.
* Embed into websites or JupyterBook using:

  ````markdown
  ```{raw} html
  <iframe src="dashboard.html" width="100%" height="800"></iframe>
  ```
  ````

---

## 🧠 **9. Key Takeaways**

| Concept            | Description                                  |
| ------------------ | -------------------------------------------- |
| **Plotly Express** | Easiest way to create interactive charts     |
| **Hover & Zoom**   | Built-in, no extra coding                    |
| **Subplots**       | Combine visuals into dashboards              |
| **Export**         | Save and embed as standalone HTML            |
| **Ideal For**      | Business dashboards, storytelling, data apps |

---

## ✅ **10. Practical Exercises**

1. Create an **interactive line chart** showing monthly revenue for multiple regions.
2. Add a **dropdown menu** to switch between “Sales” and “Profit”.
3. Combine 3 Plotly charts into one dashboard.
4. Save your dashboard as `business_dashboard.html`.
5. Embed it in your JupyterBook under a section called **Interactive Insights**.

---
