Summarizing Business Data by Category, Time, and Region

Notebook Guide

Grouping and pivoting help convert transaction-level data into management-ready summaries.

Learning objectives

  • aggregate data by category

  • compare summary statistics across groups

  • reshape grouped results with pivot tables

  • interpret summaries for business decisions

import pandas as pd

sales_df = pd.DataFrame(
    {
        "region": ["West", "West", "South", "South", "East"],
        "product": ["A", "B", "A", "B", "A"],
        "revenue": [1200, 900, 1100, 950, 1050],
    }
)

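# Total revenue per region; as_index=False keeps "region" as a regular column.
region_summary = sales_df.groupby("region", as_index=False)["revenue"].sum()

# Region-by-product revenue matrix; fill_value=0 fills combinations with no sales.
pivot = sales_df.pivot_table(
    values="revenue", index="region", columns="product", aggfunc="sum", fill_value=0
)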

print("Grouped summary")
print(region_summary)
print("\nPivot table")
print(pivot)

Output:

Grouped summary
  region  revenue
0   East     1050
1  South     2050
2   West     2100

Pivot table
product     A    B
region            
East     1050    0
South    1100  950
West     1200  900

Core Explanation

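groupby splits the rows into groups based on the values of one or more columns and applies an aggregation such as sum or mean to each group, producing one summary row per category. A pivot table arranges those summaries as a matrix, with one grouping variable on the rows and another on the columns, so comparisons are easier to scan. In the output above, East sold no product B, so fill_value=0 shows that cell as 0 rather than as a missing value.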
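
To make the link between grouping and pivoting concrete, here is a minimal sketch that reuses the sales_df data from this notebook. The names region_stats, grouped, and matrix are illustrative and not part of the original notebook. It computes several statistics per region in one call, then rebuilds the region-by-product matrix by grouping on both columns and unstacking one of them.

```python
import pandas as pd

sales_df = pd.DataFrame(
    {
        "region": ["West", "West", "South", "South", "East"],
        "product": ["A", "B", "A", "B", "A"],
        "revenue": [1200, 900, 1100, 950, 1050],
    }
)

# Several summary statistics per region in one pass (named aggregation).
region_stats = sales_df.groupby("region")["revenue"].agg(
    total="sum", average="mean", orders="count"
)
print(region_stats)

# A pivot table is a grouped summary reshaped into a matrix:
# group by both keys, then move "product" from the row index to the columns.
grouped = sales_df.groupby(["region", "product"])["revenue"].sum()
matrix = grouped.unstack("product", fill_value=0)
print(matrix)
```

The matrix produced this way should hold the same values as the pivot_table call earlier in the notebook, which is one way to see that a pivot table is a grouped aggregation with a different layout.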

Exercises

  1. Add a month column and summarize revenue by region and month.

  2. Compute average revenue instead of total revenue.

  3. Create a pivot table with products as rows and regions as columns.
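
One possible approach to the three exercises is sketched below; try them on your own first. The month values are hypothetical and invented only for illustration, since the original data has no dates, and the variable names are likewise assumptions.

```python
import pandas as pd

sales_df = pd.DataFrame(
    {
        "region": ["West", "West", "South", "South", "East"],
        "product": ["A", "B", "A", "B", "A"],
        "revenue": [1200, 900, 1100, 950, 1050],
    }
)

# Exercise 1: hypothetical month labels (made up for this sketch),
# then total revenue for each region and month combination.
sales_df["month"] = ["Jan", "Feb", "Jan", "Feb", "Jan"]
by_region_month = sales_df.groupby(["region", "month"], as_index=False)["revenue"].sum()
print(by_region_month)

# Exercise 2: average revenue per region instead of the total.
avg_by_region = sales_df.groupby("region", as_index=False)["revenue"].mean()
print(avg_by_region)

# Exercise 3: swap the axes so products are rows and regions are columns.
product_by_region = sales_df.pivot_table(
    values="revenue", index="product", columns="region", aggfunc="sum", fill_value=0
)
print(product_by_region)
```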

8. Interactive Code

Expected output
{'North': 220, 'South': 130}
Expected output
220
2
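
The interactive cells themselves are not reproduced here. As a rough reconstruction, the sketch below uses plain Python and assumes three records (North 100, North 120, South 130), which is what the quiz feedback in the next section implies. The variable names, and the reading of the final 2 as the number of regions, are assumptions.

```python
# Hypothetical records inferred from the quiz feedback, not the original cell contents.
sales = [("North", 100), ("North", 120), ("South", 130)]

# Manual "group by": accumulate a running total for each region.
totals = {}
for region, amount in sales:
    totals[region] = totals.get(region, 0) + amount

print(totals)           # {'North': 220, 'South': 130}
print(totals["North"])  # 220
print(len(totals))      # 2 (assumed here to be the number of regions)
```

This is the same split-and-aggregate idea that groupby applies to a DataFrame, written out by hand.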

9. Guided Practice

What does a groupby operation usually do?

  • Deletes category labels
    Grouping organizes by category; it does not remove the grouping key.

  • Aggregates data within categories
    Correct. Groupby summarizes values by category.

  • Converts every number into text
    That is not the purpose of groupby.

  • Only sorts rows alphabetically
    Sorting is not the same as grouping and aggregating.

What are the total sales for `North` in the example?

  • 100
    That counts only one North record.

  • 220
    Correct. North sales are 100 + 120.

  • 130
    That is South's total.

  • 350
    That is the total across all rows.