Summarizing Business Data by Category, Time, and Region
Notebook Guide
Grouping and pivoting help convert transaction-level data into management-ready summaries.
Learning objectives
aggregate data by category
compare summary statistics across groups
reshape grouped results with pivot tables
interpret summaries for business decisions
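import pandas as pd

# Transaction-level sales records: one row per region/product pair.
sales_df = pd.DataFrame(
    {
        "region": ["West", "West", "South", "South", "East"],
        "product": ["A", "B", "A", "B", "A"],
        "revenue": [1200, 900, 1100, 950, 1050],
    }
)

# Total revenue per region; as_index=False keeps region as a regular column.
region_summary = sales_df.groupby("region", as_index=False)["revenue"].sum()

# Regions as rows, products as columns; missing combinations are filled with 0.
pivot = sales_df.pivot_table(
    values="revenue", index="region", columns="product", aggfunc="sum", fill_value=0
)

print("Grouped summary")
print(region_summary)
print("\nPivot table")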
print(pivot)

Grouped summary
  region  revenue
0   East     1050
1  South     2050
2   West     2100

Pivot table
product     A    B
region
East     1050    0
South    1100  950
West     1200  900
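Core Explanation
`groupby` splits the rows by the values of a key column (here, region) and applies an aggregation such as sum or mean to each group, which yields one summary row per category. `pivot_table` arranges the same aggregates in matrix form, with one key as rows and another as columns, so comparisons across both dimensions become easier to scan.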
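As a minimal sketch of these two ideas, the following reuses the `sales_df` values from this notebook to compute several statistics per region with `agg`, then checks that a pivot table holds the same numbers as a two-key `groupby` followed by `unstack`. The names `region_stats`, `by_region_product`, and `same_values` are illustrative and not part of the original notebook.

```python
import pandas as pd

sales_df = pd.DataFrame(
    {
        "region": ["West", "West", "South", "South", "East"],
        "product": ["A", "B", "A", "B", "A"],
        "revenue": [1200, 900, 1100, 950, 1050],
    }
)

# Several summary statistics per region in one pass.
region_stats = sales_df.groupby("region")["revenue"].agg(["sum", "mean", "count"])
print(region_stats)

# Grouping by two keys and unstacking the inner key gives a matrix layout.
by_region_product = (
    sales_df.groupby(["region", "product"])["revenue"].sum().unstack(fill_value=0)
)

# pivot_table builds the same matrix directly.
pivot = sales_df.pivot_table(
    values="revenue", index="region", columns="product", aggfunc="sum", fill_value=0
)

# Compare the underlying numbers cell by cell.
same_values = bool((by_region_product.to_numpy() == pivot.to_numpy()).all())
print(same_values)
```

Using `agg` with a list of function names is a convenient way to compare groups on more than one statistic, for example to see whether a region's high total comes from many transactions or from a few large ones.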
Exercises
Add a `month` column and summarize revenue by region and month.
Compute average revenue instead of total revenue.
Create a pivot table with products as rows and regions as columns.
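One possible way to approach these exercises is sketched below. The notebook does not supply month data, so the `month` values are invented purely for illustration, and the result names `region_month`, `region_mean`, and `product_by_region` are hypothetical.

```python
import pandas as pd

sales_df = pd.DataFrame(
    {
        "region": ["West", "West", "South", "South", "East"],
        "product": ["A", "B", "A", "B", "A"],
        "revenue": [1200, 900, 1100, 950, 1050],
        # Hypothetical month labels, invented for this sketch only.
        "month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
    }
)

# Exercise 1: total revenue by region and month.
region_month = sales_df.groupby(["region", "month"], as_index=False)["revenue"].sum()

# Exercise 2: average revenue per region instead of the total.
region_mean = sales_df.groupby("region", as_index=False)["revenue"].mean()

# Exercise 3: products as rows, regions as columns.
product_by_region = sales_df.pivot_table(
    values="revenue", index="product", columns="region", aggfunc="sum", fill_value=0
)

print(region_month)
print(region_mean)
print(product_by_region)
```

With real data, the month column would typically be derived from a date column rather than typed in by hand.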
8. Interactive Code
Expected output
{'North': 220, 'South': 130}
Expected output
220
2
9. Guided Practice
What does a groupby operation usually do?
Deletes category labels: Grouping organizes by category; it does not remove the grouping key.
Aggregates data within categories: Correct. Groupby summarizes values by category.
Converts every number into text: That is not the purpose of groupby.
Only sorts rows alphabetically: Sorting is not the same as grouping and aggregating.
What are the total sales for `North` in the example?
100: That counts only one North record.
220: Correct. North sales are 100 + 120.
130: That is South's total.
350: That is the total across all rows.
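The arithmetic behind these answer choices can be checked with a short sketch. The interactive cell itself is not reproduced in this guide, so the three records below (North: 100 and 120, South: a single record of 130) are reconstructed from the answer feedback, and the names `orders`, `sales`, and `totals` are assumptions.

```python
import pandas as pd

# Records reconstructed from the quiz feedback; column names are assumed.
orders = pd.DataFrame(
    {
        "region": ["North", "South", "North"],
        "sales": [100, 130, 120],
    }
)

# Sum sales within each region, then convert to a plain dict of Python ints.
region_totals = orders.groupby("region")["sales"].sum()
totals = {region: int(value) for region, value in region_totals.items()}

print(totals)                      # {'North': 220, 'South': 130}
print(totals["North"])             # 220
print(int(orders["sales"].sum()))  # 350
```

Each incorrect choice corresponds to a different slice of the same data: a single North row (100), the South group (130), or the ungrouped total (350).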