Combining Data from Multiple Business Tables Reliably¶
Notebook Guide¶
Many real analyses require combining data from separate files or systems.
Learning objectives¶
understand key-based merges
compare inner and left joins
use concatenation for stacking compatible tables
spot missing matches and duplication risks
import pandas as pd
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Rahul", "Mina"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "order_value": [250, 400, 320]})
merged = customers.merge(orders, on="customer_id", how="left")
extra_orders = pd.DataFrame({"customer_id": [4], "order_value": [150]})
stacked_orders = pd.concat([orders, extra_orders], ignore_index=True)
print("Merged table")
print(merged)
print("\nConcatenated orders")
print(stacked_orders)Merged table
customer_id name order_value
0 1 Asha 250.0
1 1 Asha 400.0
2 2 Rahul NaN
3 3 Mina 320.0
Concatenated orders
customer_id order_value
0 1 250
1 1 400
2 3 320
3 4 150
Core Explanation¶
Merging uses shared keys to align related records. Concatenation stacks compatible tables together. The main analytical risk is assuming identifiers match cleanly when they do not.
Exercises¶
Change the merge to an inner join and compare the output.
Add a customer with no orders and inspect the left-join result.
Introduce duplicate keys and study how row counts change.
8. Interactive Code¶
Expected output
[{'id': 1, 'name': 'Asha', 'amount': 500}, {'id': 2, 'name': 'Rahul', 'amount': 700}]Expected output
Rahul
29. Guided Practice¶
Why do we merge or join datasets?¶
To separate related information permanentlyMerging combines related information.
To combine related fields from different sourcesCorrect. Joins connect matching records.
To remove keys from all tablesKeys are often what make the merge possible.
To avoid using records entirelyMerging still works on records.
How many merged rows are produced in the example?¶
1There are two paired entries.
2Correct. Two customer-order pairs are merged.
3Only two rows are shown.
4That is too many for the example.