Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Combining Data from Multiple Business Tables Reliably

Notebook Guide

Many real analyses require combining data from separate files or systems.

Learning objectives

  • understand key-based merges

  • compare inner and left joins

  • use concatenation for stacking compatible tables

  • spot missing matches and duplication risks

import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Rahul", "Mina"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "order_value": [250, 400, 320]})

merged = customers.merge(orders, on="customer_id", how="left")
extra_orders = pd.DataFrame({"customer_id": [4], "order_value": [150]})
stacked_orders = pd.concat([orders, extra_orders], ignore_index=True)

print("Merged table")
print(merged)
print("\nConcatenated orders")
print(stacked_orders)
Merged table
   customer_id   name  order_value
0            1   Asha        250.0
1            1   Asha        400.0
2            2  Rahul          NaN
3            3   Mina        320.0

Concatenated orders
   customer_id  order_value
0            1          250
1            1          400
2            3          320
3            4          150

Core Explanation

Merging uses shared keys to align related records. Concatenation stacks compatible tables together. The main analytical risk is assuming identifiers match cleanly when they do not.

Exercises

  1. Change the merge to an inner join and compare the output.

  2. Add a customer with no orders and inspect the left-join result.

  3. Introduce duplicate keys and study how row counts change.

8. Interactive Code

Expected output
[{'id': 1, 'name': 'Asha', 'amount': 500}, {'id': 2, 'name': 'Rahul', 'amount': 700}]
Expected output
Rahul
2

9. Guided Practice

Why do we merge or join datasets?

To separate related information permanentlyMerging combines related information.
To combine related fields from different sourcesCorrect. Joins connect matching records.
To remove keys from all tablesKeys are often what make the merge possible.
To avoid using records entirelyMerging still works on records.

How many merged rows are produced in the example?

1There are two paired entries.
2Correct. Two customer-order pairs are merged.
3Only two rows are shown.
4That is too many for the example.