Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Topological Sorting

Ordering Tasks with Dependencies

Cycle detection answered a yes-or-no question: does a dependency loop exist? Topological sorting goes one step further. If the graph is acyclic, it builds a valid order so every prerequisite appears before the task that depends on it.

Why this matters in practice
  • a data workflow must load data before cleaning it and clean it before training

  • a software build must compile dependencies before packaging the final app

  • a business approval chain must finish review steps before release or deployment

from collections import deque

workflow = {
    "Plan Project": ["Collect Data", "Set Up Infrastructure"],
    "Collect Data": ["Clean Data"],
    "Clean Data": ["Engineer Features"],
    "Set Up Infrastructure": ["Train Model"],
    "Engineer Features": ["Train Model"],
    "Train Model": ["Review Metrics"],
    "Review Metrics": ["Approve Release"],
    "Approve Release": ["Deploy Model"],
    "Deploy Model": [],
}

def topological_sort(graph):
    indegree = {node: 0 for node in graph}
    for node in graph:
        for neighbor in graph[node]:
            indegree[neighbor] += 1

    queue = deque(sorted(node for node, degree in indegree.items() if degree == 0))
    order = []

    while queue:
        node = queue.popleft()
        order.append(node)

        for neighbor in graph[node]:
            indegree[neighbor] -= 1
            if indegree[neighbor] == 0:
                queue.append(neighbor)

    if len(order) != len(graph):
        raise ValueError("Topological order is impossible because the graph contains a cycle.")

    return order

valid_order = topological_sort(workflow)
print("One valid task order:")
for step_number, step in enumerate(valid_order, start=1):
    print(f"{step_number}. {step}")
One valid task order:
1. Plan Project
2. Collect Data
3. Set Up Infrastructure
4. Clean Data
5. Engineer Features
6. Train Model
7. Review Metrics
8. Approve Release
9. Deploy Model

Core Idea and Visual Intuition

Topological sorting applies to a directed acyclic graph. Each arrow means “must happen before.” A valid topological order respects every arrow in the graph.

Read the graph visually
  • Plan Project comes first because nothing must happen before it

  • Train Model has to wait for both Set Up Infrastructure and Engineer Features

  • Deploy Model appears late because it depends on the whole chain before it

Indegree means
Why cycles break ordering
Link from the previous notebook

The indegree of a node is how many prerequisites still point into it. If indegree is zero, that task is ready to begin.

Worked Example and Practice Lab

What to notice
  • only zero-indegree tasks can appear next in a valid order

  • at the beginning, Plan Project is available because nothing points into it

  • after removing a task, some downstream tasks may become newly available

Expected contrast
Valid DAG order: ['Plan Project', 'Collect Data', 'Set Up Infrastructure', 'Clean Data', 'Engineer Features', 'Train Model', 'Review Metrics', 'Approve Release', 'Deploy Model']
Cycle found: no topological order exists.

Guided Practice

When can a node appear next in a topological order?

When it has the most outgoing edgesOutgoing edge count does not make a task immediately available.
When its indegree is zeroCorrect. Zero indegree means no unfinished prerequisites remain.
When it is alphabetically firstAlphabetical order is not the rule here.
When it points to the deployment taskThat does not determine readiness.

Why does a cycle prevent topological sorting?

Because cycles make the graph undirectedA cycle does not change the graph into an undirected one.
Because no valid order can place every prerequisite before the tasks that depend on itCorrect. The dependency loop blocks a valid start-to-finish order.
Because cycles remove all edgesEdges remain; they are the source of the conflict.
Because topological sorting only works on treesIt works on DAGs, not just trees.

Which statement is true about topological order?

There is always exactly one correct answerDifferent zero-indegree choices can produce different valid orders.
Multiple valid orders may exist as long as all dependencies are respectedCorrect. Dependency constraints matter more than a single fixed sequence.
The order ignores prerequisite edges after the first stepDependencies must be respected throughout the whole sequence.
The algorithm is only useful for weighted graphsWeights are not required here.

Exercises

  1. Change the workflow so Review Metrics also depends on Clean Data. Predict whether the displayed order still works.

  2. Add a new task called Write Report that must happen after Review Metrics but before Approve Release.

  3. Modify the code so it collects all zero-indegree tasks at each round and prints them as processing layers.

Hint

Think in terms of prerequisites still remaining. A task becomes available only after every incoming dependency has been removed from consideration.

Key Takeaway

Topological sorting turns an acyclic dependency graph into a valid execution order by repeatedly choosing zero-indegree tasks. Cycle detection told us when ordering is impossible; topological sorting shows how to proceed when the graph is safe. In the next notebook, the focus shifts again from prerequisite order to route exploration in the path-finder visualizer.