algorithms · level 10

Graphs & BFS/DFS

Nodes, edges, a queue, a stack — pick a side and walk.

200 XP

Graphs & BFS/DFS

A graph is nothing more than things (nodes) and relationships (edges). Once your data fits that shape — users and follows, rooms and doors, packages and dependencies, routers and cables — two algorithms cover almost every question you'll ever ask about it: BFS explores level-by-level, DFS plunges deep before backing up. They share the same skeleton. The only difference is one uses a queue, the other a stack.

What a graph actually is

  • Nodes (a.k.a. vertices): the entities. People, packages, intersections, web pages.
  • Edges: the connections. One edge per relationship — a follow, a dependency, a road, a link.
  • Undirected vs directed. An undirected edge {a, b} means "a and b are connected, either direction." A directed edge (a, b) means "a points to b" — a link, a prerequisite, a subscription. Directed graphs have one extra complication: you can reach b from a but not necessarily a from b.
  • Cycles. A path that returns to its start. Cycles are the reason you need a visited set — without one, traversal loops forever.
  • Weighted edges (next lesson): each edge has a cost. BFS assumes every edge costs 1; Dijkstra and friends are the weighted generalisation.

How graphs live in memory

Two canonical representations. You pick based on the density — how many edges relative to the maximum possible (n * (n - 1) / 2 for undirected).

                                Adjacency list                          Adjacency matrix
Shape                           adj[node] = [neighbour, …]              m[i][j] = 1 if an edge exists
                                (array of lists or dict of sets)        (2D array)
Space                           O(V + E)                                O(V²)
"Is there an edge a→b?"         O(degree)                               O(1)
"Iterate all neighbours of a"   O(degree)                               O(V)
Best for                        sparse graphs (social networks,         dense graphs (small n, lots of
                                road networks)                          edges, or algorithms that need
                                                                        O(1) edge lookup)

For almost every real-world graph you'll touch — web pages, packages, cities, users — use an adjacency list. Matrices win when you need constant-time edge queries and the graph is dense enough that the V² memory isn't wasteful.
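
An adjacency list is a few lines to build from an edge list. A minimal sketch (the helper name `buildAdjacency` is illustrative, not from a library) — note the undirected case adds each edge in both directions:

```javascript
// Build an adjacency list (object of arrays) from an edge list.
// For undirected graphs, every edge is recorded at both endpoints.
function buildAdjacency(edges, { directed = false } = {}) {
  const adj = {};
  for (const [a, b] of edges) {
    (adj[a] ??= []).push(b);
    if (!directed) (adj[b] ??= []).push(a);
  }
  return adj;
}

const adj = buildAdjacency([["a", "b"], ["b", "c"], ["a", "c"]]);
// adj.a → ["b", "c"], adj.b → ["a", "c"], adj.c → ["b", "a"]
```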

BFS — the queue, layer by layer

function bfs(adj, start) {
  const visited = new Set([start]);        // mark the start before the loop
  const queue = [start];
  const order = [];
  while (queue.length > 0) {
    const node = queue.shift();            // FIFO: oldest frontier node first
    order.push(node);
    for (const next of adj[node] ?? []) {
      if (!visited.has(next)) {
        visited.add(next);                 // mark on ENQUEUE, not dequeue
        queue.push(next);
      }
    }
  }
  return order;                            // nodes in breadth-first order
}

BFS visits every node at distance 1 before anything at distance 2, everything at distance 2 before distance 3, and so on — the "ripples on a pond" shape. That's exactly why BFS finds the shortest path by edge count on unweighted graphs: the first time it reaches a node, that node is at its minimum distance from the start.

The load-bearing line is visited.add(next) before the push. The classic mistake is marking a node visited only when you dequeue it — on a dense graph, the same node can be enqueued by every one of its neighbours before the first copy is pulled off, turning an O(V + E) algorithm into O(V·E).
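
To actually recover a shortest path, record each node's parent as you enqueue it, then walk the parents backwards from the goal. A sketch under the same assumptions as above (adjacency list as an object of arrays; `shortestPath` is an illustrative name):

```javascript
// BFS that records parents, then reconstructs one shortest path
// (by edge count) from start to goal. Returns null if unreachable.
function shortestPath(adj, start, goal) {
  const parent = { [start]: null };
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === goal) {
      const path = [];
      for (let n = goal; n !== null; n = parent[n]) path.unshift(n);
      return path;
    }
    for (const next of adj[node] ?? []) {
      if (!(next in parent)) {             // parent map doubles as the visited set
        parent[next] = node;
        queue.push(next);
      }
    }
  }
  return null;                             // goal not reachable from start
}
```

Because BFS reaches every node at its minimum distance first, the parent chain it leaves behind is automatically a shortest path.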

DFS — the stack, deep first

function dfs(adj, start) {
  const visited = new Set();
  const stack = [start];
  const order = [];
  while (stack.length > 0) {
    const node = stack.pop();              // LIFO: newest frontier node first
    if (visited.has(node)) continue;       // duplicates can sit on the stack
    visited.add(node);
    order.push(node);
    for (const next of adj[node] ?? []) {
      if (!visited.has(next)) stack.push(next);
    }
  }
  return order;                            // nodes in depth-first order
}

Pop instead of shift — that's the entire structural difference. The LIFO discipline means DFS plunges along one branch until it hits a dead end, then rewinds and tries the next branch. Three things DFS is uniquely good at:

  • Topological sort. For a DAG, the reverse post-order of a DFS is a valid topological order — how build systems, package installers, and spreadsheet recalcs decide what runs first.
  • Cycle detection. If DFS ever tries to push a node that's already on the current path, there's a cycle. Detecting cycles is how npm catches dependency loops and how schedulers reject illegal task graphs.
  • Connectivity / component labelling. One DFS per unvisited node, label each component — the classic "find all islands in a grid" interview problem.
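
The topological-sort case is worth seeing concretely. A sketch of reverse DFS post-order on a DAG (recursive for brevity, cycle handling omitted; an edge a → b here means "a must run before b"):

```javascript
// Topological sort of a DAG as reverse DFS post-order.
function topoSort(adj) {
  const visited = new Set();
  const order = [];
  function visit(node) {
    if (visited.has(node)) return;
    visited.add(node);
    for (const next of adj[node] ?? []) visit(next);
    order.push(node);                  // post-order: after all descendants
  }
  for (const node of Object.keys(adj)) visit(node);
  return order.reverse();              // reverse post-order = topological order
}

topoSort({ build: ["test"], test: ["deploy"], deploy: [] });
// → ["build", "test", "deploy"]
```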

Recursive DFS is a two-line function but blows the call stack on deep graphs (Node/JS defaults to ~10–15k frames, Python ~1k). Use the iterative stack form whenever depth is unbounded.
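
For comparison, the recursive form really is tiny — but each level of depth consumes a call-stack frame, so a long chain of nodes will throw a RangeError where the iterative version keeps walking:

```javascript
// Recursive DFS: compact, but recursion depth equals graph depth.
function dfsRec(adj, node, visited = new Set()) {
  visited.add(node);
  for (const next of adj[node] ?? []) {
    if (!visited.has(next)) dfsRec(adj, next, visited);
  }
  return visited;
}
```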

Common mistakes

  • Marking visited on dequeue, not enqueue. Turns BFS into quadratic work on dense graphs — every neighbour re-enqueues every other neighbour before the first one is processed.
  • Confusing "seen" with "processed." Some algorithms need both: "I've queued it" vs "I've fully explored its subtree." Keep the two sets distinct when the algorithm demands it (cycle detection in a DAG, for instance).
  • Using recursive DFS on deep graphs. Stack overflow is a real, reproducible bug on anything with a long chain. Iterative-with-an-explicit-stack is the safe default.
  • Forgetting the visited set entirely on a cyclic graph. Instant infinite loop. The if (!visited.has(next)) guard isn't a micro-optimisation; it's the correctness condition.
  • Using a plain array as a queue in JavaScript. [].shift() is O(n) because every remaining element gets re-indexed. On a million-node BFS that's a hidden V² cost. Swap in a deque library, a head-index cursor, or a two-stack trick if performance matters.
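
One cheap fix for the shift() problem — a sketch, not the only option — is to keep a head cursor into a plain array instead of ever calling shift():

```javascript
// BFS with an O(1) dequeue: advance a head index instead of shifting.
// Bonus: the array itself ends up holding the full visit order.
function bfsOrder(adj, start) {
  const visited = new Set([start]);
  const queue = [start];
  let head = 0;                        // index of the next node to dequeue
  while (head < queue.length) {
    const node = queue[head++];        // O(1) "shift"
    for (const next of adj[node] ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  return queue;                        // nodes in breadth-first order
}
```

The trade-off is that dequeued slots aren't reclaimed until the whole array is garbage-collected, which is usually fine for a single traversal.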

Real systems that lean on BFS/DFS

  • Web crawlers. BFS from a seed URL; visit each page once; add its outlinks to the frontier. The visited set is essentially the content-hash index.
  • Social graph reach. "Who's within 2 hops of me?" is one BFS from your profile, pruned at depth 2.
  • Build systems (Bazel, Make, Nx). Topological sort — DFS post-order — decides what rebuilds and in what order.
  • Package managers (npm, pip). DFS walks the dependency graph; cycle detection refuses to install on a loop.
  • Router networks. OSPF / IS-IS flood link-state updates with BFS-shaped propagation; RIP uses distance vectors but the reachability structure is the same.
  • Maze solvers / game AI. BFS for shortest unweighted path; DFS for "can I get there at all?" when you don't need optimality.

Practical gotchas

  • Memory blow-up on high-branching-factor BFS. The frontier grows with the width of the current level, and on a high-branching-factor graph the widest level can hold a large fraction of all reachable nodes — a million-node BFS can easily have hundreds of thousands of nodes queued at once. Budget memory accordingly; consider iterative deepening (depth-bounded DFS with a growing limit) when the branching factor is large.
  • Directed vs undirected in your adjacency list. For undirected graphs, remember to add the edge to both endpoints. Forgetting half the edges produces a graph that looks right on paper but traverses like a one-way street.
  • BFS on a weighted graph returns nonsense. If the edges have different costs, BFS ignores them — use Dijkstra instead. (BFS is correct only for unit-weight edges.)
  • DFS is not deterministic unless you sort neighbours. Different adjacency-list orderings produce different visit orders. The playground and the daily challenge both sort neighbours alphabetically for this reason.
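
The "seen vs processed" distinction from the mistakes list is exactly what directed-cycle detection needs. A sketch using the standard grey/black colouring (grey = on the current DFS path, black = fully explored); an edge into a grey node is a back edge, i.e. a cycle:

```javascript
// Directed-cycle detection: grey = seen (on the current path),
// black = processed (subtree fully explored), undefined = unseen.
function hasCycle(adj) {
  const GREY = 1, BLACK = 2;
  const state = {};
  function visit(node) {
    state[node] = GREY;
    for (const next of adj[node] ?? []) {
      if (state[next] === GREY) return true;               // back edge → cycle
      if (state[next] === undefined && visit(next)) return true;
    }
    state[node] = BLACK;
    return false;
  }
  return Object.keys(adj).some((n) => state[n] === undefined && visit(n));
}

hasCycle({ a: ["b"], b: ["c"], c: ["a"] });   // → true
hasCycle({ a: ["b"], b: ["c"], c: [] });      // → false
```

A single visited set can't make this distinction: a cross edge into an already-explored (black) node is harmless, and only the grey/black split tells the two cases apart.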

Playground

Pick a graph (3×3 grid, small tree, or graph with a cycle), pick a start node, pick BFS or DFS, and step through. The visited set fills in as you go. The frontier shows the queue (BFS) or stack (DFS) state. At each step, the current node is highlighted — compare the order BFS visits against the order DFS does on the same graph.

The Shared Level Build Recipe convention makes this a pure playhead over the frame list returned by traceBFS / traceDFS: if it walks correctly in the lib's unit tests, it walks correctly here.

Visualizer

The Graph Walker panel draws the graph as an SVG and animates the frame cursor — nodes fade in from unvisited (neutral) through frontier (accent) to visited (success), with a ring on the currently-visiting node. Watch the same graph with BFS and DFS side by side: BFS paints level-by-level, DFS carves a single spine down to a leaf before backing up.