Merkle Trees
Prove one leaf in O(log N) without shipping the whole set.
Merkle Trees
A Merkle tree (invented by Ralph Merkle, 1979) is a binary tree whose every internal node is the hash of its two children. The top of the tree is a single hash — the Merkle root — that fingerprints the entire set of leaves.
Change any single leaf and the root changes. That's the whole point.
Analogy
Imagine a tournament bracket for a million-player chess competition. Each match produces a winner who advances; pairs of winners become the next round's matches; the whole tournament funnels down to a single champion at the top. To prove to a friend that you played in the tournament, you don't need to hand over the names of all million players — you just need the short chain of opponents you faced on your way up the bracket, all the way to the champion. A Merkle tree is exactly that bracket: the champion is the root, the original data are the first-round players, and any one player's participation can be verified by a tiny sibling-chain proof instead of shipping the whole draw.
ROOT
/ \
ab cd
/ \ / \
H(a) H(b) H(c) H(d)
| | | |
a b c d ← leaves (original data, hashed)
The superpower: O(log N) membership proofs
To prove that leaf b is in a tree of a million items, you don't ship all million. You ship the sibling chain — the sibling at each level from b up to the root.
For b in the tree above you'd ship: H(a), then cd. The verifier hashes:
step 1: H( H(a) || H(b) ) = ab
step 2: H( ab || cd ) = root
If the computed root matches the one they trust, b is in the set. Proof size: log₂(N) hashes. For a million leaves that's 20 SHA-256 digests — 640 bytes.
Where you'll see it
Git. Every commit points to a tree object; trees point to blobs. The commit SHA covers everything. Two repos with the same commit SHA have byte-identical contents.
Bitcoin. Each block header commits to a Merkle root of its transactions. Light clients (SPV) can verify that a transaction is in a block without downloading the block — they ask a full node for the Merkle path.
Certificate Transparency. Every TLS certificate issued by a public CA is logged into a Merkle-tree log. Browsers demand SCTs (signed certificate timestamps) proving a cert was logged. RFC 9162 is the spec.
IPFS, Docker layers, ZFS snapshots, Cassandra anti-entropy, AWS DynamoDB streams. Anywhere you need to prove "this blob is part of this snapshot" cheaply.
Building one
function build(leaves, H) {
let level = leaves.map(H); // hash every leaf
const tree = [level];
while (level.length > 1) {
if (level.length % 2) level.push(level.at(-1)); // pad odd count
const next = [];
for (let i = 0; i < level.length; i += 2) {
next.push(H(level[i] + level[i + 1]));
}
tree.push(next);
level = next;
}
return { root: level[0], tree };
}
A few subtleties in production trees:
- Odd-count padding. Bitcoin duplicates the last hash. Others (like Certificate Transparency) carry it up unduplicated. The choice matters for the proof format but not for security.
- Domain separation. Good implementations tag leaves and internal nodes differently (e.g. prefix a leaf with
0x00, an internal node with0x01) to prevent a "second-preimage via leaf/internal collision" attack. - Hash function choice. SHA-256, SHA-512, BLAKE3. Anything collision-resistant works.
Why it works
The chain of hashes is an inductive argument: if the root is unchanged, the leaf I provided must be the one that produced the root. The adversary would need a collision at some internal node to forge a proof — which is exactly the property the underlying hash is supposed to provide.
So: a Merkle tree is how you stretch the security of a single hash over an arbitrary-sized set, with cheap verification.