hashing · level 2

Collision Resistance

The birthday paradox, and why N bits only buys N/2 security.

150 XP

Collision Resistance

A collision is a pair of distinct inputs x ≠ y with H(x) = H(y). A hash that's "collision-resistant" makes finding one infeasible.

The playground truncates SHA-256 to 16 bits so you can watch collisions appear in a browser. Real SHA-256, at full 256-bit output, would take the age of the universe.

Analogy

Walk into a classroom of 23 students and ask whether any two share a birthday. Most people guess it's unlikely — there are 365 days, after all. In fact the chance of a match is already better than 50%, because you're comparing every possible pair, not one person against a fixed date. This is the birthday paradox, and it's why finding two people with matching birthdays (a collision) is much easier than finding someone born on a specific date you pick in advance (a preimage). Hash security follows the same maths: searching for any matching pair is quadratically cheaper than searching for a match to one target.

Three attack games — not the same problem

Game Given Find Cost (ideal N-bit hash)
Preimage digest d any x with H(x) = d 2^N
Second preimage input x different y with H(x) = H(y) 2^N
Collision nothing any pair (x, y) with H(x) = H(y), x ≠ y 2^(N/2)

Notice collision is quadratically easier. That's the birthday paradox: in a room of 23 people, there's a 50% chance two share a birthday — even though there are 365 possible birthdays. Square roots are weird.

Birthday math

The probability of finding a collision after k samples on an N-bit hash is approximately:

P(collision) ≈ 1 − e^(−k² / 2·2^N)

Setting P = 0.5 and solving for k:

k ≈ √(ln 2 · 2 · 2^N) ≈ 1.17 · √(2^N) ≈ 2^(N/2)

So for SHA-256 (N = 256), you need ~2^128 tries. For a truncated 16-bit hash, you need ~2^8 ≈ 256 tries — which your CPU can do in milliseconds.

Why this matters in practice

SHA-1 has been broken. In 2017 the SHAttered attack produced two different PDFs with the same SHA-1 hash. SHA-1 is 160-bit, so collision resistance was 80-bit. Google and CWI used ~9 quintillion SHA-1 computations — expensive, but possible.

MD5 is worse. 128-bit hash, 64-bit collision resistance. Collisions are trivial today.

SHA-256 and SHA-3 are fine. 128-bit collision resistance is practically unreachable.

Collision vs preimage: don't mix them up

  • A git commit SHA collides → two different repos could map to the same commit. (Why Git is moving to SHA-256.)
  • A password hash that takes a preimage → an attacker recovers a password from its hash. (Salt + slow hash protect against this; collision resistance is irrelevant here.)
  • A TLS certificate signature collides → an attacker forges a cert that chains to a trusted CA. (Why CAs reject SHA-1 in signatures.)

Pick the right property for the threat.