Collision Resistance
The birthday paradox, and why N bits only buys N/2 security.
Collision Resistance
A collision is a pair of distinct inputs x ≠ y with H(x) = H(y). A hash that's "collision-resistant" makes finding one infeasible.
The playground truncates SHA-256 to 16 bits so you can watch collisions appear in a browser. Real SHA-256, at full 256-bit output, would take the age of the universe.
Analogy
Walk into a classroom of 23 students and ask whether any two share a birthday. Most people guess it's unlikely — there are 365 days, after all. In fact the chance of a match is already better than 50%, because you're comparing every possible pair, not one person against a fixed date. This is the birthday paradox, and it's why finding two people with matching birthdays (a collision) is much easier than finding someone born on a specific date you pick in advance (a preimage). Hash security follows the same maths: searching for any matching pair is quadratically cheaper than searching for a match to one target.
Three attack games — not the same problem
| Game | Given | Find | Cost (ideal N-bit hash) |
|---|---|---|---|
| Preimage | digest d |
any x with H(x) = d |
2^N |
| Second preimage | input x |
different y with H(x) = H(y) |
2^N |
| Collision | nothing | any pair (x, y) with H(x) = H(y), x ≠ y |
2^(N/2) |
Notice collision is quadratically easier. That's the birthday paradox: in a room of 23 people, there's a 50% chance two share a birthday — even though there are 365 possible birthdays. Square roots are weird.
Birthday math
The probability of finding a collision after k samples on an N-bit hash is approximately:
P(collision) ≈ 1 − e^(−k² / 2·2^N)
Setting P = 0.5 and solving for k:
k ≈ √(ln 2 · 2 · 2^N) ≈ 1.17 · √(2^N) ≈ 2^(N/2)
So for SHA-256 (N = 256), you need ~2^128 tries. For a truncated 16-bit hash, you need ~2^8 ≈ 256 tries — which your CPU can do in milliseconds.
Why this matters in practice
SHA-1 has been broken. In 2017 the SHAttered attack produced two different PDFs with the same SHA-1 hash. SHA-1 is 160-bit, so collision resistance was 80-bit. Google and CWI used ~9 quintillion SHA-1 computations — expensive, but possible.
MD5 is worse. 128-bit hash, 64-bit collision resistance. Collisions are trivial today.
SHA-256 and SHA-3 are fine. 128-bit collision resistance is practically unreachable.
Collision vs preimage: don't mix them up
- A git commit SHA collides → two different repos could map to the same commit. (Why Git is moving to SHA-256.)
- A password hash that takes a preimage → an attacker recovers a password from its hash. (Salt + slow hash protect against this; collision resistance is irrelevant here.)
- A TLS certificate signature collides → an attacker forges a cert that chains to a trusted CA. (Why CAs reject SHA-1 in signatures.)
Pick the right property for the threat.