Sponge Construction (SHA-3)
Absorb, squeeze, and why SHA-3 doesn't have SHA-2's length-extension flaw.
SHA-2 (the SHA-256/384/512 family) is built on the Merkle-Damgård construction: a fixed compression function chained over message blocks. It works, it's been the workhorse of the internet for two decades, and it has one famous wart: length-extension. SHA-3, ratified in 2015, deliberately chose a different shape — the sponge — partly to insure against unknown attacks on Merkle-Damgård, partly because the sponge fixes length-extension by design.
The sponge mental model is the cleanest in cryptography. Once you see "absorb, then squeeze," you stop being puzzled by why SHA-3 looks structurally different from everything before it.
What's wrong with Merkle-Damgård
Merkle-Damgård chains a compression function over fixed-size blocks:
IV → [c] → H(M_1) → [c] → H(M_1, M_2) → [c] → H(M_1, M_2, M_3) = H(M)
      ↑              ↑                   ↑
     M_1            M_2                 M_3
The problem: the final intermediate state IS the output digest. Anyone with the digest of M, and knowledge of M's length, can plug it back in as the IV, append M', and produce H(M || padding || M') without ever seeing M. This is the length-extension attack, and it broke many H(secret || message) MAC constructions in the wild, which is why HMAC is the standard way to key a Merkle-Damgård hash.
SHA-256 has length-extension. SHA-512 has it. SHA-1 had it. MD5 had it. Anything Merkle-Damgård-shaped that outputs its full final state does. Truncated variants such as SHA-384 and SHA-512/256 withhold part of that state and so resist it.
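The attack is easy to reproduce on a scaled-down Merkle-Damgård hash. The sketch below is an illustration only: `compress`, `pad`, `md_hash`, the 16-byte block size, and the key and messages are all hypothetical, with truncated SHA-256 standing in for a compression function. It shows an attacker forging a valid tag for an extended message from nothing but the original tag, the message, and the secret's length.

```python
import hashlib

BLOCK = 16          # toy block size in bytes (hypothetical, for illustration)
IV = bytes(16)      # toy initial chaining value

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: 16-byte state + 16-byte block -> 16-byte state."""
    return hashlib.sha256(state + block).digest()[:16]

def pad(total_len: int) -> bytes:
    """Merkle-Damgård padding: 0x80, zeros, then the message length in bits."""
    zeros = -(total_len + 9) % BLOCK
    return b"\x80" + bytes(zeros) + (total_len * 8).to_bytes(8, "big")

def md_hash(msg: bytes, state: bytes = IV, prior_len: int = 0) -> bytes:
    """Hash msg, optionally resuming from a chaining value after prior_len bytes."""
    data = msg + pad(prior_len + len(msg))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state  # the final chaining value IS the digest

# Victim: naive MAC = H(secret || message)
secret = b"sixteen-byte-key"
message = b"amount=10"
tag = md_hash(secret + message)

# Attacker: knows tag, message, and len(secret), but not the secret itself
glue = pad(len(secret) + len(message))
extension = b"&amount=9999"
forged_message = message + glue + extension
forged_tag = md_hash(extension, state=tag,
                     prior_len=len(secret) + len(message) + len(glue))

# The victim recomputes H(secret || forged_message) and accepts the forgery
assert forged_tag == md_hash(secret + forged_message)
```

The attacker never calls anything with `secret`; resuming from `tag` is enough because the digest is the complete internal state.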
The sponge
Imagine a state of r + c bits, divided into:
- Rate (r bits): the part that interacts with input/output.
- Capacity (c bits): the part that stays internal.
state = [ rate r bits | capacity c bits ]
              ↑               ↑
        touched by I/O   never directly emitted
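The hash function applies a permutation f to the whole state. The permutation is fast, public, and invertible; it is not one-way on its own. The sponge's security comes from the attacker never seeing the capacity, and the security proofs model f as a random permutation, so the closer f behaves to one, the better the sponge.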
Two phases:
Absorb
For each input block M_i (size r):
state[0..r] ⊕= M_i
state = f(state)
The message is first padded to a multiple of r bits, and the state starts as all zeros. Each block is XORed into the rate. Then the entire state (rate + capacity) is permuted. Repeat until input is consumed.
Squeeze
output ← []
While more output needed:
output += state[0..r]
state = f(state)
The rate portion is read off. To get more output, permute again and read again. Capacity is never directly emitted. That's the key.
Why length-extension fails
Length-extension needs the full internal state at the end of absorbing, so the attacker can keep absorbing. The sponge's output gives them only the rate, never the capacity. Without the full state they can't continue the computation — the attack stops cold.
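This is also why a sponge's security is capped by c as well as by the output length: the capacity bits are what stay hidden from the attacker.
For SHA3-256:
- r = 1088 bits
- c = 512 bits
- Total state: 1600 bits (the Keccak-f[1600] permutation)
The 512 capacity bits cap generic attacks at c/2 = 256 bits of work. With a 256-bit digest that gives 256-bit preimage resistance, while collision resistance is limited by the birthday bound on the output to 128 bits. In general, for an n-bit output, collision resistance is min(n/2, c/2) and preimage resistance is min(n, c/2).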
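The generic bounds are simple enough to compute directly. The helper below is a small sketch (the name `generic_security` is made up for this example) that applies the min(n/2, c/2) collision bound and the min(n, c/2) preimage bound to a few parameter sets from the SHA-3 family.

```python
def generic_security(capacity_bits, output_bits):
    """Return (collision_bits, preimage_bits) for a sponge with the given sizes."""
    collision = min(output_bits // 2, capacity_bits // 2)
    preimage = min(output_bits, capacity_bits // 2)
    return collision, preimage

# (capacity c, output n) in bits; the SHAKE128 row assumes a 256-bit output is requested
params = {
    "SHA3-256": (512, 256),
    "SHA3-512": (1024, 512),
    "SHAKE128 (256-bit output)": (256, 256),
}

for name, (c, n) in params.items():
    collision, preimage = generic_security(c, n)
    print(f"{name}: collision {collision} bits, preimage {preimage} bits")
```

Note how SHAKE128 stops at 128 bits of preimage resistance no matter how much output is squeezed: past that point the capacity, not the digest length, is the limit.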
SHA-3 family — same permutation, different (r, c)
| Function | Rate r (bits) | Capacity c (bits) | Output (bits) |
|---|---|---|---|
| SHA3-224 | 1152 | 448 | 224 |
| SHA3-256 | 1088 | 512 | 256 |
| SHA3-384 | 832 | 768 | 384 |
| SHA3-512 | 576 | 1024 | 512 |
| SHAKE128 | 1344 | 256 | any |
| SHAKE256 | 1088 | 512 | any |
All built on Keccak-f[1600], the same 1600-bit permutation with 24 rounds. The only difference is how r and c partition that state and how output is generated.
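To make "same permutation, different (r, c)" concrete, here is a compact, unoptimised Python sketch of the whole construction. The function names `keccak_f1600` and `sponge` are chosen for this example and it is far too slow for real use; it is only checked against `hashlib` at the end. The suffix byte carries the domain-separation bits plus the first padding bit: 0x06 for the SHA3-n functions, 0x1F for SHAKE, and 0x01 for the original pre-standard Keccak.

```python
import hashlib

MASK = (1 << 64) - 1

def rol64(a, n):
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK

def keccak_f1600(state):
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    # 25 little-endian 64-bit lanes, indexed lanes[x][y]
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1
    for _ in range(24):
        # theta: XOR column parities into every lane
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        cur = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, lanes[x][y] = lanes[x][y], rol64(cur, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied row by row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR in a round constant generated by an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out

def sponge(rate_bits, capacity_bits, msg, suffix, out_len):
    """Absorb msg, then squeeze out_len bytes. Assumes suffix has its top bit clear."""
    assert rate_bits + capacity_bits == 1600 and rate_bits % 8 == 0
    rate = rate_bits // 8
    # Padding: suffix byte, zero fill to a block boundary, final bit set
    padded = bytearray(msg) + bytes([suffix])
    padded += bytes(-len(padded) % rate)
    padded[-1] ^= 0x80
    state = bytearray(200)
    for off in range(0, len(padded), rate):      # absorb
        for i in range(rate):
            state[i] ^= padded[off + i]          # only the rate bytes are written
        state = keccak_f1600(state)
    out = bytearray()
    while len(out) < out_len:                    # squeeze
        out += state[:rate]                      # only the rate bytes are read
        if len(out) < out_len:
            state = keccak_f1600(state)
    return bytes(out[:out_len])

msg = b"abc" * 100   # 300 bytes, so absorbing spans several blocks
assert sponge(1088, 512, msg, 0x06, 32) == hashlib.sha3_256(msg).digest()
assert sponge(1344, 256, msg, 0x1F, 200) == hashlib.shake_128(msg).digest(200)
```

SHA3-256 and SHAKE128 differ here only in the three arguments `rate_bits`, `capacity_bits`, and `suffix`, plus how many bytes are squeezed.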
XOFs — extendable-output functions
The squeeze phase is what makes the sponge naturally an XOF: you can keep squeezing as long as you want.
SHAKE128 and SHAKE256 expose this directly: same input, request any output length. Useful for:
- KDFs — derive a 64-byte key from a 16-byte secret with one call.
- Stream ciphers — XOR the SHAKE output against plaintext (don't actually do this without a careful proof, but the primitive is there).
- DRBGs: cSHAKE-based deterministic random bit generators.
- Hash-based signatures (SPHINCS+, XMSS) lean heavily on XOFs.
import hashlib
sk = hashlib.shake_256()
sk.update(b"my secret")
short = sk.hexdigest(32) # 256 bits
long = sk.hexdigest(128) # 1024 bits — same input, longer output
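One property worth knowing before using an XOF as a KDF: a shorter output is simply a prefix of a longer one, so different lengths from the same input are not independent keys. The sketch below checks that with `hashlib`, then shows one possible way to derive separate keys by mixing a label into the input. The `derive` helper and its length-prefixed label format are assumptions made for this example, not a standard construction.

```python
import hashlib

secret = b"my secret"

# Same input, different lengths: the short output is a prefix of the long one
short = hashlib.shake_256(secret).digest(32)
longer = hashlib.shake_256(secret).digest(128)
assert longer[:32] == short

def derive(secret: bytes, label: bytes, n: int) -> bytes:
    """Hypothetical helper: derive n bytes bound to a purpose label."""
    xof = hashlib.shake_256()
    # Length-prefix the label so (label, secret) pairs cannot be confused
    xof.update(len(label).to_bytes(2, "big") + label)
    xof.update(secret)
    return xof.digest(n)

enc_key = derive(secret, b"enc", 32)
mac_key = derive(secret, b"mac", 32)
assert enc_key != mac_key
```

For standardised domain separation, cSHAKE and KMAC exist for exactly this purpose, though the Python standard library does not provide them.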
Why both SHA-2 and SHA-3 stick around
- SHA-2 is faster on most hardware. Modern x86 has SHA-NI (since Goldmont/Cannon Lake); SHA-NI accelerates SHA-1 and SHA-256, not SHA-512. SHA-3 has no equivalent on most chips.
- SHA-2 is ubiquitously deployed: TLS 1.3 cipher suites and transcript hashes, certificate signatures, and Git's opt-in SHA-256 object format (SHA-1 remains the default in Git 2.x). SHA-3 adoption is real but mostly in newer protocols. The best-known example is Ethereum's keccak256, though that is the original Keccak with different padding, not standardised SHA3-256.
- SHA-3 is structural insurance. If a major weakness is ever found in SHA-2, the world has SHA-3 ready to go. They share no internal structure, so a break of one is unlikely to threaten the other.
For new designs, modern advice is roughly:
- Need a hash, want speed, can't switch later → SHA-256 or SHA-512.
- Need a hash, want speed, software-only → BLAKE2 / BLAKE3.
- Need any-length output → SHAKE128 / SHAKE256.
- Need to be Merkle-Damgård-immune for some reason → SHA-3 or BLAKE2/3.
A worked sponge
Imagine a tiny sponge with r = 4, c = 4 (8-bit state, ridiculous in practice but useful for visualisation):
state: 0000 | 0000 ← initial (rate | capacity)
input: 4-bit blocks A, B, C, D
Absorb block A:
state: A | 0000 → permute → R1 | C1
Absorb block B:
state: R1 ⊕ B | C1 → permute → R2 | C2
... and so on. Then squeeze the rate.
The point: input only ever XORs into the rate; capacity is mixed by the permutation but never directly written or read. By the time you read output, you can't reconstruct the capacity from observation.
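The tiny sponge can be run as code. The following sketch uses r = 4 and c = 4 as above, with a made-up 8-bit permutation (`permute` is bijective but has no cryptographic strength, and is purely an assumption for this example). After absorbing four blocks it takes the observer's view: given the one output nibble, which full states are still possible?

```python
RATE_BITS, CAP_BITS = 4, 4   # r = 4, c = 4, as in the walk-through

def permute(s: int) -> int:
    """Hypothetical 8-bit permutation: bijective, but NOT cryptographically strong."""
    for _ in range(3):
        s = (s * 167 + 13) & 0xFF          # odd multiplier, so this step is a bijection
        s = ((s << 3) | (s >> 5)) & 0xFF   # rotate left by 3 bits
        s ^= 0x5A
    return s

def absorb(blocks):
    state = 0                              # rate = high nibble, capacity = low nibble
    for m in blocks:
        state ^= (m & 0xF) << CAP_BITS     # XOR the block into the rate only
        state = permute(state)             # mix rate and capacity together
    return state

def squeeze(state, n_blocks):
    out = []
    for _ in range(n_blocks):
        out.append(state >> CAP_BITS)      # read the rate, never the capacity
        state = permute(state)
    return out

final_state = absorb([0xA, 0xB, 0xC, 0xD])
digest = squeeze(final_state, 1)

# An observer sees only the rate nibble; 2**c = 16 full states are consistent with it
candidates = [(digest[0] << CAP_BITS) | cap for cap in range(2 ** CAP_BITS)]
next_states = {permute(s) for s in candidates}
print(f"digest nibble: {digest[0]:x}, candidate states: {len(candidates)}, "
      f"distinct continuations: {len(next_states)}")
```

With c = 4 an attacker can simply try all 16 candidates, which is why this toy is insecure. At c = 512 the same guess is one in 2^512.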
Summary
| | Merkle-Damgård (SHA-2) | Sponge (SHA-3) |
|---|---|---|
| Building block | Compression function | Permutation |
| Output | Final state | Squeeze of rate |
| Length-extension | Yes (use HMAC) | No |
| XOF natively | No | Yes (SHAKE) |
| Hardware support | SHA-NI on x86 | Limited |
| Speed in pure software | Moderate | Slower than SHA-2 / BLAKE |
If you're building something new and want length-extension immunity, native variable-length output, or "structurally different from SHA-2 for insurance reasons", reach for SHA-3. If raw software speed matters more, BLAKE3 is the faster option: it descends from the SHA-3 finalist BLAKE rather than from Keccak, and uses a different but also length-extension-immune construction. For everything else, SHA-256 still works fine.
Tools in the wild
- openssl dgst -sha3-256 (CLI): OpenSSL ≥ 1.1.1 supports the full SHA-3 family, SHA3-224/256/384/512 plus SHAKE128/256.
- Python hashlib (library): Standard-library SHA-3 since Python 3.6. Includes SHAKE for variable output.
- @noble/hashes (library): Audited pure-JS SHA-3 (and Keccak / KMAC / cSHAKE). Tiny, no native deps.
- KangarooTwelve (spec): The Keccak team's parallel sponge, with 12 rounds instead of 24 and a parallelisable tree mode. A faster relative of SHA-3 in the same family, not a replacement for the standard.