Sponge Construction (SHA-3)

Absorb, squeeze, and why SHA-3 doesn't have SHA-2's length-extension flaw.

SHA-2 (the SHA-256/384/512 family) is built on the Merkle-Damgård construction: a fixed compression function chained over message blocks. It works, it's been the workhorse of the internet for two decades, and it has one famous wart: length-extension. SHA-3, ratified in 2015, deliberately chose a different shape — the sponge — partly to insure against unknown attacks on Merkle-Damgård, partly because the sponge fixes length-extension by design.

The sponge mental model is the cleanest in cryptography. Once you see "absorb, then squeeze," you stop being puzzled by why SHA-3 looks structurally different from everything before it.

What's wrong with Merkle-Damgård

Merkle-Damgård chains a compression function over fixed-size blocks:

IV → [c] → h_1 → [c] → h_2 → [c] → h_3  =  H(M)
      ↑           ↑           ↑
     M_1         M_2         M_3
(h_1, h_2, h_3 are chaining values; the padding rides in the last block, M_3.)

The problem: the final intermediate state IS the output digest. Anyone with the digest of M can plug it back in as the IV, append M', and produce H(M || padding || M') without ever seeing M; they only need M's length to work out the padding. This is the length-extension attack. It has broken H(secret || message) MAC constructions in the wild, and it is one reason HMAC wraps the hash in a nested construction instead of hashing secret || message directly.

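SHA-256 has length-extension. SHA-512 has it. SHA-1 and MD5 have it too. Any Merkle-Damgård hash that outputs its whole final chaining state does. Truncated variants such as SHA-384 and SHA-512/256 withhold part of the state, which blocks the straightforward attack.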
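A minimal pure-Python sketch makes the attack concrete. It reimplements the SHA-256 compression function from FIPS 180-4 (checked against hashlib), then forges a valid digest of secret || msg || glue || suffix from only the digest of secret || msg and the length of secret. The helper names (`compress`, `sha256_continue`, `extend`) and the secret and message values are hypothetical, chosen for illustration; this is a teaching sketch, not a vetted implementation.

```python
import hashlib
import math
import struct

M32 = 0xFFFFFFFF


def icbrt(n):
    """Exact integer cube root floor(n ** (1/3)) via integer Newton iteration."""
    x = 1 << ((n.bit_length() + 2) // 3)  # start above the true root
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


# SHA-256 constants, derived exactly from the first 64 primes.
PRIMES = [p for p in range(2, 312) if all(p % d for d in range(2, math.isqrt(p) + 1))]
H0 = [math.isqrt(p << 64) & M32 for p in PRIMES[:8]]  # frac(sqrt(p)) * 2**32
K = [icbrt(p << 96) & M32 for p in PRIMES[:64]]       # frac(cbrt(p)) * 2**32


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & M32


def compress(state, block):
    """One SHA-256 compression: 8-word chaining state + 64-byte block -> new state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & M32)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + s1 + ch + K[i] + w[i]) & M32
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & M32
        a, b, c, d, e, f, g, h = (t1 + t2) & M32, a, b, c, (d + t1) & M32, e, f, g
    return [(s + v) & M32 for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def md_padding(msg_len):
    """Padding for a msg_len-byte message: 0x80, zero bytes, then the 64-bit bit length."""
    return b"\x80" + b"\x00" * ((55 - msg_len) % 64) + struct.pack(">Q", msg_len * 8)


def sha256_continue(state, data, bytes_before):
    """Resume SHA-256 from a chaining state that already absorbed bytes_before bytes."""
    tail = data + md_padding(bytes_before + len(data))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return struct.pack(">8I", *state)


def extend(digest, known_len, suffix):
    """Forge SHA-256(X || glue || suffix) knowing only SHA-256(X) and len(X)."""
    state = list(struct.unpack(">8I", digest))  # the digest IS the final chaining state
    glue = md_padding(known_len)                # padding the victim's hash appended to X
    return glue, sha256_continue(state, suffix, known_len + len(glue))


# Hypothetical victim MAC = SHA-256(secret || msg); the attacker knows len(secret) only.
secret, msg, suffix = b"0123456789abcdef", b"user=alice", b"&admin=true"
mac = hashlib.sha256(secret + msg).digest()
glue, forged = extend(mac, len(secret) + len(msg), suffix)
assert sha256_continue(H0, b"abc", 0) == hashlib.sha256(b"abc").digest()
assert forged == hashlib.sha256(secret + msg + glue + suffix).digest()
```

The `glue` bytes are the padding the victim's hash silently appended, so the forged message has to contain them. HMAC-SHA-256 over the same inputs would not be forgeable this way.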

The sponge

Imagine a state of r + c bits, divided into:

  • Rate (r bits) — the part that interacts with input/output.
  • Capacity (c bits) — the part that stays internal.

state = [    rate r bits      |    capacity c bits    ]
                  ↑                       ↑
           touched by I/O      never directly emitted

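The hash function applies a permutation f to the whole state. The permutation is fast and public, and it is invertible: f on its own is not one-way, and anyone can run it backwards. The one-wayness comes from the capacity staying hidden. The security argument treats f as if it were a random permutation, so the better f approximates one, the closer the sponge gets to its ideal bounds.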

Two phases:

Absorb

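Pad the input to a multiple of r bits (SHA-3 appends a few domain-separation bits, then uses the pad10*1 rule), start from an all-zero state, and for each input block M_i (size r):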
    state[0..r] ⊕= M_i
    state = f(state)

Each block is XORed into the rate. Then the entire state (rate + capacity) is permuted. Repeat until input is consumed.

Squeeze

output ← []
While more output needed:
    output += state[0..r]
    state = f(state)

The rate portion is read off. To get more output, permute again and read again. Capacity is never directly emitted. That's the key.
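The two phases translate almost line for line into Python. The sketch below is generic over the permutation; to keep it runnable it plugs in `toy_permutation`, a deliberately weak, hypothetical bijection on a 32-bit state (r = 16 bits, c = 16 bits), and a simplified byte padding. Neither is Keccak-f or SHA-3's padding; they only stand in to show the absorb/squeeze structure.

```python
def toy_permutation(state: bytes) -> bytes:
    """Toy bijection on a 4-byte state (NOT secure, NOT Keccak-f); each step is invertible."""
    x = int.from_bytes(state, "big")
    for _ in range(4):
        x = (x * 0x9E3779B1) & 0xFFFFFFFF        # multiply by an odd constant mod 2**32
        x ^= x >> 15                              # xorshift right
        x = ((x << 7) | (x >> 25)) & 0xFFFFFFFF   # rotate left by 7
        x ^= 0x2545F491                           # XOR in a round constant
    return x.to_bytes(4, "big")


def sponge(f, rate, width, message, out_len):
    """Absorb `message` into the first `rate` bytes of a `width`-byte state, then squeeze."""
    # Simplified padding (an assumption, not SHA-3's pad10*1): 0x80, then zero bytes.
    padded = message + b"\x80" + bytes(-(len(message) + 1) % rate)
    state = bytes(width)                                   # all-zero initial state
    for i in range(0, len(padded), rate):                  # absorb
        block = padded[i:i + rate]
        mixed = bytes(s ^ m for s, m in zip(state[:rate], block))
        state = f(mixed + state[rate:])                    # state[0..r] ⊕= M_i; state = f(state)
    out = b""
    while len(out) < out_len:                              # squeeze
        out += state[:rate]                                # only the rate is ever emitted
        state = f(state)
    return out[:out_len]


RATE, WIDTH = 2, 4                                         # r = 16 bits, c = 16 bits
short = sponge(toy_permutation, RATE, WIDTH, b"hello", 4)
longer = sponge(toy_permutation, RATE, WIDTH, b"hello", 10)
assert longer[:4] == short                                 # more squeezing only appends output
```

Note that `state[rate:]` is fed back into `f` but never appended to `out`; that is the whole trick.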

Why length-extension fails

Length-extension needs the full internal state at the end of absorbing, so the attacker can keep absorbing. The sponge's output reveals at most the rate (SHA3-256 shows only the first 256 of its 1088 rate bits) and never the capacity. Without the full state they can't continue the computation, and the attack stops cold.

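This is also why the sponge's generic security ceiling comes from c, not from the rate: capacity bits are what's hidden from the attacker, and generic attacks on the sponge itself cost on the order of 2^(c/2). A short output can still impose lower limits of its own.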

For SHA3-256:

  • r = 1088 bits
  • c = 512 bits
  • Total state: 1600 bits (the width of the Keccak-f[1600] permutation)

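The 512 capacity bits put the generic sponge bound at c/2 = 256 bits, so capacity is not the bottleneck here. SHA3-256 gets 256-bit preimage resistance, while its collision resistance is 128 bits: the birthday bound on a 256-bit output. In general, take the smaller of c/2 and the output-length bound (n/2 for collisions, n for preimages).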
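The arithmetic is small enough to script. This sketch applies the generic bounds (collision min(n/2, c/2), preimage min(n, c/2) for capacity c and output length n), which match the levels FIPS 202 lists for the SHA-3 functions; `security_bits` is a hypothetical helper name.

```python
# Generic security levels in bits: the sponge caps everything at c/2, and an
# n-bit output caps collisions at n/2 (birthday bound) and preimages at n.
def security_bits(c, n):
    """Return (collision, preimage) resistance in bits for capacity c, output n."""
    return min(n // 2, c // 2), min(n, c // 2)


# (capacity c, output n) in bits, taken from the SHA-3 family parameters
PARAMS = {"SHA3-224": (448, 224), "SHA3-256": (512, 256),
          "SHA3-384": (768, 384), "SHA3-512": (1024, 512)}

for name, (c, n) in PARAMS.items():
    coll, pre = security_bits(c, n)
    print(f"{name}: collision {coll} bits, preimage {pre} bits")

# SHAKE128 (c = 256) stays capped at 128 bits however much output you squeeze.
print("SHAKE128, 512-bit output:", security_bits(256, 512))
```

For SHA3-256 this prints collision 128 bits and preimage 256 bits.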

SHA-3 family — same permutation, different (r, c)

Function    Rate r (bits)   Capacity c (bits)   Output (bits)
SHA3-224    1152            448                 224
SHA3-256    1088            512                 256
SHA3-384    832             768                 384
SHA3-512    576             1024                512
SHAKE128    1344            256                 any
SHAKE256    1088            512                 any

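All are built on Keccak-f[1600], the same 1600-bit permutation with 24 rounds. The only differences are how r and c partition that state, a few domain-separation bits appended before padding (which keep SHA3-256 and SHAKE256 apart despite their identical (r, c)), and how much output is squeezed.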
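To see that only (r, c) and the padding byte change between family members, here is a compact, unoptimised pure-Python sketch of Keccak-f[1600] plus one sponge function. The names `keccak_f1600`, `keccak_sponge`, `sha3_256`, `shake128` and `shake256` are ours, not a library API. The suffix bytes 0x06 (SHA3) and 0x1F (SHAKE) pack FIPS 202's domain-separation bits together with the first bit of pad10*1. The outputs are checked against hashlib; for real work, use hashlib or another vetted library.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(x, n):
    return ((x << n) | (x >> (64 - n))) & MASK64


# Rho rotation offsets, indexed by x + 5*y, generated by the walk defined in FIPS 202.
RHO = [0] * 25
x, y = 1, 0
for t in range(24):
    RHO[x + 5 * y] = ((t + 1) * (t + 2) // 2) % 64
    x, y = y, (2 * x + 3 * y) % 5

# Iota round constants from the degree-8 LFSR; bit 2**j - 1 of each constant.
RC = []
R = 1
for _ in range(24):
    rc = 0
    for j in range(7):
        R = ((R << 1) ^ ((R >> 7) * 0x71)) % 256
        if R & 2:
            rc |= 1 << ((1 << j) - 1)
    RC.append(rc)


def keccak_f1600(a):
    """Keccak-f[1600] on 25 64-bit lanes a[x + 5*y]; returns the new lane list."""
    for rc in RC:
        # theta: mix in the parities of two neighbouring columns
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]
        # rho + pi: rotate each lane, then move lane (x, y) to (y, 2x + 3y)
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = rol64(a[x + 5 * y], RHO[x + 5 * y])
        # chi: the only non-linear step, applied along each row
        a = [b[i] ^ (~b[(i + 1) % 5 + i - i % 5] & b[(i + 2) % 5 + i - i % 5])
             for i in range(25)]
        # iota: XOR the round constant into lane (0, 0)
        a[0] ^= rc
    return a


def keccak_sponge(rate, suffix, data, out_len):
    """Sponge over Keccak-f[1600]; rate in bytes, suffix = domain bits + first pad bit."""
    padded = bytearray(data) + bytes([suffix])
    padded += bytes(-len(padded) % rate)     # the zeros of pad10*1
    padded[-1] |= 0x80                       # the final 1 bit of pad10*1
    state = [0] * 25                         # all-zero initial state
    for off in range(0, len(padded), rate):  # absorb: XOR the block into the rate lanes
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(padded[off + 8 * i:off + 8 * i + 8], "little")
        state = keccak_f1600(state)
    out = b""
    while True:                              # squeeze: emit rate lanes, permute, repeat
        out += b"".join(lane.to_bytes(8, "little") for lane in state[:rate // 8])
        if len(out) >= out_len:
            return out[:out_len]
        state = keccak_f1600(state)


def sha3_256(data):
    return keccak_sponge(136, 0x06, data, 32)  # r = 1088 bits, c = 512 bits


def shake128(data, n):
    return keccak_sponge(168, 0x1F, data, n)   # r = 1344 bits, c = 256 bits


def shake256(data, n):
    return keccak_sponge(136, 0x1F, data, n)   # r = 1088 bits, c = 512 bits
```

For example, `sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest()`. SHA3-256 and SHAKE256 share a rate of 136 bytes here; only the suffix byte and the output length differ.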

XOFs — extendable-output functions

The squeeze phase is what makes the sponge naturally an XOF: you can keep squeezing as long as you want.

SHAKE128 and SHAKE256 expose this directly: same input, request any output length. Useful for:

  • KDFs: derive a 64-byte key from a 16-byte secret with one call.
  • Stream ciphers: XOR the SHAKE output against plaintext (don't actually do this without a careful proof, but the primitive is there).
  • DRBGs: SHAKE-based deterministic random bit generators.
  • Hash-based signatures (SPHINCS+, XMSS) offer SHAKE-based parameter sets alongside SHA-2 ones.
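
import hashlib

sk = hashlib.shake_256()
sk.update(b"my secret")
short = sk.hexdigest(32)    # length is in bytes: 32 bytes = 256 bits
long  = sk.hexdigest(128)   # 128 bytes = 1024 bits; same input, longer output
assert long.startswith(short)  # XOF property: the short output is a prefix of the long one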

Why both SHA-2 and SHA-3 stick around

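  • SHA-2 is faster on most hardware. Modern x86 has SHA-NI (Intel since Goldmont/Cannon Lake, AMD since Zen), which accelerates SHA-1 and SHA-256 specifically. SHA-3 has no equivalent on x86, and only an optional instruction-set extension on newer Arm cores.
  • SHA-2 is ubiquitously deployed: TLS cipher suites, certificate signatures, and Git's newer SHA-256 object format all rely on it, and TLS 1.3 transcript hashes are SHA-2-based. SHA-3 adoption is real but mostly in newer designs, such as the post-quantum ML-KEM and ML-DSA standards, which use SHA-3 and SHAKE internally. Ethereum's keccak256 is a prominent Keccak user, but note that it is the original Keccak with different padding, not standardised SHA3-256.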
  • SHA-3 is structural insurance. If a major weakness is ever found in SHA-2, the world has SHA-3 ready to go. They share no internal structure, so a break of one is unlikely to threaten the other.
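Relative speed depends heavily on the CPU and on how your Python's hash backend was built, so it is worth measuring rather than trusting a rule of thumb. A rough sketch with `timeit`; the 1 MiB buffer, the repeat count, and the helper name `throughput_mib_s` are arbitrary choices for illustration.

```python
import hashlib
import timeit

DATA = bytes(1 << 20)  # 1 MiB of zero bytes


def throughput_mib_s(name, repeats=20):
    """Approximate hashing throughput in MiB/s for a hashlib algorithm name."""
    seconds = timeit.timeit(lambda: hashlib.new(name, DATA).digest(), number=repeats)
    return repeats / seconds  # each repeat hashes exactly 1 MiB


for name in ("sha256", "sha512", "sha3_256", "blake2b"):
    print(f"{name:9s} {throughput_mib_s(name):8.1f} MiB/s")
```

Numbers vary from run to run; compare ratios on your own machine rather than absolute figures.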

For new designs, modern advice is roughly:

  • Need a hash, want speed, can't switch later → SHA-256 or SHA-512.
  • Need a hash, want speed, software-only → BLAKE2 / BLAKE3.
  • Need any-length output → SHAKE128 / SHAKE256.
  • Need length-extension immunity without reaching for HMAC → SHA-3 or BLAKE2/3.

A worked sponge

Imagine a tiny sponge with r = 4, c = 4 (8-bit state, ridiculous in practice but useful for visualisation):

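state:  0000 | 0000          ← initial, all zero
input:  abcd efgh            ← two 4-bit blocks; each letter is one bit

                        Absorb block 'abcd' (XOR into the rate):
state:  abcd | 0000  →  permute  →  q4ws | nz1m      (q4ws, nz1m: scrambled bits)

                        Absorb block 'efgh':
state:  (q4ws ⊕ efgh) | nz1m  →  permute  →  p8xv | yu7k

                        Input consumed. Squeeze: output p8xv, then permute and read the rate again if more is needed.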

The point: input only ever XORs into the rate; capacity is mixed by the permutation but never directly written or read. By the time you read output, you can't reconstruct the capacity from observation.
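The worked example can be run directly, and it also shows why c must be large. This sketch builds an 8-bit sponge with r = 4 and c = 4, using a seeded random shuffle as the permutation (a hypothetical stand-in for Keccak-f) and omitting padding for brevity. With only 4 capacity bits, an attacker who sees a few squeezed nibbles can try all 2^4 capacity values, recover the full state, and length-extend. At c = 512 the same search would need 2^512 guesses.

```python
import random

C_BITS = 4                                  # capacity = low nibble; rate = high nibble (r = 4)
PERM = list(range(256))
random.Random(1).shuffle(PERM)              # toy permutation of the 8-bit state


def absorb(state, blocks):
    """XOR each 4-bit block into the rate (high nibble), then permute the whole state."""
    for m in blocks:
        state = PERM[state ^ (m << C_BITS)]
    return state


def squeeze(state, count):
    """Emit the rate nibble, permute, repeat; the capacity nibble is never emitted."""
    out = []
    for _ in range(count):
        out.append(state >> C_BITS)
        state = PERM[state]
    return out


def toy_hash(blocks, count=4):
    # Padding omitted (assumption): inputs are already whole 4-bit blocks.
    return squeeze(absorb(0, blocks), count)


def recover_states(output):
    """Attacker: try every capacity value and keep full states matching the output."""
    candidates = []
    for cap in range(1 << C_BITS):
        guess = (output[0] << C_BITS) | cap
        if squeeze(guess, len(output)) == output:
            candidates.append(guess)
    return candidates


secret_blocks = [0xA, 0xB, 0xC, 0xD]        # hypothetical secret input
observed = toy_hash(secret_blocks)          # the attacker sees only these rate nibbles
suffix = [0x1, 0x2]
forgeries = {tuple(squeeze(absorb(s, suffix), 4)) for s in recover_states(observed)}
assert tuple(toy_hash(secret_blocks + suffix)) in forgeries  # extension works at c = 4
print(len(recover_states(observed)), "candidate state(s) after", 1 << C_BITS, "guesses")
```

The true state is always among the candidates, so one of the forgeries is guaranteed to be correct; extra output nibbles only prune false candidates.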

Summary

                          Merkle-Damgård (SHA-2)          Sponge (SHA-3)
Building block            Compression function            Permutation
Output                    Final state                     Squeeze of rate
Length-extension          Yes for SHA-256/512 (use HMAC)  No
XOF natively              No                              Yes (SHAKE)
Hardware support          SHA-NI on x86                   Limited
Speed in pure software    Moderate                        Slower than SHA-2 / BLAKE

If you're building something new and want length-extension immunity, native variable-length output, or "structurally different from SHA-2 for insurance reasons" — reach for SHA-3 or its faster cousin BLAKE3 (which uses a different but also length-extension-immune construction). For everything else, SHA-256 still works fine.

Tools in the wild

  • OpenSSL ≥ 1.1.1 (CLI): supports the full SHA-3 family, SHA3-224/256/384/512 plus SHAKE128/256.
  • Standard-library SHA-3 since Python 3.6 (library), including SHAKE for variable-length output.
  • @noble/hashes (library): audited pure-JS SHA-3 (and Keccak / KMAC / cSHAKE). Tiny, no native deps.
  • Keccak team's parallel sponge (spec): 12 rounds of the same permutation instead of 24, and parallelisable. A faster sibling of SHA-3 in the same family, not a NIST-standardised successor.