BLAKE2 & BLAKE3

Why modern hashes are faster than SHA-256 — and just as safe.

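SHA-256 is fine. It has been around for more than two decades, many current CPUs accelerate it in hardware (Intel SHA extensions, the ARMv8 cryptography extensions), and it will still be standing when half of today's protocols are obsolete. But "fine" doesn't mean "fastest". The BLAKE family, BLAKE2 (2012) and BLAKE3 (2020), offers comparable security at higher software throughput with no special instructions: BLAKE2 is typically a small multiple faster than software-only SHA-256, and BLAKE3 can be several times faster again on large inputs because it parallelises across SIMD lanes and cores.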

These are the hashes that protocol designers tend to pick when nothing forces them to use SHA-256.

Where BLAKE came from

BLAKE was one of the five SHA-3 finalists. It didn't win — Keccak did, on the basis of being structurally different from SHA-2 (sponge vs Merkle-Damgård). But the cryptanalysis community concluded that BLAKE was also secure, and much faster than the eventual SHA-3 in software.

So Aumasson, Neves, Wilcox-O'Hearn, and Winnerlein took the BLAKE design, simplified it, and shipped:

BLAKE2b (2012) — 64-bit, up to 512-bit digests. Drop-in for SHA-512.
BLAKE2s — 32-bit, up to 256-bit digests. Drop-in for SHA-256.

In 2020 Jack O'Connor, together with BLAKE2 authors Aumasson, Neves, and Wilcox-O'Hearn, released BLAKE3. It is structurally different again: a Merkle-tree construction whose chunks can be hashed in parallel across SIMD lanes and cores.

The BLAKE2 design choices

BLAKE2 is a refinement of BLAKE for the real world:

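Moved the salt and personalisation inputs out of the compression function and into a parameter block that is mixed into the initial state once (so the per-block work doesn't pay for unused features).
Reduced the round count (12 instead of 16 for BLAKE2b, 10 instead of 14 for BLAKE2s; both keep a comfortable margin over the best published attacks).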
Built-in keyed-hash mode (no separate HMAC needed).
Tunable output length (any value 1..64 bytes for blake2b, 1..32 for blake2s).
Tree-mode hashing baked into the spec.
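
Several of these features are visible directly in Python's hashlib, which ships BLAKE2 in the standard library. The following is a minimal sketch, not a recommended protocol: the message, the all-ones key, and the personalisation strings are made-up illustration values, and verify is a hypothetical helper name.

```python
import hashlib
import hmac

message = b"attack at dawn"

# Tunable output length: ask for 32 bytes instead of the 64-byte default.
short = hashlib.blake2b(message, digest_size=32).digest()
full = hashlib.blake2b(message).digest()

# The digest length is part of BLAKE2's parameters, so a shorter digest
# is a different hash, not a truncation of the longer one.
assert len(short) == 32 and len(full) == 64
assert short != full[:32]

# Keyed mode: a MAC without wrapping the hash in HMAC.
key = b"\x01" * 32  # illustrative only; use a random secret key in practice
tag = hashlib.blake2b(message, key=key, digest_size=16).digest()


def verify(msg: bytes, key: bytes, tag: bytes) -> bool:
    """Recompute the keyed hash and compare in constant time."""
    expected = hashlib.blake2b(msg, key=key, digest_size=len(tag)).digest()
    return hmac.compare_digest(expected, tag)


# Personalisation: the same input hashes differently per application label.
digest_a = hashlib.blake2b(message, person=b"app-A").digest()
digest_b = hashlib.blake2b(message, person=b"app-B").digest()
```

Personalisation and salt are capped at 16 bytes for blake2b and 8 bytes for blake2s, so labels need to be short.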

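The result: BLAKE2b is noticeably faster than software-only SHA-256 on 64-bit x86 (around 3× on CPUs without SHA extensions), roughly on par with SHA-256 where SHA-NI is available, and dramatically faster than SHA-3 in software. The exact ratio depends on the CPU and the build; the sample run below shows about 1.5× at 8192-byte blocks.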

$ openssl speed -seconds 1 sha256 blake2b512
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256          184934.93k   428521.22k   864213.50k  1131258.39k  1230966.95k
blake2b512      241854.83k   724826.76k  1413617.66k  1732617.15k  1851490.23k
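
If OpenSSL isn't handy, a rough version of the same comparison can be run with Python's hashlib. This is a sketch only: throughput_mb_s is a hypothetical helper, the 8192-byte buffer mirrors the last column above, and the numbers include interpreter overhead, so treat them as relative rather than absolute.

```python
import hashlib
import time


def throughput_mb_s(name: str, size: int = 8192, seconds: float = 0.2) -> float:
    """Hash `size`-byte buffers for roughly `seconds`; return MB/s."""
    data = b"\x00" * size
    make = getattr(hashlib, name)  # e.g. hashlib.sha256, hashlib.blake2b
    count = 0
    start = time.perf_counter()
    while True:
        make(data).digest()
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= seconds:
            break
    return count * size / elapsed / 1e6


if __name__ == "__main__":
    for name in ("sha256", "blake2b", "blake2s"):
        print(f"{name:8s} {throughput_mb_s(name):10.1f} MB/s")
```

Results vary a lot between machines: on a CPU with SHA extensions and an OpenSSL-backed hashlib, sha256 may come out ahead.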

BLAKE2b vs BLAKE2s — pick by platform width

The main reason to choose one over the other is the word size of the target platform:

	BLAKE2b	BLAKE2s
Word size	64-bit	32-bit
Block size	128 bytes	64 bytes
Max output	64 bytes (512 bits)	32 bytes (256 bits)
Chaining value	8 × 64-bit words	8 × 32-bit words
Best for	x86-64, ARMv8, modern phones	Cortex-M, IoT, browser asm.js

On 64-bit machines, BLAKE2b is faster. On 32-bit and embedded targets, BLAKE2s is faster. Neither has a known weakness; the security level is set by the digest length you choose.
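
The table's figures can be read straight off Python's hashlib. The sketch below prints them and adds pick_blake2, a hypothetical helper that follows the word-size rule; it assumes the interpreter's pointer width is a fair proxy for the platform.

```python
import hashlib
import sys

# Print block size and parameter limits for both variants.
for algo in (hashlib.blake2b, hashlib.blake2s):
    h = algo()
    print(
        h.name,
        "block:", h.block_size,
        "max digest:", algo.MAX_DIGEST_SIZE,
        "max key:", algo.MAX_KEY_SIZE,
        "salt:", algo.SALT_SIZE,
        "person:", algo.PERSON_SIZE,
    )


def pick_blake2():
    """Hypothetical helper: BLAKE2b on 64-bit builds, BLAKE2s otherwise."""
    return hashlib.blake2b if sys.maxsize > 2**32 else hashlib.blake2s


# Both variants can produce a 32-byte digest, but the two are different
# functions, so their outputs are not interchangeable.
digest = pick_blake2()(b"hello", digest_size=32).hexdigest()
```

Because the two variants give different digests for the same input, a choice like this only suits data that never leaves the machine; anything stored or exchanged should name one variant explicitly.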

WireGuard uses BLAKE2s exclusively, partly for code-size reasons and partly because the rest of the protocol works with 32-byte values (256-bit Curve25519 keys, 256-bit shared secrets). The Noise Protocol Framework, which WireGuard's handshake is built on, lists BLAKE2s and BLAKE2b among its hash functions. Argon2 (the Password Hashing Competition winner) uses BLAKE2b for its hashing steps and builds its compression function on the BLAKE2b round function. IPFS offers BLAKE2b alongside SHA-256 in its multihash format.

BLAKE3 — the throughput killer

BLAKE3 is a different beast. Its compression function is derived from BLAKE2s (the same well-vetted core, with the round count cut from 10 to 7), but the structure around it changed. Instead of one long sequential chain of compression calls, BLAKE3 arranges the input as a Merkle tree:

                              root
                             /    \
                           /        \
                         /            \
                  internal             internal
                  /    \                /    \
              leaf      leaf       leaf      leaf
              | 1KB |  | 1KB |    | 1KB |  | 1KB |

The input is split into 1 KiB (1024-byte) chunks; each chunk is hashed independently; the chunk hashes are then combined pairwise up the tree. Because no leaf depends on another, leaves can be hashed in parallel, both across SIMD lanes within one core and across cores. On large inputs a multithreaded BLAKE3 can reach several GB/s on consumer hardware, at which point storage or memory bandwidth, not the hash, is usually the bottleneck.
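
The shape of that computation can be sketched in a few lines of Python. To be clear, this is a toy and not BLAKE3: it uses hashlib's BLAKE2s as a stand-in for the leaf and parent hashes, leaves out BLAKE3's chunk counters and flags, and will not reproduce real BLAKE3 digests. The function names are made up for illustration. The one detail it borrows is the split rule, where the left subtree takes the largest power-of-two number of chunks that still leaves the right subtree non-empty.

```python
import hashlib

CHUNK = 1024  # bytes per leaf


def leaf_hash(chunk: bytes) -> bytes:
    # Stand-in leaf hash (BLAKE2s), labelled so leaves and parents differ.
    return hashlib.blake2s(chunk, person=b"leaf").digest()


def parent_hash(left: bytes, right: bytes) -> bytes:
    # Stand-in parent hash over the two child digests.
    return hashlib.blake2s(left + right, person=b"parent").digest()


def _merge(nodes: list) -> bytes:
    if len(nodes) == 1:
        return nodes[0]
    # Left subtree: largest power of two strictly smaller than len(nodes).
    split = 1 << ((len(nodes) - 1).bit_length() - 1)
    return parent_hash(_merge(nodes[:split]), _merge(nodes[split:]))


def tree_hash(data: bytes) -> bytes:
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    # Each leaf depends only on its own chunk, so this step is the part
    # that a real implementation spreads over SIMD lanes and threads.
    leaves = [leaf_hash(c) for c in chunks]
    return _merge(leaves)


print(tree_hash(b"x" * 5000).hex())
```

The leaf list is built sequentially here for clarity; the point is that nothing in leaf_hash needs the result of any other leaf.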

It's also a single specification: there is no blake3b versus blake3s, just blake3. Output length is tunable because BLAKE3 is an XOF (extendable-output function), so the same algorithm gives you 32-byte digests, 64-byte digests, or arbitrary-length output streams; it also defines dedicated keyed-hash and key-derivation modes.
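
One consequence of the XOF design is that a shorter output is simply a prefix of a longer one. The sketch below checks that with the third-party blake3 Python package, which is an assumption here: it is not in the standard library, the code does nothing if it is missing, and the digest(length=...) call reflects that package's API as far as I know it. xof_outputs is a hypothetical helper name.

```python
try:
    from blake3 import blake3  # third-party package: pip install blake3
except ImportError:
    blake3 = None  # keep the sketch importable without the package


def xof_outputs(data: bytes):
    """Return (32-byte, 64-byte) BLAKE3 outputs, or None if blake3 is absent."""
    if blake3 is None:
        return None
    return blake3(data).digest(), blake3(data).digest(length=64)


outputs = xof_outputs(b"hello")
if outputs is None:
    print("blake3 package not installed; skipping")
else:
    short, extended = outputs
    # XOF property: the default digest is the first 32 bytes of the longer one.
    print("prefix holds:", extended[:32] == short)
```

This differs from BLAKE2, where the requested digest length is a hash parameter and different lengths give unrelated outputs.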

BLAKE3 wins anywhere you hash big things on multi-core: file integrity (b3sum is several times faster than sha256sum), content-addressed storage, large-file Merkle trees, deduplication.

Why both stick around

BLAKE2 is more widely deployed. It's in OpenSSL, libsodium, Python's standard library, PHP (through its sodium extension), the Rust blake2 crate, and Bouncy Castle for Java. Many protocols and formats (WireGuard, Noise, Argon2, IPFS multihash) name BLAKE2 explicitly.

BLAKE3 is newer. Adoption is growing: LLVM bundles a BLAKE3 implementation in its support library, Bazel accepts it as a digest function, and b3sum is packaged by many distributions. It is not yet universal, though.

A pragmatic rule:

Building something new, single-machine, big files → BLAKE3.
Implementing a protocol that names BLAKE2 → BLAKE2.
Drop-in SHA-256 replacement, want speed → BLAKE2b on 64-bit, BLAKE2s on 32-bit.
Need stdlib support, no third-party deps → SHA-256 still wins on availability.
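
For the two standard-library rows of that rule, switching between SHA-256 and BLAKE2 is mostly a matter of which constructor you call. A minimal sketch, assuming a hypothetical file_digest_hex helper and a 1 MiB read size chosen arbitrarily:

```python
import hashlib


def file_digest_hex(path: str, algo: str = "blake2b", digest_size: int = 32) -> str:
    """Stream a file through the chosen hash and return the hex digest."""
    if algo == "blake2b":
        h = hashlib.blake2b(digest_size=digest_size)
    elif algo == "blake2s":
        h = hashlib.blake2s(digest_size=digest_size)
    elif algo == "sha256":
        h = hashlib.sha256()  # fixed 32-byte output; digest_size is ignored
    else:
        raise ValueError(f"unsupported algorithm: {algo}")
    with open(path, "rb") as f:
        # Read in 1 MiB blocks so large files never sit fully in memory.
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()
```

With digest_size=32, the BLAKE2 variants return a 64-character hex string, the same width as SHA-256, so storage formats built around SHA-256 digests need no schema change. The values themselves are of course different.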

The takeaway

The BLAKE family has been under public cryptanalysis since the original BLAKE entered the SHA-3 competition in 2008, and there is no known practical weakness. BLAKE2 targets the same security levels as SHA-256/512; BLAKE3 targets 128-bit security with a leaner round count. Both are engineered for the realities of modern CPUs. If you're hashing in software and care about throughput, one of them is probably the right choice.

On a multi-core machine, running b3sum over a directory of large files can finish several times sooner than sha256sum does, and that kind of small efficiency gain adds up across a system.

Tools in the wild

b3sum (CLI): BLAKE3 reference CLI. Hash gigabytes per second on multi-core. Drop-in `sha256sum` replacement.
@noble/hashes (library): Audited pure-JS implementations of BLAKE2/3, SHA-2/3, KMAC, and more. Tiny, no native deps.
python-blake3 (library): Rust-backed BLAKE3 for Python, at full speed, including the parallel API.
openssl dgst -blake2b512 (CLI): OpenSSL ≥ 1.1.0 has BLAKE2 built in. `openssl dgst -blake2b512 file` works on any modern system.