BLAKE2 & BLAKE3
Why modern hashes are faster than SHA-256 — and just as safe.
SHA-256 is fine. It's been around for more than two decades, most modern CPUs have hardware acceleration for it, and it'll still be standing when half of today's protocols are obsolete. But "fine" doesn't mean "fastest". The BLAKE family, BLAKE2 (2012) and BLAKE3 (2020), offers security comparable to SHA-256 at roughly 2–10× the throughput of software SHA-256, without needing dedicated hashing instructions.
These are the hashes the modern cryptography ecosystem actually picks when nobody's forcing them to use SHA-256.
Where BLAKE came from
BLAKE was one of the five SHA-3 finalists. It didn't win — Keccak did, on the basis of being structurally different from SHA-2 (sponge vs Merkle-Damgård). But the cryptanalysis community concluded that BLAKE was also secure, and much faster than the eventual SHA-3 in software.
So Aumasson, Neves, Wilcox-O'Hearn, and Winnerlein took the BLAKE design, simplified it, and shipped:
- BLAKE2b (2012) — 64-bit, up to 512-bit digests. Drop-in for SHA-512.
- BLAKE2s — 32-bit, up to 256-bit digests. Drop-in for SHA-256.
In 2020 Jack O'Connor and three of the BLAKE2 authors released BLAKE3, structurally different again: a Merkle-tree construction whose chunks can be hashed in parallel, so throughput scales with SIMD width and core count.
The BLAKE2 design choices
BLAKE2 is a refinement of BLAKE for the real world:
- Moved the salt out of the compression function and into a parameter block (alongside a new personalisation field) that is mixed into the initial state, so the per-block work doesn't pay for features you don't use.
- Reduced the round count (12 instead of 16 for BLAKE2b, 10 instead of 14 for BLAKE2s), still leaving a comfortable security margin.
- Built-in keyed-hash mode (no separate HMAC needed).
- Tunable output length (any value 1..64 bytes for blake2b, 1..32 for blake2s).
- Tree-mode hashing baked into the spec.
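Python's standard-library `hashlib` exposes these parameters as keyword arguments (`digest_size`, `key`, `salt`, `person`), so most of the design choices above show up in a few lines. A minimal sketch; `mac` and `verify` are hypothetical helper names, and the key, salt, and personalisation strings are made-up examples:

```python
import hashlib
import hmac


def mac(key: bytes, message: bytes) -> bytes:
    """Keyed BLAKE2b: the key is fed into the hash itself, no HMAC wrapper."""
    return hashlib.blake2b(message, key=key, digest_size=32).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # Constant-time comparison so timing doesn't leak how many bytes matched.
    return hmac.compare_digest(mac(key, message), tag)


# Tunable output length: the digest size is part of the parameter block,
# so a 20-byte BLAKE2b digest is NOT the first 20 bytes of a 64-byte one.
short = hashlib.blake2b(b"hello", digest_size=20).digest()
full = hashlib.blake2b(b"hello").digest()  # default digest_size=64

# Personalisation and salt (each at most 16 bytes for BLAKE2b) separate
# identical inputs used in different contexts.
files_id = hashlib.blake2b(b"data", person=b"example-files").digest()
users_id = hashlib.blake2b(b"data", person=b"example-users").digest()
salted = hashlib.blake2b(b"data", salt=b"per-record-salt!").digest()
```

`hashlib.blake2b` also accepts BLAKE2's tree-hashing parameters (`fanout`, `depth`, `leaf_size`, `node_offset`, `node_depth`, `inner_size`, `last_node`), though building a full tree mode on top of them is left to the caller.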
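The net effect of these changes: BLAKE2b is roughly 3× faster than software SHA-256 on modern 64-bit x86, in the same ballpark as SHA-256 on CPUs with SHA extensions (SHA-NI), and much faster than SHA-3 in software. OpenSSL's `speed` command reports throughput in thousands of bytes per second; in the sample run below, BLAKE2b leads by about 1.5× at 8 KB inputs (the ratio depends on the CPU and on whether OpenSSL's SHA-256 is using SHA-NI):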
```
$ openssl speed -seconds 1 sha256 blake2b512
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256         184934.93k   428521.22k   864213.50k  1131258.39k  1230966.95k
blake2b512     241854.83k   724826.76k  1413617.66k  1732617.15k  1851490.23k
```
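To get comparable numbers from Python on your own machine, a rough single-threaded micro-benchmark over `hashlib` is enough. A sketch under stated assumptions: `throughput_mb_s` is a hypothetical helper, and in CPython `hashlib.sha256` typically comes from OpenSSL (which may use SHA-NI) while `hashlib.blake2b`/`blake2s` come from Python's own built-in implementation, so the ratio can differ from the OpenSSL figures above.

```python
import hashlib
import timeit


def throughput_mb_s(name: str, size: int = 8192, repeat: int = 2000) -> float:
    """Rough single-threaded throughput of a hashlib algorithm, in MB/s."""
    data = bytes(size)
    ctor = getattr(hashlib, name)
    # Time `repeat` one-shot hashes of a `size`-byte buffer.
    seconds = timeit.timeit(lambda: ctor(data).digest(), number=repeat)
    return size * repeat / seconds / 1e6


if __name__ == "__main__":
    for algo in ("sha256", "blake2b", "blake2s", "sha3_256"):
        print(f"{algo:>9}: {throughput_mb_s(algo):8.1f} MB/s")
```

No expected output is given on purpose: the numbers depend entirely on the CPU, the Python build, and the OpenSSL it links against.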
BLAKE2b vs BLAKE2s — pick by platform width
The main deciding factor is the target platform's word size:
| | BLAKE2b | BLAKE2s |
|---|---|---|
| Word size | 64-bit | 32-bit |
| Block size | 128 bytes | 64 bytes |
| Rounds | 12 | 10 |
| Max output | 64 bytes (512 bits) | 32 bytes (256 bits) |
| Chaining state | 8 × 64-bit words | 8 × 32-bit words |
| Best for | x86-64, ARMv8 (AArch64), modern phones | Cortex-M, IoT, browser asm.js |
On 64-bit machines, blake2b is faster. On 32-bit / embedded, blake2s is faster. Both are equally secure.
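One way to encode "pick by platform width" is a small selector. A minimal sketch; `preferred_blake2_256` is a hypothetical helper, and it assumes a 256-bit digest is what you want from either variant:

```python
import functools
import hashlib
import sys


def preferred_blake2_256():
    """Hypothetical helper: pick a 256-bit BLAKE2 variant by word size.

    BLAKE2b-256 and BLAKE2s-256 are different functions: they give different
    digests for the same input, so this is a design-time choice.
    """
    if sys.maxsize > 2**32:  # 64-bit interpreter: BLAKE2b is usually faster
        return functools.partial(hashlib.blake2b, digest_size=32)
    return hashlib.blake2s  # 32 bytes is already BLAKE2s's default and maximum


h = preferred_blake2_256()(b"payload").hexdigest()

# The table's limits are enforced: BLAKE2s cannot produce more than 32 bytes.
try:
    hashlib.blake2s(digest_size=33)
except ValueError as exc:
    print("BLAKE2s:", exc)
```

The caveat in the docstring matters more than the speed difference: if digests are stored or exchanged between machines, fix the variant once as part of the format rather than choosing it per machine.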
WireGuard uses BLAKE2s exclusively: plain BLAKE2s for hashing, keyed BLAKE2s for its MACs, and HMAC-BLAKE2s inside its HKDF-based key derivation. That's partly for code-size reasons and partly because the rest of the protocol already works with 32-byte values (Curve25519 public keys and shared secrets). Argon2 (the Password Hashing Competition winner) uses BLAKE2b to hash its inputs and produce its output, and builds its internal compression function from BLAKE2b's round function. IPFS offers BLAKE2b alongside SHA-256 in its multihash format.
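The WireGuard whitepaper defines its key derivation as HKDF built on HMAC-BLAKE2s: one HMAC call to extract a pseudorandom key, then chained HMAC calls with a counter byte to expand it. A minimal sketch of that pattern using only the standard library; `hmac_blake2s` and `kdf` are hypothetical names, and this is an illustration, not WireGuard's code or a full Noise handshake:

```python
import hashlib
import hmac


def hmac_blake2s(key: bytes, data: bytes) -> bytes:
    # HMAC over BLAKE2s (64-byte block size, 32-byte output).
    return hmac.new(key, data, hashlib.blake2s).digest()


def kdf(key: bytes, data: bytes, n: int) -> list[bytes]:
    """HKDF-style derivation with HMAC-BLAKE2s, returning n 32-byte outputs."""
    prk = hmac_blake2s(key, data)  # extract: tau_0 = HMAC(key, data)
    outputs, prev = [], b""
    for i in range(1, n + 1):
        # expand: tau_i = HMAC(tau_0, tau_{i-1} || i), with tau_0's input empty for i = 1
        prev = hmac_blake2s(prk, prev + bytes([i]))
        outputs.append(prev)
    return outputs
```

Asking for two or three outputs from one call is how a handshake step can derive, for example, a new chaining key plus an encryption key at once.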
BLAKE3 — the throughput killer
BLAKE3 is a different beast. Its compression function is derived from BLAKE2s (the same well-vetted core, cut from 10 rounds to 7), but the overall structure changed. Instead of a Merkle-Damgård-style chain, where every block depends on the one before it, BLAKE3 arranges the input as a binary Merkle tree:
```
               root
            /         \
    internal          internal
     /     \           /     \
  leaf     leaf     leaf     leaf
| 1 KB | | 1 KB | | 1 KB | | 1 KB |
```
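The input is split into 1 KB (1024-byte) chunks; each chunk is hashed independently into a 32-byte chaining value, and pairs of chaining values are hashed together, level by level, up to the root. Because no chunk depends on any other, chunks can be hashed in parallel, both across SIMD lanes within one core and across cores. On a multi-core machine this often leaves BLAKE3 limited by storage or memory bandwidth rather than by the hash itself, with multiple gigabytes per second achievable on consumer hardware.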
It's also a single specification: no blake3b vs blake3s, just BLAKE3. It is an extendable-output function (XOF), so the same algorithm gives you 32-byte digests, 64-byte digests, or arbitrary-length output streams. It also has built-in keyed-hash and key-derivation modes, so it can serve as a MAC or a KDF without extra constructions.
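To make the chunk-and-tree structure and the XOF concrete, here is a compact, single-threaded, pure-Python sketch of unkeyed BLAKE3 (plain hash mode only; the keyed and derive-key modes are omitted), written from the published specification. It is far too slow for real use and is not the reference implementation; all function names are illustrative:

```python
import struct

# Constants from the BLAKE3 specification (the IV is SHA-256's).
IV = [0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19]
PERM = [2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8]
CHUNK_START, CHUNK_END, PARENT, ROOT = 1, 2, 4, 8
CHUNK_LEN, BLOCK_LEN, MASK = 1024, 64, 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def g(s, a, b, c, d, mx, my):
    # The BLAKE2s quarter-round, reused unchanged.
    s[a] = (s[a] + s[b] + mx) & MASK
    s[d] = rotr(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotr(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b] + my) & MASK
    s[d] = rotr(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotr(s[b] ^ s[c], 7)


def compress(cv, m, counter, block_len, flags):
    """7-round compression function; returns 16 output words."""
    s = list(cv) + IV[:4] + [counter & MASK, (counter >> 32) & MASK, block_len, flags]
    for _ in range(7):
        g(s, 0, 4, 8, 12, m[0], m[1])      # columns
        g(s, 1, 5, 9, 13, m[2], m[3])
        g(s, 2, 6, 10, 14, m[4], m[5])
        g(s, 3, 7, 11, 15, m[6], m[7])
        g(s, 0, 5, 10, 15, m[8], m[9])     # diagonals
        g(s, 1, 6, 11, 12, m[10], m[11])
        g(s, 2, 7, 8, 13, m[12], m[13])
        g(s, 3, 4, 9, 14, m[14], m[15])
        m = [m[i] for i in PERM]  # permute message words between rounds
    return [s[i] ^ s[i + 8] for i in range(8)] + [s[i + 8] ^ cv[i] for i in range(8)]


def words(block):
    # A block, zero-padded to 64 bytes, as 16 little-endian 32-bit words.
    return list(struct.unpack("<16I", block.ljust(BLOCK_LEN, b"\0")))


def chunk_node(chunk, counter):
    """Compress all but the last block; return the last block's inputs."""
    blocks = [chunk[i:i + BLOCK_LEN] for i in range(0, len(chunk), BLOCK_LEN)] or [b""]
    cv = IV
    for i, block in enumerate(blocks[:-1]):
        cv = compress(cv, words(block), counter, BLOCK_LEN, CHUNK_START if i == 0 else 0)[:8]
    flags = CHUNK_END | (CHUNK_START if len(blocks) == 1 else 0)
    return cv, words(blocks[-1]), counter, len(blocks[-1]), flags


def chaining_value(node):
    return compress(*node)[:8]


def subtree_node(data, counter):
    """Left subtree gets the largest power-of-two number of whole chunks that
    still leaves at least one byte for the right subtree."""
    if len(data) <= CHUNK_LEN:
        return chunk_node(data, counter)
    n_chunks = -(-len(data) // CHUNK_LEN)
    left_chunks = 1 << ((n_chunks - 1).bit_length() - 1)
    split = left_chunks * CHUNK_LEN
    left = chaining_value(subtree_node(data[:split], counter))
    right = chaining_value(subtree_node(data[split:], counter + left_chunks))
    return IV, left + right, 0, BLOCK_LEN, PARENT


def blake3(data: bytes, length: int = 32) -> bytes:
    """XOF output: rerun the root compression with an incrementing counter."""
    cv, block, _, block_len, flags = subtree_node(data, 0)
    out, i = b"", 0
    while len(out) < length:
        out += struct.pack("<16I", *compress(cv, block, i, block_len, flags | ROOT))
        i += 1
    return out[:length]
```

The two recursive `subtree_node` calls never share state, which is exactly the work a real implementation spreads across SIMD lanes and threads; this sketch simply runs them one after the other. Because extra output comes from re-running the root with a higher counter, a 64-byte result always starts with the 32-byte digest.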
BLAKE3 wins anywhere you hash big things on multi-core: file integrity (b3sum is several times faster than sha256sum), content-addressed storage, large-file Merkle trees, deduplication.
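Content-addressed storage and deduplication reduce to "the digest is the key". Python's standard library has no BLAKE3, so this sketch uses `hashlib.blake2b` with a 32-byte digest as a stand-in (swap in a BLAKE3 binding if you have one); `ContentStore` is a hypothetical in-memory class, not a real library:

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory content-addressed store with deduplication."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    @staticmethod
    def address(data: bytes) -> str:
        # BLAKE2b-256 as a stand-in for BLAKE3: the digest *is* the key.
        return hashlib.blake2b(data, digest_size=32).hexdigest()

    def put(self, data: bytes) -> str:
        key = self.address(data)
        self._blobs.setdefault(key, data)  # identical content is stored once
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on read to detect corruption of the stored bytes.
        if self.address(data) != key:
            raise ValueError(f"integrity check failed for {key}")
        return data

    def __len__(self) -> int:
        return len(self._blobs)
```

Because the address is derived from the content, storing the same bytes twice costs one hash and no extra space, and every read doubles as an integrity check.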
Why both stick around
BLAKE2 is more widely deployed. It's in OpenSSL, libsodium, Python's standard library (`hashlib`), PHP, Rust's `blake2` crate, and Java's Bouncy Castle. Several well-known designs (WireGuard, Argon2, IPFS multihash) name BLAKE2 explicitly.
BLAKE3 is newer. Adoption is growing fast (ccache uses it for hashing, OpenZFS offers it as a checksum algorithm, and `b3sum` is packaged in many Linux distributions), but it's not yet universal.
A pragmatic rule:
- Building something new, single-machine, big files → BLAKE3.
- Implementing a protocol that names BLAKE2 → BLAKE2.
- Drop-in SHA-256 replacement, want speed → BLAKE2b on 64-bit, BLAKE2s on 32-bit.
- Need stdlib support, no third-party deps → SHA-256 still wins on availability.
The takeaway
BLAKE has been under public cryptanalysis since its 2008 SHA-3 submission, and no practical attack on full-round BLAKE2 or BLAKE3 is known. Both aim for security comparable to SHA-2, just engineered for the realities of modern CPUs. If you're hashing in software and care about throughput, one of them is probably the right choice.
Being able to run `b3sum *` over a directory of large files and have it finish in a fraction of the time `sha256sum *` takes is a real productivity win, and increasingly the kind of small efficiency gain that adds up across a system.
Tools in the wild
- `b3sum` (CLI): The BLAKE3 reference CLI. Hashes at gigabytes per second on multi-core machines, with a `sha256sum`-style interface (including `--check`).
- `@noble/hashes` (library): Audited pure-JS implementations of BLAKE2/3, SHA-2/3, KMAC, and more. Tiny, no native deps.
- `python-blake3` (library): Rust-backed BLAKE3 for Python. Full speed, including the parallel API.
- `openssl dgst -blake2b512` (CLI): OpenSSL ≥ 1.1.0 has BLAKE2 built in, so `openssl dgst -blake2b512 file` works on any system with a reasonably modern OpenSSL.
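Where shelling out to OpenSSL is awkward, Python's `hashlib` computes the same BLAKE2b-512 digest: the default `hashlib.blake2b()` is unkeyed with a 64-byte output, so its hex string should match the digest `openssl dgst -blake2b512 file` prints after its own `NAME(file)= ` prefix. A sketch; `blake2b512_file` is a hypothetical helper, and the `digest  path` output line only mimics the `sha256sum` style:

```python
import hashlib
import sys


def blake2b512_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through BLAKE2b with a 64-byte (512-bit) digest.

    Reads fixed-size pieces so memory use stays flat for large files.
    """
    h = hashlib.blake2b()  # default digest_size=64, i.e. BLAKE2b-512
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(chunk_size), b""):
            h.update(piece)
    return h.hexdigest()


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(f"{blake2b512_file(p)}  {p}")
```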