What Is a Hash?

One-way, deterministic, avalanche.

A cryptographic hash function takes arbitrary-length input and returns a fixed-length digest. It's not encryption — there's no key, no reversal. It's a one-way fingerprint.

SHA-256, for example, always returns 256 bits (64 hex characters), whether you feed it an empty string or a terabyte of video.
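To see the fixed output size in practice, here is a small sketch using Python's standard `hashlib` module. The three sample inputs are arbitrary choices for illustration; the 1 MiB buffer merely stands in for "something large".

```python
import hashlib

# Inputs of very different sizes; the labels and contents are illustrative only.
inputs = {
    "empty string": b"",
    "five bytes": b"hello",
    "one mebibyte of zeros": bytes(1024 * 1024),
}

for label, data in inputs.items():
    digest = hashlib.sha256(data).hexdigest()
    # Each digest is 64 hex characters, i.e. 256 bits, regardless of input size.
    print(f"{label:>22}: {len(digest)} hex chars  {digest[:16]}...")

# Collect the distinct digest lengths seen across all inputs.
digest_lengths = {len(hashlib.sha256(d).hexdigest()) for d in inputs.values()}
print(digest_lengths)
```

Whatever the input size, the set of observed lengths contains a single value.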

Analogy

Imagine tossing every ingredient of a recipe (chicken, broth, noodles, spices) into an industrial blender and pressing "liquefy". You get a uniform puree that can't be un-blended: no one can look at it and tell you which farm the chicken came from. But the puree is perfectly repeatable. Run the exact same recipe twice and you get two identical purees; swap one pinch of salt for pepper and the puree changes colour entirely. That's a cryptographic hash: deterministic, irreversible, and ruthlessly sensitive to even the smallest change in input.

The three defining properties

Property                        | What it means
Deterministic                   | Same input → same output, always. No randomness.
One-way (preimage resistance)   | Given digest d, you can't feasibly find any x with H(x) = d.
Avalanche                       | A one-bit change in the input flips ~50% of the output bits.

Determinism

echo -n "hello" | shasum -a 256
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

echo -n "hello" | shasum -a 256
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824   ← identical

This is how you can store a hash of a file and use it to check the file later: if the file hasn't changed, the hash hasn't changed either.
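The file check described above might look like the following sketch. The helper name `sha256_of_file`, the chunk size, and the temporary file are assumptions made for the demo; the approach only detects changes if the recorded digest itself is kept somewhere trustworthy.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files never have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo with a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")
    with open(path, "wb") as f:
        f.write(b"quarterly numbers\n")

    recorded = sha256_of_file(path)              # the digest you would store
    unchanged = sha256_of_file(path) == recorded

    with open(path, "ab") as f:                  # simulate corruption or tampering
        f.write(b"!")
    still_matches = sha256_of_file(path) == recorded

print("unchanged file matches:", unchanged)
print("modified file matches:", still_matches)
```

Reading in chunks keeps memory use constant, which matters once the files get large.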

One-way

There's no "unhash" button. You can't start from a digest and recover the input. The only way to discover the input is to guess it — which is exactly what password crackers do.
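Guessing can be sketched in a few lines. The "leaked" digest below is simply SHA-256 of "hello" from the Determinism example, and the word list is a made-up stand-in for the much larger lists real crackers use.

```python
import hashlib

def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Pretend this digest came from a leak; it is SHA-256("hello").
leaked_digest = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

# Hypothetical candidate list for illustration.
wordlist = ["password", "letmein", "hello", "qwerty"]

def guess_preimage(target_digest, candidates):
    """Hash each candidate and return the first one that matches, else None."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_digest:
            return candidate
    return None

recovered = guess_preimage(leaked_digest, wordlist)
print(recovered)
```

Nothing here inverts the hash: the function only runs it forward on each guess and compares results, so an input that is not in the candidate list is never found.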

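That's why passwords are stored hashed rather than in plaintext: if the database leaks, attackers get digests and still have to guess each password. How much work that takes depends on both the password and the hash. A fast hash like plain SHA-256 lets attackers test billions of guesses per second on GPUs, so real systems use a deliberately slow, salted password-hashing function such as bcrypt, scrypt, or Argon2.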
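A minimal sketch of salted password storage, assuming PBKDF2-HMAC-SHA256 from Python's standard library as the slow hash. The function names and the iteration count are illustrative assumptions, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for the demo; tune for real deployments

def hash_password(password, salt=None):
    """Return (salt, derived_key); both are stored alongside the user record."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password, salt, expected_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

salt, stored = hash_password("correct horse battery staple")
ok = verify_password("correct horse battery staple", salt, stored)
bad = verify_password("wrong guess", salt, stored)
print(ok, bad)
```

The salt means two users with the same password get different stored values, and the iteration count makes every guess cost the attacker that many hash computations.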

Avalanche

echo -n "The quick brown fox" | shasum -a 256
# prints a 64-hex-character digest

echo -n "The quick brown foX" | shasum -a 256
# prints a different 64-hex-character digest with no visible relation to the first

A single-letter change to the input scrambles the output completely. This is the avalanche effect, and it's why you can't "peek" inside a hash to reason about its input.
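To put a number on "scrambles completely", the sketch below counts how many of the 256 output bits differ between the two inputs used above. In ASCII, "x" and "X" differ by exactly one bit, so this is a one-bit input change. The helper names are illustrative.

```python
import hashlib

def sha256_bytes(text):
    return hashlib.sha256(text.encode("utf-8")).digest()

def differing_bits(a, b):
    """Count bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

d1 = sha256_bytes("The quick brown fox")
d2 = sha256_bytes("The quick brown foX")

flipped = differing_bits(d1, d2)
total = len(d1) * 8
print(f"{flipped} of {total} bits differ ({flipped / total:.0%})")
```

For a well-behaved hash the count should land near half of the 256 bits, though the exact figure varies from one input pair to the next.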

Hashing vs encryption vs encoding

  • Encoding (Base64, hex): reversible, no key. Not about secrecy.
  • Encryption (AES, RSA): reversible with the key. Hides content.
  • Hashing (SHA-256): not reversible, no key. Summarizes content.
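The contrast between encoding and hashing can be shown with the standard library alone. Encryption is left out of this sketch because Python's standard library does not ship AES or RSA; the sample message is arbitrary.

```python
import base64
import hashlib

message = b"attack at dawn"

# Encoding: anyone can reverse it, and no key is involved.
encoded = base64.b64encode(message)
decoded = base64.b64decode(encoded)

# Hashing: a fixed-size summary; hashlib offers no function to go back.
digest = hashlib.sha256(message).hexdigest()

print("round trip recovers message:", decoded == message)
print("digest length:", len(digest))
```

Base64 output grows with the input, while the digest stays at 64 hex characters.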

Where hashes show up

  • Git: every commit (and every file and directory snapshot) is named by the SHA-1 hash of its contents. Newer Git versions also support SHA-256 repositories.
  • Passwords: servers store a salted hash of the password, ideally from a slow password-hashing function, never the password itself.
  • Content-addressed storage: Docker, IPFS, npm package integrity. "Do two blobs have the same hash?" is a fast stand-in for "are they identical?"
  • Digital signatures: you sign the hash of the document, not the document itself, so the signed value is short and fixed-size.
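As one concrete case of content addressing, here is a sketch of how Git derives the ID of a blob (a file's contents) in a SHA-1 repository: it hashes a short header followed by the raw bytes. Commits use the same scheme with a different type word in the header. The function name `git_blob_id` is an assumption for this example.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 over the header 'blob <size>' plus a NUL byte, then the raw content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

empty_id = git_blob_id(b"")
print(empty_id)  # the well-known ID of the empty blob, e69de29b...

# Identical content always maps to the same ID, so storing it twice is a no-op.
same = git_blob_id(b"hello world\n") == git_blob_id(b"hello world\n")
print(same)
```

If Git is installed, `git hash-object` on a file with the same bytes should report the same ID.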

Not every hash is cryptographic

hashCode() on a Java object and hash() in Python (the function behind dict lookups) aim for speed and a good spread across hash-table buckets. They make no attempt at collision resistance. Older cryptographic hashes can fail too: practical collision attacks exist for both MD5 and SHA-1, so seeing either in security-sensitive work today is a bug. Use SHA-256 or SHA-3.
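To see why a fast non-cryptographic hash is not collision-resistant, consider this deliberately simple sketch. `toy_hash` is a hypothetical function invented for the example, not the algorithm Java or Python actually uses.

```python
import hashlib

def toy_hash(text, buckets=1024):
    """Hypothetical fast hash: sum of the byte values modulo a table size."""
    return sum(text.encode("utf-8")) % buckets

# Anagrams contain the same bytes, so their sums (and toy hashes) are equal.
toy_collides = toy_hash("listen") == toy_hash("silent")

# SHA-256 gives the two words unrelated digests.
sha_collides = (
    hashlib.sha256(b"listen").hexdigest() == hashlib.sha256(b"silent").hexdigest()
)

print("toy hash collides:", toy_collides)
print("SHA-256 collides:", sha_collides)
```

A collision like this is harmless in a hash table, which simply stores both keys in the same bucket, but it would be fatal for signatures or integrity checks.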