system-design · level 2

Caching Patterns

Cache-aside, write-through, write-back; LRU vs LFU eviction.

150 XP

Caching Patterns

Caches trade memory for latency. A request that took 50ms hitting Postgres takes 0.5ms hitting Redis. Multiplied across millions of requests, that's the difference between a humming service and one that needs an extra fleet just to serve reads.

But caches are also the hardest thing in distributed systems to get right — the famous "two hard things" of cache invalidation. Get the strategy right and the cache becomes pure upside. Get it wrong and you ship stale data, lose writes, or thunder-herd the database every time the cache restarts.

The two axes

Every cache decision is really two decisions:

Where does the write go? (cache-aside, write-through, write-back)
What gets evicted when memory fills? (LRU, LFU, FIFO, TTL)

Pick the right pair for your read/write ratio and your tolerance for stale or lost data.

Where the write goes

Cache-aside (lazy loading)

The default. The application reads from the cache, falls through to the database on miss, populates the cache, returns the value. The cache is invisible to writes.

read:   app → cache → (miss) → db → app populates cache → return
write:  app → db → app deletes cache key (so next read repopulates)

Pros: simple, the cache is genuinely optional — kill Redis and the app still works (just slower). The DB remains source of truth.

Cons: every cold read pays full DB cost. The first request after a deployment, or after Redis restart, is slow. Stampede risk — if a hot key expires, 1000 concurrent readers all hit the DB.

When to use it: ~90% of services. Read-heavy workloads where slightly stale data is fine.

Write-through

Every write goes to both the cache and the database synchronously. The write returns only after both succeed.

write:  app → cache.set + db.write (in parallel or sequence) → return
read:   always served from cache (which is always up-to-date)

Pros: cache and DB never disagree. Reads never miss for data that exists.

Cons: writes pay double latency. If either side fails, partial-success problems (especially with Redis fire-and-forget semantics — be careful).

When to use it: read-after-write critical paths where staleness is unacceptable. Sessions, feature flags, billing balances during a transaction.

Write-back (write-behind)

Writes go to the cache only. The cache asynchronously flushes to the database in batches.

write:  app → cache (returns immediately)
            ↓ (every 50ms or 100 keys)
            db.batchWrite

Pros: very fast writes, batches reduce DB load. Great for high-volume telemetry, view counts, analytics.

Cons: if the cache dies before flush, those writes are gone. The DB is out-of-date by the flush interval. Crash recovery is ugly.

When to use it: high-write-volume non-critical paths. View counts, like counts, last-active timestamps, audit trails where minute-grain accuracy is fine.

What gets evicted

When the cache fills, something has to go. The eviction policy decides what.

LRU — Least Recently Used

Evict whatever hasn't been touched for the longest time. Implemented as a hash map + doubly-linked list. O(1) for get/put, O(1) for eviction.

Strengths: simple, hardware-cache-line proven, works well when "popular things stay popular for a while".

Weaknesses: a one-time scan of N items can flood the cache and evict your hot set ("scan resistance" failure). Once-popular items linger forever even if they're never going to be touched again.

LFU — Least Frequently Used

Evict whatever has been hit fewest times. Adds a counter per entry.

Strengths: better for skewed access patterns (Zipfian distributions, where 5% of keys take 80% of traffic). Resists scan-flooding.

Weaknesses: stale popular items can linger forever — a key that was hot a year ago still has a high count. Pure LFU is rarely used; modern systems use W-TinyLFU or ARC that decay frequencies over time.

TTL (time to live)

Every entry has an expiry timestamp; once expired, the entry's gone (or refreshed on the next read).

Strengths: predictable, bounds staleness explicitly. The simplest "freshness" guarantee.

Weaknesses: not really an eviction strategy — it's a freshness strategy. You usually combine it with LRU/LFU as the actual size-bounded policy.

Stampede prevention

When a hot key expires and 1000 readers all miss simultaneously, they all hit the DB. If the DB takes 500ms, you've created an instant 1000-concurrent-query spike on a backend that was happy moments ago.

Three defenses:

Single-flight (request coalescing). All concurrent misses for the same key share one DB call. The first miss kicks off the fetch; the rest wait on it.
Probabilistic early refresh. Before the TTL expires, randomly refresh in-band based on remaining time. Spreads refresh load across the TTL window.
Negative caching. Cache "this key doesn't exist" too — otherwise, a 404 path becomes a DB-stampede vector.

Cache invalidation

The "two hard things" quote isn't a joke. The classic patterns:

Cache-aside + delete on write. When you update the DB, delete the cache key. Next read repopulates. Simple, mostly correct, has a tiny race (between update and delete a stale read can repopulate the cache).
Versioned keys. Cache key includes a version (user:42:v3). Updating the row bumps the version. Old keys naturally expire. No stampede on update.
Pub/sub invalidation. The DB publishes a change event; subscribers nuke the cache key. Eventual consistency, but no race.
TTL-only. Don't invalidate at all — accept staleness up to the TTL. Easiest, hardest to debug when stale data ships.

Multi-tier caching

Big systems stack caches:

client → CDN → edge cache → app cache → Redis → DB

Each layer trades capacity for latency. The CDN is fast and huge but only caches static GET responses. Redis is slower than memory but bigger than memory. The DB is biggest but slowest.

The discipline: each layer should have a higher hit rate than the layer below, AND a shorter TTL than the layer above. CDN: 1 hour TTL, 99% hit rate. Redis: 5 min TTL, 95% hit rate. DB: source of truth.

Picking the strategy

A reasonable progression:

Cache-aside + LRU + 5-min TTL. Default for nearly everything.
Add TTL jitter (4–6 min instead of exactly 5) so the entire cache doesn't expire in a wave.
Add single-flight when you see DB spikes correlated with cache evictions.
Move to write-through for paths where staleness is forbidden (sessions, billing).
Move to write-back only for high-volume non-critical writes (counters, telemetry).

Almost no service needs anything fancier. The exotic patterns (ARC, W-TinyLFU, multi-region replication) exist for reasons, but those reasons are usually visible — you'll know.

Common tools

Redis — the default. Persistence, replication, clustering, eviction policies.
Memcached — pure in-memory K/V. Faster than Redis for the basics, no persistence.
Caffeine (Java) — in-process LRU/W-TinyLFU. Eliminates the network hop entirely for hot paths.
CDN edge caches (CloudFront, Fastly, Cloudflare) — globally distributed, the first cache layer for HTTP.

Tools in the wild

4 tools

Redisfree tier
The default in-memory cache. TTL, eviction policies, pub/sub, streams.
service
Memcachedfree tier
Simpler than Redis. Pure key-value, no persistence, blazing fast.
service
Caffeinefree tier
Java in-memory cache. W-TinyLFU eviction beats LRU on skewed traffic.
library
lru-cache (npm)free tier
Node's de-facto in-memory LRU. TTLs, size-bounded, well-tested.
library