← Bank
System Design

Design a URL Shortener

System DesignMid~35m
system-designencodingdatabases

Prompt

A growth team at a marketing company wants to stop pasting raw campaign links into emails and billboards. They've asked you to build them a link service: "you give it a URL, it gives you a short link, and the short link sends people to the right place." That's the whole brief they handed you. Build it.

How this round runs

I've handed you a vague one-liner on purpose — there's a lot it doesn't say. You drive: pin down what this thing actually has to do and at what scale before you draw a single box, then design it, and I'll push you to go deep on one piece and to commit to a trade-off and own its cost.

Model answer

1. Requirements I'd surface first. The brief hides almost everything that shapes the design, so I'd ask:

  • Read:write ratio? For a campaign-link product this is wildly read-heavy — say 10000:1. That single number tells me the redirect path is the hot path and writes are rare; the whole design bends toward fast, cacheable reads.
  • Vanity / custom aliases? "Give it a URL, get a short link" is silent on whether the growth team wants go.co/spring-sale. If yes, the key space is no longer something I control — I need a uniqueness check on a user-supplied key.
  • Analytics? Marketing almost certainly wants click counts. That's a separate, write-heavy concern I'd keep off the redirect hot path.
  • Custom expiry? Campaign links may need to die after the campaign. That adds a TTL/expiry field and a sweep.
  • Scale + retention: how many links/day, kept for how long? Suppose 100M links/month, kept 5 years → ~6B rows. That sizes the key length and the store.

Non-functional: redirect latency p99 under ~50ms, redirects should survive a cache or DB hiccup, and a short code must never collide onto two different URLs.

2. High-level design (and why each piece).

client --> CDN/edge --> stateless API servers --> cache (Redis) --> durable KV store
                              |                                          ^
                         key generator -------- write path -------------/
                              |
                        analytics queue --> async click counter (off hot path)
  • Stateless API tier so I can scale redirects horizontally behind a load balancer — no per-node state to lose.
  • Cache in front of the store because reads dominate 10000:1; the working set of "hot" links is tiny and lives happily in memory.
  • Durable KV store (the code→URL map is a pure key lookup; I don't need joins), indexed on the short code.
  • Key generator as its own concern — see the deep-dive.
  • Analytics as an async queue, deliberately decoupled so a click never blocks or fails a redirect.

3. Deep-dive: key generation + storage. This is the hard part, so I'd go deep here. Two honest options:

  • Hash the long URL (e.g. base62 of a truncated hash). Idempotent — the same URL maps to the same code, which dedups for free. Cost: collisions are real once the table is large, so every write needs a "does this code already point at a different URL?" check and a rehash-on-collision loop.
  • Counter + base62 encode. A monotonic ID source (or a pre-allocated range per node so nodes don't fight over a global counter) encoded to base62. No collisions by construction. Cost: codes are sequential and guessable, which leaks volume and lets people enumerate links — a privacy problem for a marketing product.

For 6B links, base62 needs ceil(log62(6e9)) ≈ 6 characters (62^6 ≈ 56B), so a 7-char code gives comfortable headroom. Storage: ~6B rows × ~500 bytes ≈ 3TB before replication — beyond one node, so the store has to shard, by the short code itself (it's the natural partition key and the only lookup key on the hot path).

4. A committed trade-off and its cost. I'd commit to the counter + per-node range, with the code generated by hashing the counter through a keyed permutation so codes are non-sequential without collision-checking. The cost I'm accepting, and I'll say it out loud: I give up the free deduplication that pure URL-hashing gave me — the same long URL submitted twice produces two different short codes and two rows. For a campaign-link tool that's fine (teams often want distinct links per channel anyway), and I'd rather pay duplicate rows than pay a collision-retry loop on every write. If dedup later becomes a hard requirement, I'd add a secondary url_hash → code index and look it up before minting.

5. Operational concerns. Failure mode: the cache goes cold (restart or eviction storm) and every redirect falls through to the store, which buckles under 10000× its normal load. I'd detect it on a cache-hit-ratio alert and a store-QPS spike, and defend against it with request coalescing (collapse a thundering herd of misses for the same code into one store read) plus serving slightly-stale cached entries during a store brown-out. Rollback for a bad deploy: the API tier is stateless, so I roll back the image; the data is append-mostly so there's nothing to migrate back.

Signals — what a strong answer shows
  • Surfaced read:write ratio, vanity URLs, analytics, and expiry as requirements before drawing anything
  • Justified each component (cache because reads dominate, async queue to keep clicks off the redirect path)
  • Went deep on key generation, comparing hash vs counter with the actual cost of each
  • Committed to a specific approach and named what it gives up (free dedup)
  • Raised a cold-cache failure mode with detection and a defense, unprompted
Follow-ups — where it goes next
  • 'What breaks at 10x links/sec on writes?' → the global counter contends; pre-allocate ID ranges per node so writes don't coordinate
  • 'Why a KV store and not Postgres?' → it's a pure key lookup with no joins; commit to KV, but note Postgres is fine until you outgrow one node
  • 'A whole region's cache dies — now what?' → coalesce misses, serve stale, and rate-limit the store so a cold cache degrades instead of cascading
  • 'Custom vanity aliases?' → user-supplied key means a uniqueness check on write and a reserved-words list