Design a File Storage Service

System Design · Mid · ~40m
Tags: system-design, storage, chunking, consistency

Prompt

We want users to be able to upload files into our app and download them later — think a "my files" area. Some of the files are big, like multi-gigabyte video exports, and people complain that uploads fail halfway and they have to start over on bad hotel wifi. Design the upload-and-download service.

How this round runs

The brief says "upload a file, download it later," but "big" and "uploads fail halfway" are the whole game here. You drive: surface size, resumability, and what "download later" really needs, then design it, and I'll push you to go deep on chunking and resumable upload, and to commit to a trade-off.

Model answer

1. Requirements I'd surface first.

  • File size range? "Multi-gigabyte" changes everything: a 100KB avatar and a 5GB video want different upload paths. I'd ask for the max and the distribution, and design the large path as the hard case.
  • Resumable uploads? The complaint is "fails halfway on bad wifi," so resumability is a hard requirement, not a nice-to-have. That forces chunking (see deep-dive).
  • Overwrite semantics? "Download later" — if a user re-uploads the same filename, is it a new version or an in-place overwrite? This is a consistency question: what does a reader see mid-overwrite? I'd surface it explicitly.
  • Dedup? If many users upload the same file (a shared video), do we store it once? Content-addressing by hash lets identical content be stored once, at the cost of a hash lookup on upload.
  • Access control and sharing? Private by default? Shareable links?
  • Scale: say 10M users, files from KB to multi-GB, mostly write-once then read-occasionally.

Non-functional: a 5GB upload over flaky wifi must survive disconnects and resume; downloads must be fast (ideally edge-served); an overwrite must never let a reader see a half-written file.

2. High-level design.

client --> API server --> metadata store (file id, owner, size, chunk list, version, blob refs)
                |
                +--> blob/object store (the actual bytes, chunked)
                |
          presigned upload/download URLs (client talks to blob store directly for the bytes)

Key split: metadata in a database, bytes in a blob/object store. The metadata row is small and queryable (owner, size, version, the list of chunks and their locations); the multi-GB bytes go to object storage built for large immutable blobs. The API hands the client a presigned URL so the client uploads/downloads bytes directly to/from the blob store — the API never proxies gigabytes through itself (that would make the API tier the bottleneck).
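The presigned-URL idea can be sketched as an HMAC over the request details. This is a minimal illustration, not any real object store's signing scheme: the `SECRET` key, the `/blobs/` path prefix, and the `presign` / `verify` helpers are all hypothetical names. It assumes the API tier and the blob store front end share a signing key, so the blob store can check a URL without calling back to the API.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a signing key shared by the API tier and the blob store front end.
SECRET = b"demo-signing-key"


def _signature(method: str, blob_key: str, expires_at: int) -> str:
    # Sign the HTTP method, the object key, and the expiry together,
    # so a URL issued for one operation cannot be reused for another.
    message = f"{method}\n{blob_key}\n{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, blob_key: str, expires_at: int) -> str:
    """API tier: issue a time-limited URL for one method on one blob key."""
    query = urlencode(
        {"expires": expires_at, "sig": _signature(method, blob_key, expires_at)}
    )
    return f"/blobs/{blob_key}?{query}"


def verify(method: str, url: str, now=None) -> bool:
    """Blob store side: accept the request only if the signature matches
    and the URL has not expired."""
    now = time.time() if now is None else now
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    blob_key = parts.path[len("/blobs/"):]
    expires_at = int(params["expires"][0])
    if now > expires_at:
        return False
    expected = _signature(method, blob_key, expires_at)
    return hmac.compare_digest(expected, params["sig"][0])
```

The API tier only signs a short string per chunk; the gigabytes themselves never pass through it.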

3. Deep-dive: chunking + resumable upload. This is the part the complaint is really about.

Why chunk. A single 5GB PUT that dies at 90% wastes 4.5GB and forces a full restart — exactly the hotel-wifi pain. So I split the file into fixed-size chunks (say 8MB) and upload them independently. Now a dropped connection only loses the in-flight chunk, and chunks can upload in parallel.
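To make the chunk arithmetic concrete, here is a small sketch of a chunk planner. `plan_chunks` is a hypothetical helper, and it assumes binary units (8MB = 8 × 1024 × 1024 bytes), which the prompt does not specify.

```python
CHUNK_SIZE = 8 * 1024 * 1024  # assumption: "8MB" read as 8 MiB


def plan_chunks(total_size: int, chunk_size: int = CHUNK_SIZE):
    """Return (index, offset, length) for every chunk of a file.

    All chunks are chunk_size bytes except possibly the last one.
    """
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    return [
        (index, offset, min(chunk_size, total_size - offset))
        for index, offset in enumerate(range(0, total_size, chunk_size))
    ]


# A 5 GiB file splits into 640 chunks of 8 MiB each.
print(len(plan_chunks(5 * 1024**3)))  # -> 640
```

With this layout, a dropped connection costs at most one 8MB chunk per in-flight upload rather than the gigabytes already sent.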

The resumable protocol. This is essentially the multipart-upload pattern:

  (1) Client calls initiate upload → server creates a metadata record with the file's total size and chunk count, and returns an upload id.
  (2) Client uploads each chunk through a per-chunk presigned URL; the store acks each one.
  (3) On disconnect/resume, the client asks the server which chunks are already present and re-sends only the missing ones. That is what makes the upload resumable and idempotent (re-uploading a chunk that's already there is a harmless no-op).
  (4) When all chunks are present, the client calls complete; the server verifies the chunk set, checks per-chunk and whole-file checksums for integrity, and assembles the final object.
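The four protocol steps can be sketched as an in-memory session object. This is a toy model under stated assumptions: `UploadSession` and its methods are hypothetical names, chunks are held in a dict rather than a blob store, and SHA-256 stands in for whatever checksum the real service would use.

```python
import hashlib


class UploadSession:
    """Server-side state for one resumable upload (in-memory sketch)."""

    def __init__(self, upload_id: str, chunk_count: int, file_sha256: str):
        self.upload_id = upload_id
        self.chunk_count = chunk_count
        self.file_sha256 = file_sha256
        self.chunks = {}  # chunk index -> bytes

    def put_chunk(self, index: int, data: bytes, sha256_hex: str) -> None:
        """Step 2: store one chunk after checking its checksum."""
        if not 0 <= index < self.chunk_count:
            raise IndexError(f"chunk index {index} out of range")
        if hashlib.sha256(data).hexdigest() != sha256_hex:
            raise ValueError("chunk checksum mismatch")
        # Idempotent: re-sending the same chunk just rewrites the same bytes.
        self.chunks[index] = data

    def missing(self):
        """Step 3: tell a resuming client which chunks it still has to send."""
        return [i for i in range(self.chunk_count) if i not in self.chunks]

    def complete(self) -> bytes:
        """Step 4: verify the chunk set and whole-file checksum, then assemble."""
        if self.missing():
            raise RuntimeError(f"missing chunks: {self.missing()}")
        assembled = b"".join(self.chunks[i] for i in range(self.chunk_count))
        if hashlib.sha256(assembled).hexdigest() != self.file_sha256:
            raise ValueError("whole-file checksum mismatch")
        return assembled
```

A client that reconnects calls `missing()` and uploads only those indices; everything already acked stays put.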

Dedup falls out naturally. If the client hashes each chunk and sends the hash first (content-addressing), a chunk the store already holds is referenced instead of re-uploaded, saving storage and bandwidth at the cost of a hash lookup per chunk.
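A minimal sketch of content-addressed chunk storage follows. `ChunkStore` is a hypothetical in-memory stand-in for the blob store, SHA-256 is an assumed hash choice, and the reference count is an illustrative addition that a garbage collector could use later.

```python
import hashlib


class ChunkStore:
    """Stores each distinct chunk once, keyed by the hash of its content."""

    def __init__(self):
        self._blobs = {}  # digest -> bytes
        self._refs = {}   # digest -> number of references to the chunk

    def has(self, digest: str) -> bool:
        """The per-chunk hash lookup a client can use to skip an upload."""
        return digest in self._blobs

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data  # first copy: actually store the bytes
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def ref_count(self, digest: str) -> int:
        return self._refs.get(digest, 0)

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())
```

Two users uploading the same chunk end up holding two references to one stored blob.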

4. Deep-dive: consistency on overwrite. "Download later" plus re-uploads forces a decision. If overwrite mutates the object in place, a reader downloading during the overwrite can get a Frankenstein half-old/half-new file — a correctness bug. So I commit to immutable, versioned objects: an "overwrite" writes a new version (new blob, new chunk set) and then flips the metadata pointer to it in one atomic update. Readers always resolve a complete version — they get the old one until the pointer flips, then the new one, never a mix. The cost is extra storage for old versions (mitigated by lifecycle/GC of stale versions), but I get clean read consistency and free rollback.
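The pointer flip can be sketched as follows. `VersionedFile` is a hypothetical in-memory model; the lock stands in for what would be a single-row atomic update (or compare-and-swap) in the real metadata database.

```python
import threading


class VersionedFile:
    """Immutable versions plus one 'current' pointer that readers resolve."""

    def __init__(self):
        self._versions = {}   # version number -> immutable tuple of chunk digests
        self._current = None  # version readers currently see
        self._lock = threading.Lock()

    def write_version(self, chunk_digests) -> int:
        """Stage a complete new version; readers cannot see it yet."""
        with self._lock:
            version = len(self._versions) + 1
            self._versions[version] = tuple(chunk_digests)
            return version

    def publish(self, version: int) -> None:
        """The atomic flip: point 'current' at an already complete version."""
        with self._lock:
            if version not in self._versions:
                raise KeyError(f"unknown version {version}")
            self._current = version

    def read(self):
        """Readers always get one whole version, old or new, never a mix."""
        with self._lock:
            if self._current is None:
                raise FileNotFoundError("no published version")
            return self._versions[self._current]
```

Rollback in this model is just `publish(previous_version)`, which mirrors the "re-point metadata" idea without moving any bytes.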

5. A committed trade-off and its cost. I commit to chunked, content-addressed, immutable-versioned storage. The cost I name out loud: more metadata and more moving parts — I track every chunk and every version, run a chunk-assembly + verify step, and need garbage collection for orphaned chunks and superseded versions. For a service whose whole pain is "big uploads fail and re-uploads corrupt reads," that bookkeeping is exactly what buys resumability, dedup, and clean overwrites. If files were guaranteed small and write-once, I'd drop chunking and versioning and store blobs whole — but the multi-GB + flaky-wifi requirement is precisely why I'm paying for the complex path.

6. Operational concerns. Failure mode: an upload is abandoned partway (user gives up), leaving orphaned chunks in the blob store whose session never receives a complete call. I'd detect it with a TTL on incomplete upload sessions and a background GC that reclaims chunks of sessions that never completed; otherwise storage leaks. Because chunks are content-addressed and shared, the GC must skip any chunk a live version still references. Another: the complete/pointer-flip step half-fails. Because the flip is a single atomic metadata update, a crash leaves the old version live (safe) and the new chunks as collectable orphans, never a corrupt read. Downloads I'd serve via the blob store / CDN edge so origin load stays flat. Rollback: versioning means reverting a bad overwrite is just re-pointing metadata at the previous version, with no data movement.
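The TTL-plus-GC cleanup might look roughly like this. `Session` and `collect_orphans` are hypothetical names, and the `live_keys` parameter is an assumption of this sketch: with content-addressed chunks, an abandoned session may point at a chunk that a published version also uses, so the collector must not reclaim it.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    upload_id: str
    started_at: float            # seconds since epoch
    completed: bool = False
    chunk_keys: set = field(default_factory=set)


def collect_orphans(sessions, live_keys, now: float, ttl_seconds: float) -> set:
    """Return chunk keys that are safe to delete.

    A chunk is collectable when it belongs to an incomplete session older
    than the TTL and no live (published) version still references it.
    """
    orphans = set()
    for session in sessions:
        expired = (not session.completed
                   and now - session.started_at > ttl_seconds)
        if expired:
            orphans |= session.chunk_keys
    return orphans - set(live_keys)
```

Running this as a periodic background job keeps abandoned uploads from leaking storage while leaving fresh, in-progress sessions alone.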

Signals — what a strong answer shows
  • Surfaced size range, resumability, and overwrite semantics as requirements before designing
  • Split metadata from bytes and used presigned URLs so the API never proxies gigabytes
  • Designed chunking with a query-missing-chunks resume protocol and per-chunk checksums
  • Made overwrite an atomic pointer flip over immutable versions so readers never see a half-written file
  • Committed to the chunked/versioned path and named its cost (metadata bookkeeping + GC), plus orphan-chunk cleanup unprompted
Follow-ups — where it goes next
  • 'The upload dies at 90% — does the user restart from zero?' → no; chunked upload, on resume query which chunks exist and send only the missing ones
  • 'A user re-uploads while someone else is downloading — what does the reader see?' → immutable versions + atomic pointer flip, so the reader gets a complete old or new version, never a mix
  • 'How do you avoid storing the same file twice?' → content-address chunks by hash and reference existing ones, at the cost of a hash lookup per chunk
  • 'Should the API proxy the bytes?' → no, presigned URLs let the client talk to the blob store directly so the API isn't the bottleneck
  • 'What happens to chunks of an abandoned upload?' → TTL on the session + GC reclaims orphaned chunks