arcadegame.cloud

Prompt

We want users to be able to upload files into our app and download them later — think a "my files" area. Some of the files are big, like multi-gigabyte video exports, and people complain that uploads fail halfway and they have to start over on bad hotel wifi. Design the upload-and-download service.

How this round runs

The brief says "upload a file, download it later," but "big" and "uploads fail halfway" are the whole game here. You drive: surface size, resumability, and what "download later" really needs, then design it. I'll push you to go deep on chunking and resumable upload, and to commit to a trade-off.

Model answer

1. Requirements I'd surface first.

File size range? "Multi-gigabyte" changes everything: a 100KB avatar and a 5GB video want different upload paths. I'd ask for the max and the distribution, and design the large path as the hard case.
Resumable uploads? The complaint is "fails halfway on bad wifi," so resumability is a hard requirement, not a nice-to-have. That forces chunking (see deep-dive).
Overwrite semantics? "Download later" — if a user re-uploads the same filename, is it a new version or an in-place overwrite? This is a consistency question: what does a reader see mid-overwrite? I'd surface it explicitly.
Dedup? If many users upload the same file (a shared video), do we store it once? Content-addressing (keying stored data by its hash) lets identical content be stored once, which buys storage savings.
Access control + sharing: private by default? shareable links?
Scale: say 10M users, files from KB to multi-GB, mostly write-once then read-occasionally.
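
The size question turns directly into routing: a 100KB avatar and a 5GB video take different upload paths. A minimal sketch, assuming a hypothetical 64 MiB cutoff below which a single request is cheap enough to simply retry; the cutoff and the names `plan_upload` and `UploadPlan` are illustrative, not part of the design.

```python
from dataclasses import dataclass

MIB = 1024 * 1024
SINGLE_PUT_LIMIT = 64 * MIB  # assumed cutoff: small enough that a full retry is cheap
CHUNK_SIZE = 8 * MIB         # the "say 8MB" chunk size from the deep-dive


@dataclass(frozen=True)
class UploadPlan:
    mode: str         # "single" or "chunked"
    chunk_count: int  # 1 for a single-request upload


def plan_upload(size_bytes: int) -> UploadPlan:
    """Route small files to one request and large files to the chunked path."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes <= SINGLE_PUT_LIMIT:
        return UploadPlan("single", 1)
    # Ceiling division: the last chunk may be shorter than CHUNK_SIZE.
    return UploadPlan("chunked", -(-size_bytes // CHUNK_SIZE))


print(plan_upload(100 * 1024))   # UploadPlan(mode='single', chunk_count=1)
print(plan_upload(5 * 1024**3))  # UploadPlan(mode='chunked', chunk_count=640)
```

Keeping the small-file path separate avoids paying session and chunk bookkeeping for files that would upload in one short request anyway.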

Non-functional: a 5GB upload over flaky wifi must survive disconnects and resume; downloads must be fast (ideally edge-served); an overwrite must never let a reader see a half-written file.

2. High-level design.

client --(metadata calls)--> API server --> metadata store (file id, owner, size, chunk list, version, blob refs)
  |                              |
  |                              +--> issues presigned upload/download URLs back to the client
  |
  +--(bytes, via presigned URLs)--> blob/object store (the actual bytes, chunked)

Key split: metadata in a database, bytes in a blob/object store. The metadata row is small and queryable (owner, size, version, the list of chunks and their locations); the multi-GB bytes go to object storage built for large immutable blobs. The API hands the client a presigned URL so the client uploads/downloads bytes directly to/from the blob store — the API never proxies gigabytes through itself (that would make the API tier the bottleneck).
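
To make the presigned-URL idea concrete, here is a minimal sketch of one way such URLs can work: the API server signs (method, path, expiry) with a secret it shares with the blob store, and the blob store checks that signature before accepting or serving bytes. This is a simplified, hypothetical HMAC scheme for illustration only; it is not the signing algorithm of any particular object store (S3, for instance, has its own Signature Version 4), and the host, path, and function names are made up.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(secret: bytes, method: str, path: str, expires: int) -> str:
    # Sign exactly what the URL grants: which operation, on which object, until when.
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def presign(secret: bytes, method: str, host: str, path: str, expires: int) -> str:
    """API server side: mint a short-lived URL the client uses against the blob store."""
    query = urlencode({"expires": expires, "sig": _signature(secret, method, path, expires)})
    return f"https://{host}{path}?{query}"


def verify(secret: bytes, method: str, url: str, now: int) -> bool:
    """Blob store side: accept the request only if the signature matches and is unexpired."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    expected = _signature(secret, method, parts.path, expires)
    # Constant-time comparison so the check does not leak how many characters matched.
    return hmac.compare_digest(expected.encode(), sig.encode())


secret = b"shared-secret"  # hypothetical key known to the API server and the blob store
url = presign(secret, "PUT", "blob.example.com", "/uploads/abc123/part-0", expires=1_700_000_900)
print(verify(secret, "PUT", url, now=1_700_000_000))  # True
```

Because the blob store can check the URL on its own, the API server only hands out signatures and never touches the gigabytes themselves.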

3. Deep-dive: chunking + resumable upload. This is the part the complaint is really about.

Why chunk. A single 5GB PUT that dies at 90% wastes 4.5GB and forces a full restart — exactly the hotel-wifi pain. So I split the file into fixed-size chunks (say 8MB) and upload them independently. Now a dropped connection only loses the in-flight chunk, and chunks can upload in parallel.
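
The arithmetic behind "a dropped connection only loses the in-flight chunk" can be sketched as follows, assuming chunks of a fixed 8 MiB upload one at a time and each fully sent chunk is acked; the function names are hypothetical.

```python
from __future__ import annotations

MIB = 1024 * 1024


def chunk_ranges(size_bytes: int, chunk_size: int) -> list[tuple[int, int, int]]:
    """Return (index, offset, length) for each fixed-size chunk of a file."""
    return [
        (index, offset, min(chunk_size, size_bytes - offset))
        for index, offset in enumerate(range(0, size_bytes, chunk_size))
    ]


def bytes_lost_on_drop(bytes_sent: int, chunk_size: int, chunked: bool) -> int:
    """Bytes that must be re-sent after a drop, assuming chunks upload one at a time."""
    if not chunked:
        return bytes_sent            # a single PUT restarts from byte zero
    return bytes_sent % chunk_size   # acked chunks survive; only the partial one is lost


five_gib = 5 * 1024**3
print(len(chunk_ranges(five_gib, 8 * MIB)))                            # 640
dropped_at = five_gib * 9 // 10 + 3 * MIB                              # just past 90%
print(bytes_lost_on_drop(dropped_at, 8 * MIB, chunked=False) // MIB)   # 4611
print(bytes_lost_on_drop(dropped_at, 8 * MIB, chunked=True) // MIB)    # 3
```

With parallel uploads, a drop can lose up to one partial chunk per connection in flight, so the worst-case loss is bounded by chunk size times concurrency rather than by file size.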

The resumable protocol. (1) Client calls initiate upload; the server creates a metadata record with the file's total size and chunk count and returns an upload id. (2) Client uploads each chunk to its own presigned URL, and the store acks each one. (3) On disconnect/resume, the client asks the server which chunks are already present and re-sends only the missing ones. That is what makes it resumable, and it is idempotent: re-uploading a chunk that's already there is a harmless no-op. (4) When all chunks are present, the client calls complete; the server verifies the chunk set (plus per-chunk and whole-file checksums for integrity) and assembles the final object. This is essentially the multipart-upload pattern that object stores such as S3 expose.
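
The four protocol steps can be sketched with an in-memory stand-in for the API server and blob store. Every name here (`UploadService`, `initiate`, `upload_part`, `missing_parts`, `complete`, `resume_upload`) is hypothetical, and holding the bytes in memory is a deliberate simplification; a real service would keep parts in the object store and only part metadata in the database.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class UploadSession:
    total_size: int
    chunk_size: int
    parts: dict = field(default_factory=dict)  # chunk index -> bytes

    @property
    def chunk_count(self) -> int:
        return -(-self.total_size // self.chunk_size)  # ceiling division


class UploadService:
    """Hypothetical in-memory stand-in for the API server plus blob store."""

    def __init__(self):
        self.sessions = {}  # upload id -> UploadSession
        self.objects = {}   # upload id -> assembled bytes

    def initiate(self, total_size: int, chunk_size: int) -> str:
        # Step 1: record size and chunk count, hand back an upload id.
        upload_id = uuid.uuid4().hex
        self.sessions[upload_id] = UploadSession(total_size, chunk_size)
        return upload_id

    def upload_part(self, upload_id: str, index: int, data: bytes, checksum: str) -> None:
        # Step 2: accept one chunk after checking its per-chunk checksum.
        session = self.sessions[upload_id]
        if not 0 <= index < session.chunk_count:
            raise ValueError("chunk index out of range")
        if sha256_hex(data) != checksum:
            raise ValueError("per-chunk checksum mismatch")
        session.parts[index] = data  # re-sending a chunk just replaces it, so retries are safe

    def missing_parts(self, upload_id: str) -> list:
        # Step 3: on resume, tell the client which chunks are still needed.
        session = self.sessions[upload_id]
        return [i for i in range(session.chunk_count) if i not in session.parts]

    def complete(self, upload_id: str, whole_checksum: str) -> str:
        # Step 4: verify the full chunk set and whole-file checksum, then assemble.
        if self.missing_parts(upload_id):
            raise ValueError("upload incomplete")
        session = self.sessions[upload_id]
        blob = b"".join(session.parts[i] for i in range(session.chunk_count))
        if len(blob) != session.total_size or sha256_hex(blob) != whole_checksum:
            raise ValueError("whole-file checksum mismatch")
        self.objects[upload_id] = blob
        del self.sessions[upload_id]
        return upload_id


def resume_upload(service: UploadService, upload_id: str, data: bytes, chunk_size: int) -> None:
    """Client side: send only the chunks the server reports as missing."""
    for index in service.missing_parts(upload_id):
        chunk = data[index * chunk_size:(index + 1) * chunk_size]
        service.upload_part(upload_id, index, chunk, sha256_hex(chunk))


data = bytes(range(256)) * 100       # 25,600-byte stand-in for a big file
service = UploadService()
uid = service.initiate(len(data), chunk_size=4096)
for i in range(3):                   # 3 of 7 chunks land, then the wifi drops
    chunk = data[i * 4096:(i + 1) * 4096]
    service.upload_part(uid, i, chunk, sha256_hex(chunk))
print(service.missing_parts(uid))    # [3, 4, 5, 6]
resume_upload(service, uid, data, 4096)
service.complete(uid, sha256_hex(data))
```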

Dedup falls out naturally. If I hash each chunk (content-addressing), the client can send chunk hashes before the bytes, and any chunk already stored anywhere is referenced instead of re-uploaded. That saves storage and bandwidth, at the cost of a hash lookup per chunk.
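
A minimal sketch of content-addressed chunks with reference counts, assuming SHA-256 as the chunk key; `ChunkStore` and `upload_with_dedup` are hypothetical names, and the in-memory dicts stand in for the blob store and the metadata that tracks references.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ChunkStore:
    """Hypothetical content-addressed store: each chunk is keyed by its SHA-256."""

    def __init__(self):
        self.blobs = {}     # hash -> bytes (stored once)
        self.refcount = {}  # hash -> number of references from file manifests

    def has(self, digest: str) -> bool:
        return digest in self.blobs

    def add_ref(self, digest: str) -> None:
        self.refcount[digest] += 1

    def put(self, data: bytes) -> str:
        # The store hashes the bytes itself instead of trusting a client-supplied hash.
        digest = sha256_hex(data)
        self.blobs.setdefault(digest, data)
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest


def upload_with_dedup(store: ChunkStore, data: bytes, chunk_size: int):
    """Return (manifest of chunk hashes, bytes actually sent)."""
    manifest, sent = [], 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = sha256_hex(chunk)
        if store.has(digest):
            store.add_ref(digest)  # dedup hit: one hash lookup, zero bytes sent
        else:
            store.put(chunk)       # miss: upload the bytes
            sent += len(chunk)
        manifest.append(digest)
    return manifest, sent


store = ChunkStore()
video = b"A" * 8 + b"B" * 8
print(upload_with_dedup(store, video, 8)[1])  # 16: first upload sends everything
print(upload_with_dedup(store, video, 8)[1])  # 0: second copy is all references
print(len(store.blobs))                       # 2
```

One design caveat: with fixed-size chunks, inserting a single byte near the start of a file shifts every later chunk boundary, so dedup only catches content that lines up on the same boundaries; content-defined chunking is the usual alternative when that matters. Cross-user dedup also lets a client learn whether anyone already stored a given chunk, which is why some systems restrict dedup to one user's own files.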

4. Deep-dive: consistency on overwrite. "Download later" plus re-uploads forces a decision. If overwrite mutates the object in place, a reader downloading during the overwrite can get a Frankenstein half-old/half-new file — a correctness bug. So I commit to immutable, versioned objects: an "overwrite" writes a new version (new blob, new chunk set) and then flips the metadata pointer to it in one atomic update. Readers always resolve a complete version — they get the old one until the pointer flips, then the new one, never a mix. The cost is extra storage for old versions (mitigated by lifecycle/GC of stale versions), but I get clean read consistency and free rollback.
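
A minimal sketch of the versioned record, assuming a lock stands in for whatever atomic primitive the metadata store actually offers (a transaction or a conditional write). `FileRecord`, `add_version`, `flip`, and `rollback` are hypothetical names; chunk manifests are plain strings here.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    number: int
    chunk_hashes: tuple  # immutable manifest: never edited after creation


class FileRecord:
    """Hypothetical metadata row: immutable versions plus one 'current' pointer."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for a DB transaction or conditional write
        self._versions = {}
        self._current: Optional[int] = None

    def add_version(self, chunk_hashes) -> int:
        # The new version is fully written before any reader can see it.
        with self._lock:
            number = len(self._versions) + 1
            self._versions[number] = Version(number, tuple(chunk_hashes))
            return number

    def flip(self, expected: Optional[int], new: int) -> bool:
        # Atomic compare-and-swap of the pointer; fails if another writer flipped first.
        with self._lock:
            if self._current != expected or new not in self._versions:
                return False
            self._current = new
            return True

    def read(self) -> Optional[Version]:
        # Readers resolve exactly one complete version: old or new, never a mix.
        with self._lock:
            return None if self._current is None else self._versions[self._current]

    def rollback(self, number: int) -> None:
        # Reverting is just re-pointing; no bytes move.
        with self._lock:
            if number not in self._versions:
                raise KeyError(number)
            self._current = number


record = FileRecord()
v1 = record.add_version(["h1", "h2"])
record.flip(None, v1)
v2 = record.add_version(["h3"])  # overwrite in progress: readers still see v1
print(record.read().number)      # 1
record.flip(v1, v2)
print(record.read().number)      # 2
```

The compare-and-swap in `flip` also settles two overwrites racing each other: whichever flips second sees a stale `expected` value and must decide whether to retry.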

5. A committed trade-off and its cost. I commit to chunked, content-addressed, immutable-versioned storage. The cost I name out loud: more metadata and more moving parts — I track every chunk and every version, run a chunk-assembly + verify step, and need garbage collection for orphaned chunks and superseded versions. For a service whose whole pain is "big uploads fail and re-uploads corrupt reads," that bookkeeping is exactly what buys resumability, dedup, and clean overwrites. If files were guaranteed small and write-once, I'd drop chunking and versioning and store blobs whole — but the multi-GB + flaky-wifi requirement is precisely why I'm paying for the complex path.

6. Operational concerns. Failure mode: an upload is abandoned partway (the user gives up), leaving orphaned chunks in the blob store that never receive a complete call. I'd detect this with a TTL on incomplete upload sessions and a background GC that reclaims the chunks of sessions that never completed; otherwise storage leaks. Another: the complete step fails partway, before the pointer flip lands. Because the flip is a single atomic metadata update, a crash leaves the old version live (safe) and the new chunks as collectable orphans, never a corrupt read. I'd serve downloads from the blob store or a CDN edge so origin load stays flat. Rollback: versioning means reverting a bad overwrite is just re-pointing metadata at the previous version, with no data movement.
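
The cleanup pass can be sketched as a mark-and-sweep over chunk hashes, assuming hypothetical in-memory inputs: upload sessions with a creation time, the chunk manifests of every version the retention policy still keeps, and the set of chunk hashes present in the blob store. `Session` and `collect_garbage` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    created_at: float  # seconds since epoch
    completed: bool
    chunk_hashes: set = field(default_factory=set)


def collect_garbage(sessions, retained_manifests, stored_chunks, now, ttl_seconds):
    """Return (expired session ids, orphaned chunk hashes to delete)."""
    # Incomplete sessions older than the TTL are treated as abandoned.
    expired = {
        sid for sid, s in sessions.items()
        if not s.completed and now - s.created_at > ttl_seconds
    }
    # Mark: chunks referenced by any retained version or any still-live session.
    live = set()
    for manifest in retained_manifests:
        live.update(manifest)
    for sid, s in sessions.items():
        if sid not in expired:
            live.update(s.chunk_hashes)
    # Sweep: anything stored but unmarked is an orphan.
    return expired, set(stored_chunks) - live


sessions = {
    "abandoned": Session(created_at=0, completed=False, chunk_hashes={"x"}),
    "in-flight": Session(created_at=90, completed=False, chunk_hashes={"y"}),
}
expired, orphans = collect_garbage(
    sessions,
    retained_manifests=[("a", "b")],      # "z" belonged to a superseded, dropped version
    stored_chunks={"a", "b", "x", "y", "z"},
    now=100,
    ttl_seconds=50,
)
print(sorted(expired), sorted(orphans))  # ['abandoned'] ['x', 'z']
```

Because dedup lets a new upload start referencing an existing chunk at any moment, a production sweep would typically also skip chunks written or referenced within a grace period, so nothing gets deleted between the mark and a fresh reference.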

Signals — what a strong answer shows

Surfaced size range, resumability, and overwrite semantics as requirements before designing
Split metadata from bytes and used presigned URLs so the API never proxies gigabytes
Designed chunking with a query-missing-chunks resume protocol and per-chunk checksums
Made overwrite an atomic pointer flip over immutable versions so readers never see a half-written file
Committed to the chunked/versioned path, named its cost (metadata bookkeeping + GC), and raised orphan-chunk cleanup unprompted

Follow-ups — where it goes next

'The upload dies at 90% — does the user restart from zero?' → no; chunked upload, on resume query which chunks exist and send only the missing ones
'A user re-uploads while someone else is downloading — what does the reader see?' → immutable versions + atomic pointer flip, so the reader gets a complete old or new version, never a mix
'How do you avoid storing the same file twice?' → content-address chunks by hash and reference existing ones, at the cost of a hash lookup per chunk
'Should the API proxy the bytes?' → no, presigned URLs let the client talk to the blob store directly so the API isn't the bottleneck
'What happens to chunks of an abandoned upload?' → TTL on the session + GC reclaims orphaned chunks

Study this: System Design

Design a File Storage Service
