Eventual vs. Strong Consistency
Prompt
You're replicating data across regions and you reach for eventual consistency instead of strong. Walk me through what you're actually trading away, and how you'd decide which data in a system can tolerate it.
How this round runs
This is a conversation that escalates: I'll keep asking "and what happens when…" until I find the edge of what you know. I want the trade tied to a real consequence and a principled way to decide for each kind of data. Then I'll push into making the guarantee tunable and into the read-your-writes edge, which is where the honest reasoning shows.
Model answer
I'll frame the trade first, then a decision rule, then the consequence that bites you. Eventual consistency means replicas converge on the same value once writes stop, but at any instant a read can return a stale value: a write you just made may not be visible on the replica you happen to hit. Strong consistency (linearizability) makes every read reflect every write that completed before it, and that requires cross-replica coordination on the write path, because a quorum has to acknowledge before the write returns. Through the CAP lens, when a partition splits the replicas, a strongly consistent system must refuse writes (and any reads that have to be fresh) on the minority side to stay correct, so it gives up availability, while an eventually consistent system keeps accepting reads and writes on both sides and reconciles later. So the trade is concrete: eventual consistency buys lower latency and higher availability, and pays for them with the risk of serving stale data.
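A minimal sketch of the stale-read window described above, under stated assumptions: `Replica` and `EventualCluster` are hypothetical names, replication is modelled as an explicit `replicate()` call rather than a background process, and all writes go through one replica so version numbers never conflict.

```python
class Replica:
    """One copy of the data, stored as key -> (version, value)."""

    def __init__(self):
        self.store = {}

    def apply(self, key, version, value):
        # Keep only the newest version so late or repeated messages are harmless.
        if version > self.store.get(key, (0, None))[0]:
            self.store[key] = (version, value)

    def read(self, key):
        return self.store.get(key, (0, None))[1]


class EventualCluster:
    """Hypothetical cluster that acknowledges a write after one replica has it."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []   # replication messages not yet delivered
        self.version = 0

    def write(self, key, value):
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        # Queue the update for the other replicas and return immediately.
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, self.version, value))

    def replicate(self):
        # Deliver every queued update; after this the replicas have converged.
        for replica, key, version, value in self.pending:
            replica.apply(key, version, value)
        self.pending.clear()

    def read(self, key, replica_index):
        return self.replicas[replica_index].read(key)


cluster = EventualCluster(3)
cluster.write("likes", 10)
cluster.replicate()
cluster.write("likes", 11)
stale = cluster.read("likes", 2)   # lagging replica still holds the old value
cluster.replicate()
fresh = cluster.read("likes", 2)   # same replica after convergence
print(stale, fresh)                # prints "10 11"
```

The write returns before the other replicas have the update, which is where the latency win comes from, and the gap between `write` and `replicate` is exactly the window in which a read can be stale.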
How I decide per-data: ask "what's the cost of a stale read here?" A like count, a follower list, a feed ranking — staleness is invisible to the user and self-heals, so eventual is the right call and you get the latency/availability win. Anything where stale data lets two clients make conflicting decisions — account balance, inventory you can oversell, a uniqueness/auth check — needs strong consistency on that path, even at the coordination cost. Most real systems are a hybrid: strong on the small correctness-critical core, eventual on the large remainder.
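To make "conflicting decisions" concrete, here is a small sketch of the oversell case. It assumes two replicas of a one-item inventory where the second has not yet received the first one's decrement; the `purchase` helper and the dictionaries are illustrative, and the single `primary` copy only stands in for a strongly consistent path (a real system would coordinate through a quorum or consensus protocol).

```python
def purchase(replica, sku):
    """Check-then-decrement against one copy; True means the sale was accepted."""
    if replica[sku] > 0:
        replica[sku] -= 1
        return True
    return False


# Eventual path: each client hits a different replica before they have synced.
replica_a = {"widget": 1}
replica_b = {"widget": 1}
sold_a = purchase(replica_a, "widget")
sold_b = purchase(replica_b, "widget")
units_sold_eventual = sold_a + sold_b   # 2 sales against 1 unit of stock

# Coordinated path: both requests are ordered against one authoritative copy.
primary = {"widget": 1}
results = [purchase(primary, "widget"), purchase(primary, "widget")]
units_sold_strong = sum(results)        # 1 sale; the second request is refused

print(units_sold_eventual, units_sold_strong)   # prints "2 1"
```

A like count run through the same two-replica path would merely be off by one for a moment and then converge, which is why the cost of a stale read, not the data's size or traffic, is the deciding question.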
The edges I'd push myself on. Consistency isn't binary; it's tunable. With quorum reads and writes over N replicas, you choose the read quorum R and write quorum W per operation so that R + W > N. Every read quorum then overlaps the most recent write quorum, which gives read-your-writes-style freshness, and you pay the extra latency only where you need it. The specific anomaly users notice first is a read-your-writes failure: you post a comment, the read hits a lagging replica, and your own comment is missing, which reads as a bug even though the system is "working as designed." Mitigations are session stickiness, read-from-primary-after-write, or version/causal tokens. My honest boundary: I can reason about the quorum knobs and session guarantees conceptually, but the exact convergence mechanics (vector clocks vs. CRDTs vs. last-write-wins, and their conflict-resolution trade-offs) are where I'd go deeper before committing a design rather than assert from memory.
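The quorum claim can be checked by brute force on a small cluster. This is a sketch under simplifying assumptions: strict quorums (no hinted handoff or sloppy quorums), a single completed write with no concurrent writers, and versions that identify the newest value. The function names are made up for illustration.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """The arithmetic rule: read and write quorums must share a replica."""
    return r + w > n


def read_latest(replicas, read_set):
    """Return the value with the highest version among the replicas contacted."""
    return max((replicas[i] for i in read_set), key=lambda pair: pair[0])[1]


def every_read_is_fresh(n, r, w):
    """Try every write set of size w and every read set of size r."""
    for write_set in combinations(range(n), w):
        # Replicas in the write set hold version 2; the rest still hold version 1.
        replicas = [(2, "new") if i in write_set else (1, "old") for i in range(n)]
        for read_set in combinations(range(n), r):
            if read_latest(replicas, read_set) != "new":
                return False
    return True


for r, w in [(1, 1), (1, 2), (2, 2), (1, 3), (3, 1)]:
    print(f"N=3 R={r} W={w}: rule={quorums_overlap(3, r, w)} "
          f"fresh={every_read_is_fresh(3, r, w)}")
```

For N = 3, the configurations R = 2, W = 2 and R = 1, W = 3 and R = 3, W = 1 always return the new value, while R = 1 with W = 1 or W = 2 can miss it. That is the per-operation knob: raise R or W only on the reads and writes that need freshness.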
- Framed the trade through CAP: strong gives up availability under partition; eventual keeps serving and reconciles
- Gave a principled per-data rule — cost of a stale read — instead of 'use eventual for scale'
- Tied it to consequences: invisible/self-healing staleness vs. conflicting decisions (oversell, double-spend)
- Reached for tunability unprompted — R + W > N quorums to buy consistency per operation
- Named read-your-writes as the anomaly users notice and proposed concrete mitigations; honest about convergence-mechanism depth
- How do you make the consistency tunable rather than a global setting? → quorum reads/writes, R + W > N for read-your-writes per operation
- And what happens when a user writes then immediately reads and hits a lagging replica? → read-your-writes violation; fix with session stickiness, read-from-primary-after-write, or causal tokens
- Which data in a typical app should NOT be eventually consistent, and why? → balances, inventory, uniqueness/auth — stale reads let clients make conflicting decisions
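For the read-your-writes follow-up, one way to sketch the version-token mitigation is below. Everything here is an assumption for illustration: `Node`, `Session`, and `replicate` are hypothetical names, there is one primary and one replica, and the token is a single monotonically increasing version rather than a full causal token.

```python
class Node:
    """A primary or replica holding data plus the last version it has applied."""

    def __init__(self):
        self.version = 0
        self.data = {}


def replicate(primary, replica):
    # Stand-in for asynchronous replication catching up.
    replica.data = dict(primary.data)
    replica.version = primary.version


class Session:
    """Tracks the newest version this client has written."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.min_version = 0   # token: reads must reflect at least this version

    def write(self, key, value):
        self.primary.version += 1
        self.primary.data[key] = value
        self.min_version = self.primary.version

    def read(self, key):
        # Use the replica only if it has caught up to this session's own writes.
        if self.replica.version >= self.min_version:
            return self.replica.data.get(key), "replica"
        return self.primary.data.get(key), "primary"


primary, replica = Node(), Node()
session = Session(primary, replica)
session.write("comment", "hello")
before = session.read("comment")   # replica lags, so the read falls back
replicate(primary, replica)
after = session.read("comment")    # replica has caught up
print(before, after)
```

Only the session that wrote pays for the fallback to the primary; other readers can keep using the replica and accept the brief staleness.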