databases · level 5

Scaling

Read replicas, sharding, pools, and caches — when one server is not enough.

225 XP

A single database server has limits: CPU, RAM, disk I/O, and network bandwidth. Real applications eventually hit at least one. Scaling is the set of techniques you reach for when that happens, covered here in roughly the order you should try them.

Analogy

A single database is like one post office branch in a growing town. At first the one clerk handles everything; eventually the line is out the door. Read replicas are satellite branches that copy the master records overnight — most people just want to check whether a parcel has arrived (a read), and those branches handle it fine. A connection pool is the row of numbered PO boxes along the wall: a fixed set of slots that customers rent and reuse instead of forming a new queue at the counter every time. A cache is the "Will Call" shelf behind the desk where the same five frequently-requested items sit pre-pulled. Sharding is giving up on one branch entirely — splitting the town into zip codes, each with its own post office handling its own mail.

The latency numbers you should know

Operation                               Approximate latency
L1 cache read                           0.5 ns
L2 cache read                           4 ns
RAM read                                100 ns
SSD random read                         100 µs
Network round trip (same datacenter)    500 µs
Database query (simple, indexed)        1–5 ms
Network round trip (across region)      10–100 ms

These numbers explain why caching works: an in-process cache hit is a memory read (100 ns) rather than a database query (1–5 ms), roughly four orders of magnitude faster. Even a networked cache like Redis pays only one same-datacenter round trip (~500 µs), still well below the query's 1–5 ms.

Read replicas

Most applications read far more than they write; 95% reads is common. A read replica is a copy of the primary database that receives every write via replication and serves read queries directly, leaving the primary to handle writes.

Benefits: read throughput scales horizontally with the number of replicas, and the primary sheds load. Limits: replication lag means a replica can serve stale reads, typically milliseconds behind the primary and more under load. Do not route reads that must see the latest write, such as balance checks or payment confirmations, to a replica.
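
A minimal sketch of this routing, assuming primary and replicas are already-opened connection handles with an execute method (all names here are hypothetical):

```python
import random

class ReplicatedDB:
    """Route writes to the primary, reads to a random replica."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute(self, sql, *params, require_fresh=False):
        # Naive classification: SELECT means read, everything else writes.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not require_fresh:
            return random.choice(self.replicas).execute(sql, *params)
        # Writes, and reads that must not be stale, go to the primary.
        return self.primary.execute(sql, *params)
```

The require_fresh flag is the escape hatch for reads that must see the latest write, like the balance checks above.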

Sharding (horizontal partitioning)

Sharding splits a large table across multiple database nodes. Each node holds a shard — a subset of rows.

Range sharding

Rows are split by a key range: user IDs 1–1,000,000 go to shard 1, 1,000,001–2,000,000 to shard 2, and so on. Simple to reason about, and a range query stays on one shard. Hot spots form if one range gets disproportionate traffic, e.g. sequential IDs sending every new user to the newest shard.
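
A sketch of the boundary lookup using bisect over sorted shard upper bounds (the boundaries and zero-based shard numbers are illustrative):

```python
import bisect

# Inclusive upper bound of user IDs held by shards 0, 1, 2.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def shard_for(user_id: int) -> int:
    # The first boundary >= user_id identifies the owning shard.
    return bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)

assert shard_for(1_000_000) == 0  # last ID on shard 0
assert shard_for(1_000_001) == 1  # first ID on shard 1
```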

Hash sharding

shard = hash(user_id) % num_shards. Distributes load evenly, but range queries are inefficient: no single shard holds a range, so you must query all shards and merge. Changing num_shards also remaps almost every key, which is the resizing problem consistent hashing (next) solves.
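
A sketch using a stable hash; Python's built-in hash() is randomized per process, so it cannot serve as a shard key (query_shard is a hypothetical per-shard query helper):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id: int) -> int:
    # md5 is used for stable, uniform distribution, not for security.
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def users_created_after(ts, query_shard):
    # A range query has no single home: fan out to every shard and merge.
    rows = []
    for shard in range(NUM_SHARDS):
        rows.extend(query_shard(shard, "created_at > $1", ts))
    return rows
```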

Consistent hashing

A ring-based mapping of keys to nodes. Adding or removing a node remaps only a fraction of keys (roughly 1/n of the keyspace for n nodes), not all of them. It underpins distributed stores such as Cassandra and Amazon's Dynamo; Redis Cluster uses a related fixed-slot scheme.
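
A minimal ring with virtual nodes for smoother key distribution; an illustrative sketch, not any particular library's implementation:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node appears `vnodes` times on the ring, so keys
        # spread evenly and a removed node's keys scatter across the rest.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring.
        i = bisect.bisect_left(self._hashes, _hash(key)) % len(self._points)
        return self._points[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:42"))  # stable while the node set is stable
```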

Directory sharding

A lookup table maps each key to the shard that holds it. Maximum flexibility, but the directory itself becomes a bottleneck and a single point of failure.
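
At its simplest the directory is a lookup table consulted on every access; a toy sketch:

```python
# In production the directory lives in its own replicated store,
# since every read and write passes through it first.
directory = {
    "user:1001": "shard-a",
    "user:1002": "shard-c",
}

def shard_for(key: str) -> str:
    return directory[key]  # the extra hop on every access

def move(key: str, new_shard: str) -> None:
    # The flexibility range and hash schemes lack: rebalance one key
    # at a time (actually copying the row is omitted here).
    directory[key] = new_shard
```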

Connection pools

Every open database connection consumes memory and file descriptors; PostgreSQL, which forks a backend process per connection, spends roughly 5–10 MB on each. An application that opens a new connection per HTTP request can exhaust the database at a few hundred concurrent users.

A connection pool maintains a fixed number of long-lived connections and loans them to application requests. Common choices:

Tool                    Language/Context
PgBouncer               PostgreSQL sidecar proxy
Pgpool-II               Full proxy with load balancing
Prisma's pool           Built into the Prisma ORM
HikariCP                JVM
asyncpg built-in pool   Python async

Pool modes (PgBouncer's terminology): session (a client keeps one connection for its whole session), transaction (the connection returns to the pool when each transaction ends), statement (returned after every statement, which breaks multi-statement transactions).
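
A sketch with asyncpg's built-in pool; the DSN, pool sizes, and users table are made-up values:

```python
import asyncio
import asyncpg

async def main():
    # min_size/max_size bound how many server connections ever exist,
    # no matter how many concurrent requests the application handles.
    pool = await asyncpg.create_pool(
        "postgresql://app:secret@localhost/appdb",
        min_size=5,
        max_size=20,
    )
    async with pool.acquire() as conn:  # borrow, then auto-return
        row = await conn.fetchrow("SELECT count(*) AS n FROM users")
        print(row["n"])
    await pool.close()

asyncio.run(main())
```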

Caching layers

A cache stores query results in memory, serving repeated reads without touching the database.

Common patterns:

  • Cache-aside: application checks cache first, falls back to database, writes result to cache.
  • Write-through: writes go to cache and database simultaneously.
  • Read-through: cache handles the fallback transparently.

TTL (time-to-live) controls freshness: a short TTL means fresher data but more cache misses; a long TTL means fewer misses but a greater risk of serving stale data.
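
A cache-aside sketch with redis-py; the key scheme, TTL, and fetch_from_db callback are illustrative assumptions:

```python
import json
import redis

r = redis.Redis()  # assumes a local Redis instance
TTL_SECONDS = 60

def get_user(user_id, fetch_from_db):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # hit: no database touched
    user = fetch_from_db(user_id)  # miss: the 1–5 ms query runs
    r.setex(key, TTL_SECONDS, json.dumps(user))  # write back with a TTL
    return user
```

On a miss the result is written back with setex, so identical reads over the next 60 seconds never reach the database.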

Redis and Memcached are the dominant choices. Redis supports richer data structures (sorted sets, pub/sub) and persistence; Memcached is simpler and faster for pure key-value.

When to reach for what

Problem                          First tool to try
Too many reads                   Read replica
Rows too numerous for one node   Sharding
Too many connections             Connection pool
Same data queried repeatedly     Cache
Write throughput limited         Sharding or write batching

Scale vertically first (bigger machine). Horizontal scaling adds operational complexity. Each step up is a trade-off: you gain throughput and lose simplicity.

The playground

Design a scaling plan for a seeded read-heavy application. Enable or disable components (replica, pool, cache, shard) and see how the architecture diagram morphs and how the validator scores your choices for that workload profile.