Managed Databases
RDS vs Aurora vs DynamoDB vs Spanner — and what you actually buy.
A managed database is just someone else's on-call. You give up some control over the engine, the kernel, the disk; you get back not-being-paged at 3am when storage fails over. For most workloads, that trade is correct.
Analogy
Self-managing a database is owning a building's boiler. You're responsible for the burner, the pump, the pressure valve, the annual inspection, the spare parts cupboard, and the call to the boiler engineer at midnight when the heat goes out. A managed database is leasing a flat with the heating included. You give up the ability to fine-tune the burner; you also give up most of the reasons it would ever wake you up. Either is reasonable. Most people who insist on owning the boiler haven't actually had a heating failure at midnight in January.
The decision tree
The first axis is the shape of your data:
- Mostly relational, joins matter, transactions across tables → RDS / Aurora / Spanner
- Key-value or single-row lookups, sub-10ms p99 at any scale → DynamoDB / Bigtable
- Columnar analytics, terabyte aggregations, BI workloads → Redshift / BigQuery / Snowflake
- In-memory cache, sub-ms reads, ephemeral by design → ElastiCache / MemoryDB
The second axis is the scale and consistency requirements:
- Single-region, single-AZ — RDS Single-AZ is fine.
- Single-region, HA — RDS Multi-AZ or Aurora.
- Multi-region read replicas with eventual consistency — Aurora Global, DynamoDB Global Tables.
- Global strong consistency with transactions — Spanner. AWS doesn't have a true equivalent.
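The two axes can be read as a pair of lookups. The sketch below is only an illustration: the category keys and the `shortlist` function are made-up names, and the fallback rule (keep the shape shortlist when the second axis names nothing for that shape) is an assumption rather than something the lists above state.

```python
# First axis: data shape -> candidate services, copied from the first list.
BY_SHAPE = {
    "relational": ["RDS", "Aurora", "Spanner"],
    "key-value": ["DynamoDB", "Bigtable"],
    "analytics": ["Redshift", "BigQuery", "Snowflake"],
    "cache": ["ElastiCache", "MemoryDB"],
}

# Second axis: scale / consistency posture -> services named for it.
BY_SCOPE = {
    "single-az": ["RDS"],
    "single-region-ha": ["RDS", "Aurora"],
    "multi-region-eventual": ["Aurora", "DynamoDB"],
    "global-strong": ["Spanner"],
}


def shortlist(shape: str, scope: str) -> list[str]:
    """Narrow the shape candidates by scope, keeping the shape's order.

    Assumption: if the scope list names none of the shape's candidates,
    the second axis does not discriminate, so return the shape list as is.
    """
    candidates = BY_SHAPE[shape]
    allowed = set(BY_SCOPE[scope])
    narrowed = [service for service in candidates if service in allowed]
    return narrowed or candidates


relational_global = shortlist("relational", "global-strong")
kv_multi_region = shortlist("key-value", "multi-region-eventual")
analytics_any = shortlist("analytics", "single-az")
```

The point of writing it down this way is that the first axis does most of the work; the second axis only narrows within a family.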
RDS vs Aurora
Both are managed Postgres / MySQL. The architectural difference is the storage layer.
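RDS is conventional: each instance has its own EBS volume. In a Multi-AZ instance deployment the standby has its own volume too, kept current by synchronous block-level replication; read replicas, by contrast, use the engine's own asynchronous replication. Failover means promoting the standby (60–120 seconds typical).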
Aurora decouples compute from storage. Every Aurora cluster has a single, shared, 6-way-replicated storage volume across 3 AZs. The primary writes to it; up to 15 replicas read directly from it. Failover is sub-30s because there's no replication catch-up — replicas already see the same volume.
| Feature | RDS | Aurora |
|---|---|---|
| Engine | Postgres / MySQL / others | Postgres / MySQL only |
| Replicas | up to 15 read replicas for Postgres / MySQL / MariaDB (5 for Oracle / SQL Server) | up to 15 |
| Storage replication | per-instance EBS, synchronous to the Multi-AZ standby | shared volume, 6-way across 3 AZs |
| Failover | 60–120s | <30s |
| Storage cost | gp3 pricing | per-GB-month + per-IO |
| Read scaling | replicas replicate asynchronously and can lag | replicas read same volume; lag typically well under 100 ms |
When to pick RDS: you want tighter engine-version control, broad engine support, or simpler pricing for low-throughput workloads. When to pick Aurora: you need fast failover, many read replicas, or you'll grow into Aurora Global / Aurora Serverless.
DynamoDB
The model is brutally simple:
- Partition key (hash) — required.
- Sort key (range) — optional.
- Anything else is just an attribute on the item.
Every read is one of:
- GetItem(pk, sk): single-item lookup.
- Query(pk, sk-condition): items within one partition.
- Scan: table scan, paginated. Don't, basically.
You design the schema for access patterns, not for normalisation. The single-table design pattern (everything in one table, with item types distinguished by sort-key prefix within a partition) is normal and correct.
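To see why a partition key plus sort-key prefixes is enough for many access patterns, here is a minimal in-memory stand-in. It is not the DynamoDB API: `TinyTable`, its methods, and the `CUSTOMER#` / `ORDER#` key formats are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class TinyTable:
    """In-memory model of a pk/sk table; a teaching aid, not a client."""

    def __init__(self):
        # pk -> {sk: item}; each partition is addressed only by its pk.
        self._partitions = defaultdict(dict)

    def put_item(self, pk, sk, **attrs):
        # Same (pk, sk) overwrites, as a primary key would.
        self._partitions[pk][sk] = {"pk": pk, "sk": sk, **attrs}

    def get_item(self, pk, sk):
        # Single-item lookup by full key; None when absent.
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk, sk_prefix=""):
        # Items in one partition, in sort-key order, filtered by prefix.
        rows = self._partitions.get(pk, {})
        return [rows[sk] for sk in sorted(rows) if sk.startswith(sk_prefix)]


table = TinyTable()
table.put_item("CUSTOMER#42", "PROFILE", name="Ada")
table.put_item("CUSTOMER#42", "ORDER#2024-02-11", total=12)
table.put_item("CUSTOMER#42", "ORDER#2024-01-03", total=30)

profile = table.get_item("CUSTOMER#42", "PROFILE")
orders = table.query("CUSTOMER#42", sk_prefix="ORDER#")
```

The profile and the orders live in the same partition, so "customer plus their orders, oldest first" is one query with no join. Anything that cannot be phrased as "this partition, this sort-key range" needs an index or a scan.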
Capacity modes
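- Provisioned: you declare RCU / WCU and are billed for them whether you use them or not. Auto scaling can adjust them between a minimum and maximum you set.
- On-demand (PAY_PER_REQUEST): billed per request, no provisioning. Costs several times more per request than fully utilised provisioned capacity (the ~7× figure depends on current list prices), but has zero idle cost. The right default for unpredictable workloads.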
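The choice between the two modes comes down to utilisation, which is easy to work out. The sketch below assumes one capacity unit serves one small request per second; the function name and both prices are placeholders picked only to illustrate the arithmetic, so substitute the current price list for your region.

```python
SECONDS_PER_HOUR = 3600


def break_even_utilisation(provisioned_per_unit_hour, on_demand_per_million):
    """Fraction of provisioned capacity you must actually use before
    provisioned mode becomes cheaper than on-demand."""
    # A fully used unit serves one request per second for the whole hour.
    provisioned_per_request = provisioned_per_unit_hour / SECONDS_PER_HOUR
    on_demand_per_request = on_demand_per_million / 1_000_000
    return provisioned_per_request / on_demand_per_request


# Placeholder prices (assumptions, not a quote of any price list).
utilisation = break_even_utilisation(
    provisioned_per_unit_hour=0.00065,
    on_demand_per_million=1.25,
)
premium_at_full_use = 1 / utilisation
```

With these placeholder numbers the break-even sits at roughly 14% utilisation, which is the same thing as on-demand costing about 7× more per request than a fully used provisioned unit. Below the break-even, on-demand is cheaper despite the higher per-request price.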
Global tables and indexes
- GSI (Global Secondary Index): alternate access patterns with their own pk/sk. Reads are eventually consistent with the base table.
- LSI (Local Secondary Index): alternate sort key, same pk. Can only be created along with the table; rarely worth the rigidity.
- Global tables: multi-region active-active with last-writer-wins conflict resolution. Cross-region replication is typically sub-second.
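Last-writer-wins is worth seeing in miniature, because it converges by discarding data. This is a generic sketch of the rule, not DynamoDB's internal implementation; the `Version` type, the region tie-breaker, and the timestamps are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    written_at: float  # writer's timestamp in seconds
    region: str        # used only to break exact timestamp ties (assumption)


def resolve(a: Version, b: Version) -> Version:
    """Last-writer-wins: keep the version with the later timestamp."""
    return max(a, b, key=lambda v: (v.written_at, v.region))


# Two regions update the same item within the replication window.
from_dublin = Version("shipping=express", written_at=100.20, region="eu-west-1")
from_virginia = Version("shipping=standard", written_at=100.35, region="us-east-1")

# Both replicas apply the same rule, so they converge on one value...
winner_seen_in_eu = resolve(from_dublin, from_virginia)
winner_seen_in_us = resolve(from_virginia, from_dublin)
```

Both regions end up with the Virginia write, and the Dublin write is silently dropped. If concurrent writes to the same item must both survive, route writes for a given key to one region or model the data so that concurrent writers touch different items.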
When to pick DynamoDB: high-traffic key-value workloads, unpredictable scale, and IAM-integrated auth. Don't pick it for join-heavy relational queries.
Spanner — globally consistent SQL
Spanner is the outlier. Most "global" databases (Aurora Global, DynamoDB Global Tables) replicate asynchronously, so other regions are only eventually consistent. Spanner replicates synchronously, with global ordering, using TrueTime: a synchronised-clock service that bounds the uncertainty in "now" to a few ms. The result: SQL transactions with external consistency (any observer sees commits in one linearised order) across regions.
The trade-off: write latency is the inter-region round-trip plus TrueTime overhead. Single-region writes are 5–10ms typical; multi-region writes are 100ms+.
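The "TrueTime overhead" can be pictured as a commit wait: pick a timestamp at the top of the clock's uncertainty interval, then hold the commit until every clock must agree that timestamp is in the past. The toy model below follows that idea as described in Google's published Spanner design; `FakeTrueTime`, `commit`, and the fixed 5 ms bound are assumptions for illustration, and real TrueTime is an internal service backed by GPS and atomic clocks.

```python
import time
from dataclasses import dataclass


@dataclass
class Interval:
    earliest: float
    latest: float


class FakeTrueTime:
    """Toy clock whose error is bounded by a fixed epsilon (seconds)."""

    def __init__(self, epsilon: float):
        self.epsilon = epsilon

    def now(self) -> Interval:
        t = time.monotonic()
        return Interval(t - self.epsilon, t + self.epsilon)

    def after(self, timestamp: float) -> bool:
        # True only once timestamp is definitely in the past.
        return self.now().earliest > timestamp


def commit(tt: FakeTrueTime) -> float:
    # Choose a timestamp no earlier than the true current time.
    commit_ts = tt.now().latest
    # Commit wait: block until no clock could still read before commit_ts.
    while not tt.after(commit_ts):
        time.sleep(tt.epsilon / 10)
    return commit_ts


tt = FakeTrueTime(epsilon=0.005)
start = time.monotonic()
ts = commit(tt)
elapsed = time.monotonic() - start  # roughly 2 * epsilon or a little more
```

The wait is about twice the uncertainty bound, which is why a tight bound matters: it is a fixed cost added to every write, on top of the replication round-trip.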
Spanner is the right choice when: you genuinely need globally-consistent SQL transactions, the cost is justified, and you're already on GCP.
Read replicas — the always-relevant feature
Three reasons read replicas exist:
- Spread read load. A primary that's CPU-bound on reads benefits hugely from offloading.
- Reporting / analytics without hitting the OLTP primary.
- Disaster recovery — a replica in a different region is your warm DR.
Things to know:
- Replication lag is real. Most workloads tolerate sub-second; some don't. Critical reads should go to the primary.
- Failover promotion is one-way. Once you promote the replica, it's a new primary; the old one needs to be re-bootstrapped.
- Pin reads explicitly. Most ORMs/connection libraries can route by query type. Don't rely on the database doing it.
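Pinning reads can be as small as a routing function in front of your connection pools. The sketch below is one possible shape, not any particular ORM's API: `ReadWriteRouter`, `endpoint_for`, the `critical` flag, and the endpoint strings are all hypothetical, and the SQL sniffing is deliberately crude.

```python
import itertools


class ReadWriteRouter:
    """Choose an endpoint per statement: replicas only for plain reads."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql, *, critical=False):
        statement = sql.lstrip().upper()
        # Locking reads behave like writes and must see the primary.
        is_plain_read = (
            statement.startswith("SELECT") and "FOR UPDATE" not in statement
        )
        if not is_plain_read or critical or self._replicas is None:
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("primary.db", ["replica-1.db", "replica-2.db"])

write_target = router.endpoint_for("UPDATE accounts SET balance = 0")
first_read = router.endpoint_for("SELECT * FROM reports")
second_read = router.endpoint_for("select * from reports")
balance_read = router.endpoint_for("SELECT balance FROM accounts", critical=True)
```

The `critical` flag is the explicit part: the caller, who knows whether a stale answer is acceptable, decides, rather than the router guessing from the query text.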
Multi-AZ vs Multi-region
Two different DR postures:
- Multi-AZ — replicas in different AZs of the same region. Survives an AZ failure. Latency-cheap (a few ms cross-AZ). Default for production.
- Multi-region — replicas in different regions. Survives a region failure. 50–200ms cross-region latency. Expensive.
Most production systems run Multi-AZ from day one and add Multi-region replication only when:
- The compliance / business-continuity requirement explicitly demands it.
- Cross-region read traffic latency justifies replicas closer to users.
- You've already nailed Multi-AZ and your runbooks include cross-region failover drills.
The shortcut "multi-region everything" is almost always premature.
When the managed database is wrong
Three signals:
- Latency-sensitive hot loops. A self-managed engine on EC2 / GKE with a hand-tuned kernel, io_uring, and local NVMe can beat RDS by ~20% for the right shape of workload.
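- Engine version pinning. RDS lets you pick "MySQL 8.0", but the minor version can roll forward: automatic minor version upgrades can be switched off, yet old versions are eventually retired and upgraded for you. Some workloads (regulated finance, payment processors) want byte-identical engine versions across years.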
- Custom extensions. RDS Postgres only allows extensions that AWS approves. If you need pg_cron flavoured a specific way, or timescaledb on a specific minor version, sometimes self-managed wins.
For 95% of workloads, the managed service is the right call. The remaining 5% justifies the 24/7 on-call.
A diagnostic checklist
When considering a managed database:
- What's the access pattern? (KV, relational, analytical, full-text?)
- What's the traffic shape? (Steady, spiky, scale-to-zero?)
- What's the latency budget? (Sub-ms? Sub-10ms? Sub-100ms?)
- What's the consistency requirement? (Eventual, read-after-write, linearisable?)
- What's the data size? (GB, TB, PB?)
- Single-region or multi-region?
Answer those six and the choice is usually obvious. Skip them and you'll end up with a relational schema in DynamoDB or a key-value workload chewing through Aurora I/O.
Tools in the wild
- RDS: Managed Postgres/MySQL/MariaDB/Oracle/SQL Server. Multi-AZ replication.
- Aurora: Postgres/MySQL with shared storage layer; 6-way replication, fast failover.
- DynamoDB: Serverless NoSQL key-value + document store; sub-10ms p99 at any scale.
- Spanner: Globally-distributed strongly-consistent SQL. TrueTime-backed transactions.
- ElastiCache / MemoryDB: Managed Redis / Memcached; MemoryDB is durable, multi-AZ Redis.
- Redshift: Columnar MPP data warehouse for petabyte-scale analytics.