cloud · level 2

Storage Tiers

Object vs block vs file — pick the right shape.

200 XP

The cloud gives you three fundamentally different storage shapes. Picking the wrong one costs you either money or correctness — sometimes both. There is no universal "best"; the shape of the workload decides.

Analogy

Think of three ways to store your belongings at home. Block storage is the top drawer of your desk — instantly reachable, you can grab a pen mid-sentence, but only you have a key and the drawer only fits so much. File storage is the shared filing cabinet in a small office — several colleagues can open it at once and browse folders, but everyone is reaching into the same cabinet so it slows down when the room is crowded. Object storage is a self-storage warehouse across town — effectively infinite, cheap per square foot, and each unit has a unique address, but you don't pop in to grab a sock; you drive out with a list.

The three shapes

Shape  | What it is                               | Example services                          | Access pattern
-------|------------------------------------------|-------------------------------------------|----------------------------------------
Object | A flat keyspace of immutable blobs       | S3, GCS, Azure Blob                       | Whole-object PUT/GET over HTTPS
Block  | A raw attached disk of fixed-size blocks | EBS, Persistent Disk, Azure Managed Disk  | Read/write arbitrary offsets, one mount
File   | A shared POSIX filesystem                | EFS, Filestore, Azure Files               | Many clients read/write concurrently

Pick object when the unit of work is a whole file. Pick block when you need a filesystem on one VM. Pick file when many VMs need the same mount.

Object storage

The S3 mental model: a giant key-value store where keys look like paths and values are blobs up to 5 TB. Keys have no directory hierarchy — the / is just another character.

  • Cheapest per GB. Standard tier ~$0.023/GB-month. Glacier Deep Archive is ~$0.00099/GB-month (retrieval hours to restore).
  • Strong read-after-write consistency since December 2020: a GET after a PUT returns the new object, and a GET after a DELETE reflects the deletion.
  • List operations are also strongly consistent; a newly written key appears in the next LIST.
  • High tail latency. P99 is in the hundreds of milliseconds, so it is not suitable for random low-latency I/O.
  • Scales horizontally. No provisioning. Throughput per prefix is capped; for extreme throughput, spread writes across prefixes.
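The flat-keyspace model is easy to see in code. Here is a minimal sketch that models a bucket as a plain Python dict (`put`, `get`, and `list_prefix` are invented helpers for illustration, not boto3 calls): "folders" exist only as a prefix convention over keys.

```python
# Model an S3-style bucket as a flat key -> blob map. The "/" in a key
# is an ordinary character; "directories" are only a listing convention.
bucket = {}

def put(key, blob):
    bucket[key] = blob          # whole-object write, no partial updates

def get(key):
    return bucket[key]          # whole-object read

def list_prefix(prefix):
    """Emulate LIST with a prefix filter. O(n) here; a real store indexes keys."""
    return sorted(k for k in bucket if k.startswith(prefix))

put("logs/2024/01/app.log", b"...")
put("logs/2024/02/app.log", b"...")
put("media/cat.jpg", b"...")

# There is no "logs" directory object, only keys sharing a prefix.
print(list_prefix("logs/2024/"))
# ['logs/2024/01/app.log', 'logs/2024/02/app.log']
```

Spreading hot writes across distinct leading prefixes (the per-prefix throughput cap above) is exactly this: choosing key names so the store can partition the keyspace.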

Workloads that fit: logs, backups, media, data lakes, static website assets, ML training data.

Block storage

EBS gives you a raw disk you attach to one EC2 instance. The instance owns the filesystem (ext4, XFS, NTFS) on top. You get consistent low-latency IOPS.

  • Volume types matter. gp3 is the default — 3,000 IOPS and 125 MB/s baseline, and both can be provisioned higher independently of volume size. io2 Block Express goes up to 256,000 IOPS. st1/sc1 are throughput-optimised HDDs for cold data.
  • One instance at a time. Multi-Attach exists for io2 but is a specialist tool.
  • Snapshots are incremental and stored in S3. Restore creates a fresh volume, which lazy-loads blocks on first read.
  • Root volumes for EC2 and Kubernetes node groups are block.
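The contrast with object storage is the access pattern: a block device is just numbered fixed-size blocks with random access, and the filesystem on top does all the path-and-offset bookkeeping. A toy sketch (not any real driver API) of that division of labour:

```python
BLOCK_SIZE = 512  # bytes; real volumes use 4 KiB+, but the idea is the same

class BlockDevice:
    """A raw disk: numbered fixed-size blocks, one owner, random access."""
    def __init__(self, n_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(n_blocks)]

    def write_block(self, i, data):
        assert len(data) == BLOCK_SIZE   # writes happen in whole blocks
        self.blocks[i] = data

    def read_block(self, i):
        return self.blocks[i]

# A filesystem (ext4, XFS, ...) sits on top and translates paths and file
# offsets into block numbers; the device itself knows nothing about files.
disk = BlockDevice(n_blocks=1024)
disk.write_block(7, b"x" * BLOCK_SIZE)
print(disk.read_block(7)[:4])   # random-offset read, no whole-object transfer
# b'xxxx'
```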

Workloads that fit: database data + WAL, single-instance filesystems, root volumes, cache tiers that need consistent p99 latency.

File storage

EFS gives you a POSIX filesystem you can mount from many clients at once over NFS. The filesystem scales automatically; you pay for storage + throughput.

  • Concurrent access is the whole point. Fifty Lambdas, an auto-scaling group, and a Jenkins runner can all mount the same directory.
  • More expensive than S3, cheaper than EBS. ~$0.30/GB-month Standard; Infrequent Access tier is ~$0.016/GB-month.
  • Throughput modes: bursting (based on size), provisioned (fixed), elastic (scales automatically, highest cost).

Workloads that fit: shared content roots, CI/CD build caches shared across runners, lift-and-shift legacy apps that expect NFS, ML training datasets accessed by many jobs.

Pricing deltas that matter

  • Object is cheapest per GB and scales to petabytes. Cost is dominated by storage + request charges + egress.
  • Block has no per-request fee, but you pay for the provisioned capacity whether or not you fill it.
  • File is the priciest per GB at the Standard tier (compare the ~$0.30 above to S3's ~$0.023) and bills extra for throughput.
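Plugging in the per-GB figures quoted in the sections above makes the gap concrete. The gp3 block price here (~$0.08/GB-month) is an assumption not stated in this lesson; the rest are the numbers already given. Storage-only, ignoring requests, egress, and throughput provisioning, which often dominate:

```python
# Rough monthly storage-only cost per tier. The gp3 figure is an assumed
# list price; the other three come from the figures quoted in this lesson.
PRICE_PER_GB_MONTH = {
    "object (S3 Standard)":           0.023,
    "object (Glacier Deep Archive)":  0.00099,
    "block (EBS gp3, assumed)":       0.08,
    "file (EFS Standard)":            0.30,
}

def monthly_cost(gb, tier):
    return gb * PRICE_PER_GB_MONTH[tier]

for tier in PRICE_PER_GB_MONTH:
    print(f"{tier:32s} 10 TB = ${monthly_cost(10_000, tier):>9,.2f}/month")
```

At 10 TB the spread runs from about $10/month (deep archive) to about $3,000/month (file Standard) before a single request or egressed byte is billed.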

The trap: treating EBS like a CDN. Serving static assets from EBS-backed EC2 is ~30× more expensive than S3+CloudFront at scale. Also: egress to the internet is the dominant cost line for many workloads.

Consistency models at a glance

  • S3: strongly consistent for PUT→GET, DELETE→GET, and LIST (since December 2020).
  • EBS: strong (it's a single attached disk).
  • EFS: close-to-open consistency like NFSv4 — a writer's changes become visible to other clients after it closes the file.
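Close-to-open semantics are easiest to grasp as a toy model (invented classes for illustration, not the NFS protocol): a client buffers its writes locally and flushes them to the server only on close, so other clients see the new contents on their next open.

```python
class NfsServer:
    """The authoritative copy of every file."""
    def __init__(self):
        self.files = {}

class NfsClient:
    """Toy close-to-open consistency: flush on close, fetch fresh on open."""
    def __init__(self, server):
        self.server = server
        self.dirty = {}      # locally buffered writes, invisible to others

    def write(self, path, data):
        self.dirty[path] = data            # stays local for now

    def close(self, path):
        self.server.files[path] = self.dirty.pop(path)  # now visible to all

    def open_read(self, path):
        return self.server.files.get(path)  # latest committed version

server = NfsServer()
writer, reader = NfsClient(server), NfsClient(server)

writer.write("/shared/config", b"v2")
print(reader.open_read("/shared/config"))  # None: writer has not closed yet
writer.close("/shared/config")
print(reader.open_read("/shared/config"))  # b'v2': visible after close
```

This is why two processes appending to the same open file over NFS is a classic footgun: neither sees the other's writes until a close/open cycle.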

The decision tree

  1. Many writers need the same mount → file.
  2. One writer, need a filesystem → block.
  3. Unit of work is a whole file or blob → object.
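The three questions above fold into a small helper (hypothetical, for illustration), checked in the same order as the tree:

```python
def pick_tier(shared_mount: bool, needs_filesystem: bool) -> str:
    """Apply the decision tree in order: file, then block, then object."""
    if shared_mount:
        return "file"       # many writers need the same mount
    if needs_filesystem:
        return "block"      # one writer, POSIX filesystem on one VM
    return "object"         # unit of work is a whole file or blob

# Examples: a shared CI cache, a Postgres data directory, a log archive.
print(pick_tier(shared_mount=True,  needs_filesystem=True))   # file
print(pick_tier(shared_mount=False, needs_filesystem=True))   # block
print(pick_tier(shared_mount=False, needs_filesystem=False))  # object
```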

Start from the workload, not the service. If you find yourself asking "how do I mount S3 as a filesystem?" you probably picked the wrong tier for the job.