Storage & Volumes
PVCs, CSI drivers, StatefulSets — the data lives outside the pod.
The fundamental rule: a container's filesystem dies with the container. Anything you write inside a container is gone the moment the kubelet restarts it. State has to live somewhere else, on a storage object that outlives the pod.
Analogy
A pod is a hotel room. You can sleep there, work there, eat there, but the housekeeping crew comes between guests and resets it. If you want anything to persist, you store it in the safe in the wall — that's the volume. The safe stays bolted to the building even when the guest changes; the room itself gets re-keyed every time someone new checks in. Everyone learns this the first time they kubectl delete pod and watch their database disappear.
The hierarchy: Volume → PV → PVC
Three layers, often confused:
- Volume — the abstract handle inside a Pod spec. "Here's a directory with some bytes."
- PersistentVolume (PV) — a cluster-level storage resource: "this 100 GB EBS volume exists at this AWS ID."
- PersistentVolumeClaim (PVC) — a user's request for storage: "I need 50 GB, RWO, fast SSD." Kubernetes binds it to an available PV.
Pods reference PVCs, not PVs directly. PVCs decouple your manifest from the underlying storage system — your manifest says "I need 50 GB", and the cluster's storage class controller provisions the actual EBS volume / Ceph RBD / Azure Disk.
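As a minimal sketch of that decoupling, the manifests below show a claim and a pod that mounts it. The names app-data and app, the busybox image, and the gp3 StorageClass are illustrative assumptions, not something the cluster provides by default.

```yaml
# PVC: the request. It says what is needed, not where it comes from.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data              # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3       # assumes a StorageClass named gp3 exists
  resources:
    requests:
      storage: 50Gi
---
# Pod: references the PVC by name, never the PV.
apiVersion: v1
kind: Pod
metadata:
  name: app                   # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36     # illustrative image
      command: ["sh", "-c", "date >> /data/started.log && sleep 3600"]
      volumeMounts:
        - name: data          # must match the volume name below
          mountPath: /data
  volumes:
    - name: data              # the Volume: the handle inside the Pod spec
      persistentVolumeClaim:
        claimName: app-data
```

Nothing in the pod spec mentions EBS, Ceph, or a volume ID; swapping the backend only means changing the StorageClass the claim points at.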
Access modes
The four access modes, named with the awkward Kubernetes shortcuts:
| Mode | Meaning | Typical backing |
|---|---|---|
| RWO | ReadWriteOnce: one node mounts at a time | EBS, Azure Disk, GCE PD |
| ROX | ReadOnlyMany: many nodes mount, read-only | EFS in read-only mode |
| RWX | ReadWriteMany: many nodes mount, all read+write | EFS, NFS, CephFS |
| RWOP | ReadWriteOncePod: exactly one pod (1.22+) | Same as RWO but stricter |
Block storage (EBS, Azure Disk, GCP PD) is fast but RWO. Network file systems (EFS, NFS, CephFS) are shareable but slower. Pick by access pattern: a Postgres replica needs RWO; a shared upload directory between five web pods needs RWX.
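For the shared upload directory case, the claim might look like the sketch below. The claim name shared-uploads and the StorageClass name efs-sc are assumptions; the class has to be backed by a driver that actually supports RWX (EFS, NFS, CephFS).

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads        # hypothetical name
spec:
  accessModes: ["ReadWriteMany"]   # many nodes may mount read+write
  storageClassName: efs-sc    # assumed name of an RWX-capable StorageClass
  resources:
    requests:
      storage: 20Gi           # required by the API; some file backends do not enforce it
```

All five web pods can then reference claimName: shared-uploads in their pod spec. Pointing the same claim at a block-storage class would generally leave it unprovisioned, because the backend cannot satisfy the access mode.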
StatefulSet vs Deployment
The mistake everyone makes once: running a database as a Deployment.
- Deployment: stateless replicas. Any pod can serve any request. Names are random hashes (web-7d4b6f-x9k2p). Roll-out is parallel.
- StatefulSet: stateful replicas with identity. Pod names are predictable (postgres-0, postgres-1). Each pod has its own PVC. Roll-out is serial. Termination is reverse-serial.
If a workload needs:
- Stable hostname (postgres-0.postgres.default.svc.cluster.local), or
- Per-pod persistent storage (its own EBS volume), or
- Ordered startup (must come up after primary), or
- Ordered shutdown,
it's a StatefulSet. Otherwise it's a Deployment.
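The stable hostname in the list above comes from a headless Service that the StatefulSet names in its serviceName field. A minimal sketch, assuming the pods carry the label app: postgres and listen on 5432:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres              # becomes the middle part of postgres-0.postgres.default.svc...
  namespace: default
spec:
  clusterIP: None             # headless: DNS returns per-pod records instead of one virtual IP
  selector:
    app: postgres             # assumed pod label
  ports:
    - name: sql
      port: 5432
```

With this in place, each pod gets its own DNS record, so a replica can address the primary as postgres-0.postgres rather than going through a load-balanced address.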
Per-pod storage with volumeClaimTemplates
The killer StatefulSet feature: every pod automatically gets its own PVC.
volumeClaimTemplates:
  - metadata: { name: data }
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gp3
      resources:
        requests: { storage: 100Gi }
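Three replicas → three PVCs (data-postgres-0, data-postgres-1, data-postgres-2). Pod restart? Same name, same PVC, same data. Pod re-scheduled to another node? Storage follows, as long as the new node can reach the volume (for zonal storage like EBS, that means a node in the same AZ).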
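To show where the template sits, here is a sketch of a surrounding StatefulSet. The image tag, the Secret postgres-secret, and the PGDATA subdirectory are assumptions for illustration; it also assumes a headless Service named postgres exists. It produces three independent Postgres instances, not a replicated cluster.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres             # assumes a headless Service with this name
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16        # illustrative tag
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret   # hypothetical Secret, must exist
                  key: password
            - name: PGDATA          # subdirectory, so lost+found at the mount root is not in the way
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data            # matches the claim template name
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi
```

Scaling the StatefulSet down does not delete the PVCs by default, so scaling back up reattaches the old data.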
CSI — the plugin interface
Long ago, every storage backend was hard-coded into the kubelet. Then CSI (Container Storage Interface) standardised it: any vendor implements a CSI driver, and Kubernetes consumes it without modification.
Notable CSI drivers:
- aws-ebs-csi-driver: EBS-backed PVCs.
- aws-efs-csi-driver: EFS-backed PVCs (RWX!).
- azuredisk.csi.azure.com, azurefile.csi.azure.com: Azure equivalents.
- pd.csi.storage.gke.io: GCP Persistent Disks.
- rook-ceph: self-managed distributed storage.
- csi.cilium.io, openebs.csi.openebs.io: niche but production.
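A driver typically installs as two parts: a controller (usually a Deployment) that talks to the storage API, and a node plugin running as a DaemonSet (one driver pod per node) that does the mounting. You, or the driver's installer, create a StorageClass that names the driver as its provisioner. Your PVC references the StorageClass. The driver handles the rest.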
Storage classes
A StorageClass is "this is how to provision storage of this kind". On AWS:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata: { name: gp3 }
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
Two flags worth understanding:
- volumeBindingMode: WaitForFirstConsumer means the EBS volume is not provisioned until a pod is scheduled. This lets the scheduler pick a node that matches the volume's AZ. Without this, you get scheduling failures because EBS is zonal.
- reclaimPolicy: Retain vs Delete decides what happens when the PVC is deleted. Retain keeps the underlying volume around (manual cleanup); Delete actually destroys the EBS volume. Production-data → Retain.
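For production data, one option is a second class that differs only in its reclaim policy. This is a sketch; the name gp3-retain is hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-retain                  # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain               # deleting the PVC leaves the PV and the EBS volume in place
```

The policy is copied onto each PV when it is provisioned, so changing a StorageClass later does not affect volumes that already exist; those keep whatever persistentVolumeReclaimPolicy they were created with unless the PV itself is edited.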
Ephemeral volumes worth knowing
Not all volumes need to persist:
- emptyDir: temporary scratch space for the pod's lifetime. Gone on pod deletion. Faster than a PVC; safe for caches.
- configMap / secret: read-only mounts of cluster-stored config / secrets.
- projected: mount multiple sources (configMap + secret + downwardAPI) into one directory.
- hostPath: mount a path from the node. Use sparingly; ties pod to node.
- CSI ephemeral volumes (an inline csi entry in the pod spec): driver-managed, tied to pod lifetime.
The one you will reach for most often is emptyDir:
volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 5Gi
      medium: Memory   # tmpfs: RAM-backed and fast; what you write counts against the container's memory limit
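In context, the volume has to be both declared and mounted. A small sketch, with a hypothetical pod name and illustrative sizes chosen so the tmpfs fits inside the memory limit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo                # hypothetical name
spec:
  containers:
    - name: worker
      image: busybox:1.36           # illustrative image
      command: ["sh", "-c", "echo hello > /scratch/out.txt && sleep 3600"]
      resources:
        limits:
          memory: 2Gi               # tmpfs usage is charged to this limit
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory              # omit this line to use node disk instead of RAM
        sizeLimit: 1Gi
```

The file survives a container restart inside the same pod but disappears when the pod is deleted.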
EBS-backed PVCs vs EFS / NFS
Two big classes, with different tradeoffs:
| Property | EBS / GCP PD / Azure Disk | EFS / NFS / CephFS |
|---|---|---|
| Access mode | RWO only | RWX |
| Latency | Sub-millisecond | Single-digit ms |
| Throughput | Up to 4 GB/s on the fastest volume types (e.g. io2 Block Express); gp3 is lower | Lower per-mount; scales horizontally |
| Cost | Cheap per-GB | More expensive per-GB |
| Multi-AZ | No (zonal) | Yes |
| Use cases | Databases, single-writer apps | Shared uploads, ML datasets |
If you have to pick one default for a generic Kubernetes cluster, EBS-class block storage with RWO + StatefulSet is the right starting point. Add EFS only when you genuinely have a shared-write requirement.
Worth knowing about
- VolumeSnapshots: point-in-time snapshots of a PVC, exposed as a Kubernetes object. Driver-dependent.
- Volume expansion: many CSI drivers can grow a PV in place (allowVolumeExpansion: true). Edit the PVC's resources.requests.storage and the driver resizes.
- Velero: cluster backup/restore. PVs included. The right answer for "I need DR for my Kubernetes state."
- Local Persistent Volumes: for high-IOPS workloads that can tolerate node-stickiness (e.g. Cassandra, Kafka). Trade portability for performance.
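A snapshot and a restore from it can be sketched as two objects. This assumes the snapshot CRDs and snapshot controller are installed, the CSI driver supports snapshots, and a VolumeSnapshotClass exists; the class name ebs-snapclass and the object names are hypothetical.

```yaml
# Take a point-in-time snapshot of an existing claim.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-postgres-0-snap        # hypothetical name
spec:
  volumeSnapshotClassName: ebs-snapclass   # assumed VolumeSnapshotClass
  source:
    persistentVolumeClaimName: data-postgres-0
---
# Restore: a new claim pre-populated from the snapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-restore                # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: data-postgres-0-snap
  resources:
    requests:
      storage: 100Gi                # at least the size of the snapshotted volume
```

Expansion needs no new object: raising resources.requests.storage on an existing claim is enough when the class allows it. Shrinking a claim is not supported.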
A diagnostic loop
When a pod can't mount its volume:
- kubectl get pvc: is the PVC Bound? If Pending, nothing has been provisioned yet. With WaitForFirstConsumer that is normal until a pod using the claim is scheduled; otherwise the storage class isn't provisioning.
- kubectl describe pvc <name>: events explain why provisioning failed.
- kubectl describe pod <name>: at the bottom, FailedAttachVolume / Multi-Attach error are common: the volume is still attached to a previous node.
- kubectl get pv: is the PV in the right zone? EBS volumes are zonal; if the pod is in us-east-1b and the PV in us-east-1a, attachment fails.
- CSI driver logs: kubectl logs -n kube-system -l app=ebs-csi-controller for the driver-level error.
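One of the most common production bugs is "Multi-Attach error": a pod gets rescheduled to another node while the previous one is still in Terminating and still holds the RWO volume. Usually the fix is to wait for the previous pod to fully die. Note that terminationGracePeriodSeconds already defaults to 30, so setting it to 30 changes nothing; a shorter value releases the volume sooner at the cost of less time for a clean shutdown, and a longer one widens the window.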
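When the workload is a single-replica Deployment with an RWO volume, one way to keep old and new pods from overlapping during a roll-out is the Recreate strategy, which stops the old pod before starting the new one. A sketch, assuming a PVC named app-data already exists; all names here are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: single-writer               # hypothetical name
spec:
  replicas: 1
  strategy:
    type: Recreate                  # old pod is terminated before the new one is created
  selector:
    matchLabels:
      app: single-writer
  template:
    metadata:
      labels:
        app: single-writer
    spec:
      containers:
        - name: app
          image: busybox:1.36       # illustrative image
          command: ["sh", "-c", "sleep 3600"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-data     # assumed existing RWO claim
```

The tradeoff is a short outage on every roll-out, and it does not help when a node becomes unreachable and the volume stays attached to it.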
Tools in the wild
- AWS EBS CSI driver (library, free tier): EBS-backed PVCs for Kubernetes, RWO block storage.
- AWS EFS CSI driver (library, free tier): EFS-backed PVCs, RWX, multi-AZ NFS shares.
- Rook + Ceph (library, free tier): self-managed distributed storage, RWO + RWX inside the cluster.
- Velero (service, free tier): backup + restore for Kubernetes resources and PV snapshots.
- OpenEBS (library, free tier): Local-PV and replicated CAS for self-hosted Kubernetes.