CI Caching
What to hash, what to store, when to bust.
The most expensive operation in a CI pipeline is the one you repeat unnecessarily. npm install on a cold runner downloads hundreds of packages from the registry, extracts archives, and runs lifecycle scripts. On a warm cache hit, it restores a single tar archive — orders of magnitude faster.
A CI cache is a keyed store: the key determines which cached entry to restore. Get the key right and you win. Get it wrong and you pay for stale restorations or compulsory misses.
Analogy
A cache is a labelled meal-prep container in the fridge. Sunday afternoon you chop onions, roast chicken, and cook a pot of rice. Monday through Friday you open the container marked "chicken bowls, week of Jan 8" and reheat — no shopping, no chopping, no boiling. The cache key is the label: get it wrong and you pull out last month's leftovers on a Tuesday. The trick is labelling by ingredients, not by date. If the grocery receipt (the lock file) is unchanged, last week's prep is still good. If you swapped chicken for tofu, the label changes and you prep fresh. A freezer full of unlabeled tubs is worse than no meal prep at all — you spend longer unsealing each one than you would have spent cooking.
What to cache
Not everything is worth caching. The candidates are:
| What | Why | Invalidate when |
|---|---|---|
| node_modules | Most of pipeline time is install | package-lock.json changes |
| Package manager cache (~/.npm, ~/.pnpm-store) | Faster than node_modules restore on partial changes | Rarely; leave it to TTL-based expiry |
| Build outputs | Skip rebuild when source unchanged | Any source file changes |
| Test fixture data | Large blobs downloaded at test time | Fixture version changes |
Avoid caching things that include timestamps, absolute paths, or nondeterministic content. Caches are shared across many runs; a poisoned cache corrupts all of them.
Cache keys
A cache key is a string computed at job start. The CI system looks up the key in its cache store and restores the matching entry if found. Otherwise it falls back to the restore-keys list and restores an entry whose key starts with one of those prefixes.
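The lookup order can be modelled in a few lines. This is a minimal sketch, not the implementation of any CI system: `CacheEntry`, `restore`, and the `created_at` field are hypothetical names, and choosing the most recently created entry among prefix matches is an assumption modelled on the behaviour actions/cache documents.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    created_at: int  # hypothetical creation timestamp; larger means newer
    payload: str     # stands in for the stored archive


def restore(entries, key, restore_keys=()):
    """Return (entry, exact_hit), or (None, False) on a full miss."""
    # 1. Try an exact match on the primary key.
    for entry in entries:
        if entry.key == key:
            return entry, True
    # 2. Fall back to each restore key in order, matching by prefix.
    for prefix in restore_keys:
        candidates = [e for e in entries if e.key.startswith(prefix)]
        if candidates:
            # Assumption: among prefix matches, the newest entry wins.
            return max(candidates, key=lambda e: e.created_at), False
    return None, False
```

An exact hit means the install step can usually be skipped or is close to a no-op. A prefix hit only seeds the job with an older entry, so the install step still has to run.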
The critical rule: include package-lock.json (or your lock file) in the cache key for node_modules.
Here is why. package.json lists "react": "^19.0.0". The lock file pins "react": "19.2.4". If you key on package.json alone, two things go wrong. An edit that does not change what gets installed (a script, a metadata field) changes the hash and forces a needless miss. More dangerously, a lock file update that stays within the declared range, such as a new pin that still satisfies ^19.0.0, leaves package.json untouched, so the key still hits and restores a node_modules built from the old pins.
# GitHub Actions: correct cache key
- uses: actions/cache@v4
  with:
    path: ~/.pnpm-store
    key: pnpm-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}
    restore-keys: |
      pnpm-${{ runner.os }}-
hashFiles('pnpm-lock.yaml') produces a SHA-256 hash of the lock file's contents. It changes whenever the lock file changes, which in practice means whenever a resolved dependency changes. Any run with the same lock file on the same OS gets the same cache entry.
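To see why a lock-file hash makes a stable key, here is a small sketch that builds a key the same way outside of CI. The function name `cache_key` and its parameters are hypothetical, and the digest is not byte-for-byte what hashFiles emits (GitHub applies its own hashing scheme on top of the per-file hashes). Only the property matters here: same contents, same key.

```python
import hashlib
from pathlib import Path


def cache_key(prefix: str, os_name: str, lock_file: Path) -> str:
    """Build a key of the form <prefix>-<os>-<sha256 of the lock file>."""
    # Hash the raw bytes so whitespace-only edits also change the key.
    digest = hashlib.sha256(lock_file.read_bytes()).hexdigest()
    return f"{prefix}-{os_name}-{digest}"
```

Two runs that check out the same lock file compute the same key and therefore share one cache entry. Any edit to the lock file yields a new key and a fresh entry.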
Hit rate vs. invalidation rate
There is a tension: a key that invalidates frequently gives accurate caches but poor hit rates. A key that invalidates rarely gives good hit rates but risks stale content.
For dependencies: invalidate on lock file change only. Lock files change on explicit pnpm add / rm — not on every commit. Hit rates should be 80–95% on active projects.
For build outputs: you usually want to invalidate more aggressively — on any change to source files. Use a glob hash:
key: build-${{ hashFiles('src/**/*.ts', 'tsconfig.json') }}
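A glob hash can be approximated locally to check which edits would bust the build cache. The sketch below is an assumption-laden stand-in for hashFiles, not a reimplementation: `hash_files` is a hypothetical name, and mixing the relative path into the digest (so that renames also change the key) is a choice of this sketch.

```python
import hashlib
from pathlib import Path


def hash_files(root: Path, patterns: list[str]) -> str:
    """Hash every file under root that matches any of the glob patterns."""
    # Sort the matches so the result does not depend on filesystem order.
    paths = sorted({p for pat in patterns for p in root.glob(pat) if p.is_file()})
    outer = hashlib.sha256()
    for path in paths:
        # Sketch-specific choice: include the relative path, then the content hash.
        outer.update(path.relative_to(root).as_posix().encode())
        outer.update(b"\0")
        outer.update(hashlib.sha256(path.read_bytes()).digest())
    return outer.hexdigest()
```

Calling `hash_files(Path("."), ["src/**/*.ts", "tsconfig.json"])` before and after an edit shows whether that edit falls inside the invalidation set. Files outside the patterns, such as a README, leave the hash unchanged.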
Cache poisoning
If a developer force-pushes a commit that installs malicious code, and that run warms the cache, subsequent clean runs can restore the poisoned node_modules. Mitigations:
- Pin to exact versions in the lock file (pnpm install --frozen-lockfile).
- Audit dependencies with pnpm audit in the pipeline.
- Scope the cache to the branch where security matters most.
Cache size matters
Large caches (> 500 MB) take significant time to save and restore. If your node_modules is 2 GB, restoring it from cache may take longer than running pnpm install --prefer-offline. Measure.
The platform-level package manager cache (e.g., ~/.pnpm-store) is usually smaller and faster to restore than the full node_modules directory because pnpm hard-links from the store on install, making the install step fast even from scratch.
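Measuring starts with knowing how big the cached directory actually is. The following sketch walks a directory and compares its size against the 500 MB figure mentioned above. `dir_size_bytes` and `worth_reviewing` are hypothetical helper names, and the threshold is only a starting point, not a rule.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                # Note: hard-linked files are counted once per path,
                # so a pnpm-linked node_modules may be overstated.
                total += path.stat().st_size
    return total


def worth_reviewing(size_bytes: int, threshold_mb: int = 500) -> bool:
    """True when the directory exceeds the size at which restores get slow."""
    return size_bytes > threshold_mb * 1024 * 1024
```

Running `dir_size_bytes` on both node_modules and the package manager store gives a first indication of which one is cheaper to cache. Timing an actual restore against an offline install remains the deciding measurement.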
Fallback strategy
Always provide restore-keys with progressively shorter prefixes. If the exact key misses — perhaps the lock file just changed — a partial restore of the prior cache saves the unchanged packages. The install step then fetches only the delta.
restore-keys: |
  pnpm-${{ runner.os }}-
  pnpm-
The pipeline degrades gracefully: a full miss is rare; a partial hit is common.
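The "only the delta" claim is easy to picture as a set difference. This toy model assumes the lock file can be read as a name-to-version mapping and the restored store as a set of (name, version) pairs. `packages_to_fetch` and the package names in the usage note are hypothetical, and a real package manager tracks far more than this.

```python
def packages_to_fetch(
    locked: dict[str, str],
    restored: set[tuple[str, str]],
) -> dict[str, str]:
    """Return the locked packages that the restored cache does not contain."""
    return {
        name: version
        for name, version in locked.items()
        if (name, version) not in restored
    }
```

If the lock file pins `pkg-a` at 2.0.0 and `pkg-b` at 1.0.0, and the restored cache holds `pkg-a` 1.9.0 and `pkg-b` 1.0.0, only `pkg-a` 2.0.0 has to come from the registry.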
Tools in the wild
- actions/cache (library, free tier): GitHub-hosted dependency cache; key by lockfile hash, restore by prefix.
- Turborepo Remote Cache (service, free tier): Vercel-hosted task-output cache that's shared across CI and every developer's laptop.
- Service for Nx monorepos: distributed cache and task execution; fans out across machines.
- BuildKit (library, free tier): Docker's modern builder, with cache mounts, layer reuse, and parallel multi-stage builds.
- sccache (CLI, free tier): Mozilla's compiler cache for C/C++/Rust; backs to S3/GCS for shared CI caches.
- Bazel Remote Cache (library, free tier): Hermetic build cache that scales to massive monorepos; reuses outputs across teams.