cicd · level 2

CI Caching

What to hash, what to store, when to bust.

The most expensive operation in a CI pipeline is the one you repeat unnecessarily. npm install on a cold runner downloads hundreds of packages from the registry, extracts archives, and runs lifecycle scripts. On a warm cache hit, the job restores a single tar archive instead, which is usually far faster.

A CI cache is a key-value store: the key determines which cached entry to restore. Get the key right and you win. Get it wrong and you pay for stale restores or needless misses.

Analogy

A cache is a labelled meal-prep container in the fridge. Sunday afternoon you chop onions, roast chicken, and cook a pot of rice. Monday through Friday you open the container marked "chicken bowls, week of Jan 8" and reheat — no shopping, no chopping, no boiling. The cache key is the label: get it wrong and you pull out last month's leftovers on a Tuesday. The trick is labelling by ingredients, not by date. If the grocery receipt (the lock file) is unchanged, last week's prep is still good. If you swapped chicken for tofu, the label changes and you prep fresh. A freezer full of unlabeled tubs is worse than no meal prep at all — you spend longer unsealing each one than you would have spent cooking.

What to cache

Not everything is worth caching. The candidates are:

| What | Why | Invalidate when |
|---|---|---|
| node_modules | Install usually dominates pipeline time | package-lock.json changes |
| Package manager cache (~/.npm, ~/.pnpm-store) | Faster than a node_modules restore on partial changes | Rarely; rely on TTL-based expiry |
| Build outputs | Skip the rebuild when source is unchanged | Any source file changes |
| Test fixture data | Large blobs downloaded at test time | Fixture version changes |

Avoid caching things that include timestamps, absolute paths, or nondeterministic content. Caches are shared across many runs; a poisoned cache corrupts all of them.

Cache keys

A cache key is a string computed at job start. The CI system looks up the key in its cache store, restores the matching entry if one exists, and otherwise falls back to the first restore-keys prefix that matches an existing entry.

The critical rule: include a hash of package-lock.json (or whichever lock file you use) in the cache key for node_modules.

Here is why. package.json lists "react": "^19.0.0". The lock file pins "react": "19.2.4". If you key on package.json alone, two things go wrong. An edit that touches no dependency, such as a new script entry, changes the key and forces a miss even though the installed versions are identical. More dangerously, a lock-file-only update that moves react to a newer release within ^19.0.0 leaves package.json untouched, so the old key still hits and restores a node_modules built for the previous versions.

# GitHub Actions — correct cache key
- uses: actions/cache@v4
  with:
    path: ~/.pnpm-store
    key: pnpm-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}
    restore-keys: |
      pnpm-${{ runner.os }}-

hashFiles('pnpm-lock.yaml') produces a SHA-256 hash of the lock file's contents. The hash changes whenever the lock file changes, which in practice means whenever a resolved dependency version changes. Any run with the same lock file on the same OS gets the same cache entry.
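To make the choice of file concrete, here is a minimal sketch that models a cache key as a prefix plus a SHA-256 digest of one file. The cache_key and key_for helpers, the simplified file contents, and the 19.2.5 version are all hypothetical, and the real hashFiles output is computed differently in detail, so read this as a model of the behaviour rather than a reimplementation.

```python
import hashlib


def cache_key(prefix: str, os_name: str, file_contents: bytes) -> str:
    """Build a key like 'pnpm-Linux-<sha256 hex>' (hypothetical helper)."""
    digest = hashlib.sha256(file_contents).hexdigest()
    return f"{prefix}-{os_name}-{digest}"


# Two commits with an identical manifest. Only the resolved version in the
# lock file differs (placeholder contents, not the real pnpm-lock.yaml format).
manifest = b'{"dependencies": {"react": "^19.0.0"}}'
commit_a = {"package.json": manifest, "pnpm-lock.yaml": b"react@19.2.4"}
commit_b = {"package.json": manifest, "pnpm-lock.yaml": b"react@19.2.5"}


def key_for(commit: dict, filename: str) -> str:
    return cache_key("pnpm", "Linux", commit[filename])


# Keyed on package.json, both commits share a key, so commit_b would
# restore a cache that was built for commit_a's resolved versions.
same_key_on_manifest = key_for(commit_a, "package.json") == key_for(commit_b, "package.json")

# Keyed on the lock file, the version change produces a new key.
same_key_on_lock = key_for(commit_a, "pnpm-lock.yaml") == key_for(commit_b, "pnpm-lock.yaml")

print(same_key_on_manifest)  # True
print(same_key_on_lock)      # False
```

The first result is the stale-hit case: nothing in the key noticed that the installed versions moved.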

Hit rate vs. invalidation rate

There is a tension: a key that invalidates frequently gives accurate caches but poor hit rates. A key that invalidates rarely gives good hit rates but risks stale content.

For dependencies: invalidate on lock file change only. Lock files change on an explicit pnpm add, pnpm remove, or pnpm update, not on every commit. Hit rates of 80–95% are a reasonable target on active projects.
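You can check the hit rate against your own history. A minimal sketch, assuming you can list the key each CI run used, in order. The hit_rate helper and the sample keys are hypothetical, and the model ignores eviction and expiry.

```python
def hit_rate(keys_per_run: list[str]) -> float:
    """Fraction of runs whose exact key was already saved by an earlier run."""
    saved: set[str] = set()
    hits = 0
    for key in keys_per_run:
        if key in saved:
            hits += 1
        else:
            saved.add(key)  # a miss installs from scratch, then saves the cache
    return hits / len(keys_per_run) if keys_per_run else 0.0


# Ten runs; the lock file changes once (hash "a1" -> "b2") at run 6.
lock_keyed = ["a1"] * 5 + ["b2"] * 5

# Keying on the commit SHA instead gives every run a unique key.
commit_keyed = [f"sha{i}" for i in range(10)]

print(hit_rate(lock_keyed))    # 0.8
print(hit_rate(commit_keyed))  # 0.0
```

A key that changes on every commit never hits. A key that changes only with the lock file misses once per dependency change.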

For build outputs: you usually want to invalidate more aggressively — on any change to source files. Use a glob hash:

key: build-${{ hashFiles('src/**/*.ts', 'tsconfig.json') }}
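A glob hash has to be deterministic: the same set of files must give the same key no matter what order the filesystem lists them in. The sketch below shows the idea with a hypothetical glob_hash helper. It does not reproduce the exact hashFiles algorithm, and the file names are made up for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path


def glob_hash(root: Path, patterns: list[str]) -> str:
    """Hash the relative path and contents of every file matching the patterns."""
    # Sort so the result does not depend on directory listing order.
    files = sorted({p for pat in patterns for p in root.glob(pat) if p.is_file()})
    outer = hashlib.sha256()
    for path in files:
        outer.update(path.relative_to(root).as_posix().encode())
        outer.update(hashlib.sha256(path.read_bytes()).digest())
    return outer.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "index.ts").write_text("export const x = 1;\n")
    (root / "tsconfig.json").write_text("{}\n")
    (root / "README.md").write_text("docs\n")

    patterns = ["src/**/*.ts", "tsconfig.json"]
    before = glob_hash(root, patterns)

    # README.md is not matched by the patterns, so the key is unchanged.
    (root / "README.md").write_text("docs, edited\n")
    after_readme = glob_hash(root, patterns)

    # A source edit changes the key and invalidates the build cache.
    (root / "src" / "index.ts").write_text("export const x = 2;\n")
    after_source = glob_hash(root, patterns)

print(before == after_readme)  # True
print(before == after_source)  # False
```

Anything that affects the build but is missing from the patterns, such as a build script or a config file, would not bust the cache. Keep the pattern list in step with the build's real inputs.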

Cache poisoning

If a developer force-pushes a commit that installs malicious code, and that run warms the cache, subsequent clean runs can restore the poisoned node_modules. Mitigations:

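  • Install with pnpm install --frozen-lockfile so CI installs exactly what the lock file pins and fails if the lock file is out of date.
  • Audit dependencies with pnpm audit in the pipeline.
  • Scope caches by branch so a run on an untrusted branch cannot write an entry that protected branches later restore. GitHub Actions already restricts restores to caches created on the current branch, the pull request's base branch, or the default branch.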

Cache size matters

Large caches (> 500 MB) take significant time to save and restore. If your node_modules is 2 GB, restoring it from cache may take longer than running pnpm install --prefer-offline. Measure.
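One way to start measuring is to total the bytes under the directory you plan to cache and compare the result with the 500 MB rule of thumb. The dir_size_bytes and is_large_cache helpers and the demo files are illustrative assumptions. Timing the actual restore step in CI remains the real test.

```python
import os
import tempfile
from pathlib import Path

LARGE_CACHE_BYTES = 500 * 1024 * 1024  # the 500 MB rule of thumb from the text


def dir_size_bytes(root: Path) -> int:
    """Sum file sizes under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def is_large_cache(root: Path) -> bool:
    return dir_size_bytes(root) > LARGE_CACHE_BYTES


# Demo on a throwaway directory; point it at node_modules or the store in practice.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "pkg").mkdir()
    (demo / "pkg" / "index.js").write_bytes(b"x" * 1024)
    (demo / "pkg" / "README.md").write_bytes(b"y" * 512)
    demo_size = dir_size_bytes(demo)
    demo_is_large = is_large_cache(demo)

print(demo_size)      # 1536
print(demo_is_large)  # False
```

Hard-linked files are counted once per link here, so the figure overestimates a pnpm-managed node_modules. Treat it as an upper bound.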

The package manager's own store (e.g., ~/.pnpm-store) is usually smaller and faster to restore than the full node_modules directory. With the store restored, pnpm hard-links packages from it during install, so rebuilding node_modules is fast even though the directory itself was never cached.

Fallback strategy

Always provide restore-keys with progressively shorter prefixes. If the exact key misses — perhaps the lock file just changed — a partial restore of the prior cache saves the unchanged packages. The install step then fetches only the delta.

restore-keys: |
  pnpm-${{ runner.os }}-
  pnpm-

The pipeline degrades gracefully: a full miss is rare; a partial hit is common.
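The lookup order can be modelled in a few lines. This is a sketch of the behaviour, not the actions/cache implementation: the resolve helper and the in-memory store are hypothetical. When several entries share a prefix it picks the most recently created one, which is how GitHub documents prefix matches.

```python
from typing import Optional


def resolve(store: dict[str, int], key: str, restore_keys: list[str]) -> Optional[tuple[str, bool]]:
    """Return (matched_key, exact_hit), or None on a full miss.

    store maps cache key -> creation time (larger is newer).
    """
    if key in store:
        return key, True
    for prefix in restore_keys:  # tried in order, most specific first
        candidates = [k for k in store if k.startswith(prefix)]
        if candidates:
            newest = max(candidates, key=lambda k: store[k])
            return newest, False
    return None


store = {
    "pnpm-Linux-aaa": 1,
    "pnpm-Linux-bbb": 2,
}
restore_keys = ["pnpm-Linux-", "pnpm-"]

exact = resolve(store, "pnpm-Linux-bbb", restore_keys)
partial = resolve(store, "pnpm-Linux-ddd", restore_keys)  # lock file just changed
miss = resolve({}, "pnpm-Linux-ddd", restore_keys)        # empty store

print(exact)    # ('pnpm-Linux-bbb', True)
print(partial)  # ('pnpm-Linux-bbb', False)
print(miss)     # None
```

On the partial hit, the job restores the older entry, installs only what changed, and saves a new entry under the exact key at the end of the run.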

Tools in the wild

  • actions/cache [library, free tier]: GitHub-hosted dependency cache; key by lockfile hash, restore by prefix.
  • [service] Vercel-hosted task-output cache that's shared across CI and every developer's laptop.
  • [service] Distributed cache and task execution for Nx monorepos; fan out across machines.
  • BuildKit [library, free tier]: Docker's modern builder, with cache mounts, layer reuse, and parallel multi-stage builds.
  • sccache [cli, free tier]: Mozilla's compiler cache for C/C++/Rust; backs to S3/GCS for shared CI caches.
  • [library] Hermetic build cache that scales to massive monorepos; reuse outputs across teams.