Monorepo Pipelines

Affected-only builds, build graphs, remote cache.

A monorepo's value proposition is simple: one repo, many packages, atomic refactors across them. The cost, if you don't have the right pipeline tooling, is that every PR rebuilds everything, so CI minutes grow with both the number of packages and the number of PRs.

The rebuild-everything trap

The naive monorepo CI:

- run: pnpm install
- run: pnpm test     # runs every test in every package
- run: pnpm build    # builds every package

For 5 packages, that's annoying. For 200 packages, that's a 90-minute CI run on every PR. For 2,000 packages (Google, Meta scale), that's never.

The fix: build only the affected packages — the ones whose source actually changed, plus everything that transitively depends on them.

Build graphs — the core data structure

Every monorepo build tool models packages as a directed graph:

       app-web ───▶ ui-lib ───▶ design-tokens
           │           │              │
           ▼           ▼              ▼
        api-client    css-vars       chroma-utils
           │
           ▼
        types-shared

Nodes are packages; edges are dependencies. When you change design-tokens, the affected set is design-tokens, ui-lib, app-web — every package that transitively depends on it. CI runs build/test only on that subgraph.
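The affected-set computation described above can be sketched as a walk over reversed dependency edges. This is a minimal illustration rather than any tool's actual implementation; the DEPS mapping is one reading of the diagram, and the affected function is a hypothetical name.

```python
from collections import deque

# Dependency edges as read from the diagram: package -> packages it depends on.
DEPS = {
    "app-web": ["ui-lib", "api-client"],
    "ui-lib": ["design-tokens", "css-vars"],
    "design-tokens": ["chroma-utils"],
    "api-client": ["types-shared"],
    "css-vars": [],
    "chroma-utils": [],
    "types-shared": [],
}


def affected(deps, changed):
    """Return the changed packages plus everything that transitively depends on them."""
    # Invert the edges so we can walk from a package to its dependents.
    dependents = {pkg: set() for pkg in deps}
    for pkg, pkg_deps in deps.items():
        for dep in pkg_deps:
            dependents[dep].add(pkg)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected(DEPS, ["design-tokens"])))  # ['app-web', 'design-tokens', 'ui-lib']
```

Changing a leaf such as types-shared pulls in api-client and app-web, while changing app-web affects only itself.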

The tool computes this from a manifest in each package:

// package.json (Turborepo / pnpm workspaces)
{
  "name": "ui-lib",
  "dependencies": {
    "design-tokens": "workspace:*"
  }
}

Or in Bazel-style explicit-deps form:

# BUILD.bazel
py_library(
    name = "ui_lib",
    srcs = glob(["*.py"]),
    deps = ["//design_tokens:design_tokens_lib"],
)

Bazel-style is stricter: you can't import from another package unless you declare the dep. This is annoying, but it catches a class of bugs where CI passes on a stale cached result while a clean build breaks, because the implicit dep wasn't reflected in the build graph.
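In a looser, manifest-based setup, a similar strictness check can be approximated by comparing declared dependencies with what the source actually imports. The sketch below assumes the import lists were already extracted by some scanner; undeclared_imports and both input dictionaries are hypothetical.

```python
def undeclared_imports(manifests, imports):
    """Map each package to the workspace packages it imports without declaring."""
    problems = {}
    for name, manifest in manifests.items():
        declared = set(manifest.get("dependencies", {}))
        # Only imports of other workspace packages matter for the build graph.
        used = {dep for dep in imports.get(name, ()) if dep in manifests and dep != name}
        missing = used - declared
        if missing:
            problems[name] = sorted(missing)
    return problems


# Parsed package.json contents, keyed by package name.
manifests = {
    "ui-lib": {"name": "ui-lib", "dependencies": {"design-tokens": "workspace:*"}},
    "design-tokens": {"name": "design-tokens"},
    "css-vars": {"name": "css-vars"},
}
# Hypothetical result of scanning each package's source for import statements.
imports = {"ui-lib": ["design-tokens", "css-vars", "react"]}

print(undeclared_imports(manifests, imports))  # {'ui-lib': ['css-vars']}
```

Here ui-lib imports css-vars without declaring it, so a change to css-vars would not invalidate ui-lib in the build graph.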

The cache key

Each build tool computes a cache key per package. Roughly:

cache_key(pkg) = hash(
  source_files(pkg),
  cache_key(direct_dependencies)   # recursive!
)

If the source hasn't changed and no dependency's cache key has changed, the cache key is the same — and the previous build output can be reused.

The "transitive" part is critical. If you change design-tokens, the cache key of every package that transitively depends on it (ui-lib, app-web) changes too, so all three rebuild. If you change app-web, only app-web rebuilds; nothing depends on it.

$ turbo run build --filter=...[main]
Tasks:    23 successful, 23 total
Cached:   18 cached, 23 total           ← 18 hits, 5 actual builds
Time:     12s
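The recursive cache key can be sketched in a few lines of Python. This is a simplified model, assuming an acyclic graph and one string of source content per package; real tools hash many more inputs, and the names here are illustrative.

```python
import hashlib


def cache_key(pkg, deps, sources, memo=None):
    """Hash a package's own sources together with the keys of its direct deps."""
    memo = {} if memo is None else memo
    if pkg not in memo:
        h = hashlib.sha256()
        h.update(sources[pkg].encode())
        for dep in sorted(deps[pkg]):  # sorted so the key is stable
            h.update(cache_key(dep, deps, sources, memo).encode())
        memo[pkg] = h.hexdigest()
    return memo[pkg]


# A reduced version of the example graph: package -> direct dependencies.
DEPS = {
    "app-web": ["ui-lib", "api-client"],
    "ui-lib": ["design-tokens"],
    "design-tokens": [],
    "api-client": [],
}
before = {pkg: f"{pkg} source v1" for pkg in DEPS}
after = {**before, "design-tokens": "design-tokens source v2"}

rebuilt = sorted(
    p for p in DEPS if cache_key(p, DEPS, before) != cache_key(p, DEPS, after)
)
print(rebuilt)  # ['app-web', 'design-tokens', 'ui-lib']
```

api-client keeps its key because neither its source nor any of its dependencies changed.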

Remote cache — sharing builds across machines

The same hash that powers local affected-only also enables a remote cache. Build the package once on your laptop; CI re-uses your build via the remote cache.

turbo build (laptop)        →   write key K1 → output to remote cache
                                                 ↓
turbo build (CI)            →   read key K1 → cache hit, no rebuild
                                                 ↓
turbo build (other laptop)  →   read key K1 → cache hit

The cache stores build outputs (compiled code, test results, built containers). Pulling from a remote cache is an HTTP GET; if it's cheaper than the build itself (it almost always is), you save time.
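The read-through pattern behind a remote cache looks roughly like the sketch below. An in-memory dictionary stands in for the HTTP service; RemoteCache and build_with_cache are hypothetical names, not any tool's API.

```python
class RemoteCache:
    """In-memory stand-in for a remote cache; a real one would be an HTTP service."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, output):
        self._store[key] = output


def build_with_cache(key, build_fn, cache):
    """Return (output, was_cache_hit), running build_fn only on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    output = build_fn()
    cache.put(key, output)
    return output, False


cache = RemoteCache()
builds = []  # records every real build so we can see how often one happens


def compile_ui_lib():
    builds.append("ui-lib")
    return b"compiled ui-lib"


print(build_with_cache("K1", compile_ui_lib, cache))  # laptop: (b'compiled ui-lib', False)
print(build_with_cache("K1", compile_ui_lib, cache))  # CI: (b'compiled ui-lib', True)
```

The second call never invokes the build function, which is the entire saving: the cost of a lookup replaces the cost of a build.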

Tools and their cache offerings:

  • Turborepo: Remote Cache (Vercel-hosted, free up to a quota) or self-hosted via a server that implements the Remote Cache API, for example one backed by S3.
  • Nx: Nx Cloud (hosted) or self-hosted.
  • Bazel: remote caching and remote execution via the Remote Execution API (REAPI); self-hosted (BuildBuddy, BuildGrid) or hosted.
  • Pants: Native remote cache via the same gRPC Remote Execution API.

When affected-only fails

The mode collapses if any of these hold:

  • Implicit deps: package A imports from B but doesn't declare it. A change in B doesn't invalidate A's cache → CI passes, prod breaks.
  • Filesystem-side-effect deps: tests that read a fixture file that lives outside the package. A change to the fixture doesn't invalidate the test cache.
  • Tool-version drift: the same source builds differently with a different compiler version. Hash the toolchain version into the cache key.
  • External dependencies: an npm package republished under the same version. Lock files mitigate; you must enforce them.

The fix for all four is hermeticity — the build's inputs include EVERYTHING that influences the output. Bazel enforces this strictly; Turborepo and Nx are looser by default but improving.
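One way to picture hermeticity is a cache key that takes every influencing input as an explicit argument. The sketch below is an assumption-laden illustration: hermetic_key, its parameters, and the version strings are hypothetical, and no real tool exposes exactly this function.

```python
import hashlib


def hermetic_key(sources, dep_keys, toolchain, lockfile, fixtures=None):
    """Hash every declared input: sources, dep keys, toolchain, lockfile, fixtures."""
    h = hashlib.sha256()

    def feed(label, value):
        # Length-prefix each field so adjacent values cannot blur together.
        data = f"{label}={value}".encode()
        h.update(len(data).to_bytes(8, "big") + data)

    feed("toolchain", toolchain)  # guards against tool-version drift
    feed("lockfile", lockfile)    # guards against republished external packages
    for path, content in sorted(sources.items()):
        feed(f"src:{path}", content)
    for path, content in sorted((fixtures or {}).items()):
        feed(f"fixture:{path}", content)  # files read from outside the package
    for key in sorted(dep_keys):
        feed("dep", key)  # declared deps only; implicit ones stay invisible
    return h.hexdigest()


src = {"index.ts": "export const x = 1"}
base = hermetic_key(src, [], "node-20.11.0", "lockfile-hash-abc")

print(base == hermetic_key(src, [], "node-20.11.0", "lockfile-hash-abc"))  # True
print(base == hermetic_key(src, [], "node-22.0.0", "lockfile-hash-abc"))   # False
```

Anything not passed into the key, such as an undeclared import, cannot invalidate it, which is why strict dependency declaration matters as much as the hashing itself.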

A real monorepo CI

Here's a typical Turborepo CI:

name: ci
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }       # full history so origin/main exists for affected detection

      - uses: pnpm/action-setup@v3     # pnpm version comes from "packageManager" in package.json
      - run: pnpm install --frozen-lockfile

      - name: Lint, build, test affected packages
        env:
          TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
          TURBO_TEAM:  my-team
        run: |
          pnpm turbo run lint build test \
            --filter=...[origin/main] \
            --concurrency=10

Three things make this fast:

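  1. --filter=...[origin/main]: only run on packages that changed since origin/main, plus their dependents.
  2. TURBO_TOKEN and TURBO_TEAM enable remote cache lookups → most tasks hit the cache.
  3. --concurrency=10: run up to 10 tasks in parallel (Turborepo's default). A percentage such as --concurrency=100% scales with the runner's CPU count instead.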

For a typical PR touching 1-3 packages out of 100, this finishes in under 2 minutes. The same code in a "rebuild everything" monorepo would run for 30+ minutes.
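How dependency order and a concurrency limit interact can be sketched with Python's standard graphlib module (Python 3.9+). This is a simplified wave-based model with a hypothetical schedule function; real schedulers start each task as soon as its own dependencies finish rather than waiting for a whole wave.

```python
from graphlib import TopologicalSorter


def schedule(deps, concurrency):
    """Group packages into batches that respect dependency order and a task limit."""
    sorter = TopologicalSorter(deps)  # maps each package to its dependencies
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose deps are done
        # Run at most `concurrency` tasks at a time.
        for i in range(0, len(ready), concurrency):
            batches.append(ready[i:i + concurrency])
        sorter.done(*ready)
    return batches


# Affected subgraph after a change to design-tokens.
AFFECTED = {
    "design-tokens": [],
    "ui-lib": ["design-tokens"],
    "app-web": ["ui-lib"],
}

print(schedule(AFFECTED, concurrency=10))
# [['design-tokens'], ['ui-lib'], ['app-web']]
```

A chain like this gains nothing from a higher concurrency setting; parallelism only helps when the affected subgraph is wide rather than deep.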

When NOT to use a monorepo

Monorepo tooling has overhead. If:

  • You have fewer than 10 packages.
  • Packages release independently with different versioning policies.
  • You don't need atomic cross-package refactors.

...you might be fine with multi-repo + a private package registry. The crossover point is somewhere around 20-30 packages or 10+ contributors.

Summary

  • Build graphs make monorepo CI feasible. Without them, you rebuild everything.
  • Affected-only runs CI on the subgraph that depends on changed files.
  • Cache keys hash source + transitive deps; matches mean cache hits.
  • Remote caches share build outputs across CI runs and laptops.
  • Hermeticity (every input declared, Bazel-style) prevents stale cache hits where CI passes but a clean build breaks.
  • Around 20-30 packages or 10+ contributors, the tooling starts to pay for itself.

Tools in the wild

  • Turborepo (free tier): Vercel's lightweight monorepo orchestrator for JS/TS; incremental, remote-cacheable.
  • Nx (free tier): Polyglot monorepo build system with affected-graph and a hosted remote cache.
  • Bazel (free tier): Google's build system; language-agnostic, hermetic, remote-build-execution.
  • Pants (free tier): Python-first monorepo build system; modern incremental + remote caching.
  • Buck2 (free tier): Meta's open-source monorepo build system. Rust-rewritten, blazingly fast.