Pipeline Anatomy

A DAG of stages, artifacts, and edges.

A CI pipeline is a directed acyclic graph (DAG) of jobs. Each node is a job; each edge is a dependency. Jobs without incoming edges run first, in parallel. Jobs with dependencies wait for their upstream nodes to turn green.

Every modern CI system — GitHub Actions, GitLab CI, CircleCI, Buildkite — is a variation on this pattern. The names differ; the graph does not.

Analogy

A pipeline is a car assembly line. Some stations can work in parallel — the doors are being painted over here while the seats are being stitched over there — but the painted door cannot go on until the frame is welded, and the seats cannot go in until the chassis is through the paint booth. The total time to finish a car isn't the sum of every station; it's the longest unbroken chain from raw steel to driving off the lot. Artifacts are the parts crates passed between stations. An "allow failure" station is a QA inspector who waves cars through even when they fail — and after a month, nobody trusts the final inspection either.

The canonical stages

Most pipelines have five logical stages:

Stage      What happens                                    Typical duration
checkout   Clone the repo at the triggering commit         2–5 s
install    Restore or download dependencies                5–60 s (with cache)
build      Compile, bundle, generate artifacts             10–120 s
test       Run the test suite (unit, integration, e2e)     30 s – 10 min
deploy     Ship the artifact to an environment             10 s – 5 min

These stages are not fixed. A monorepo might fan out into a dozen build jobs after install. A library project might skip deploy entirely. The shape is determined by what the project needs.

Dependencies determine the critical path

The total wall-clock time of a pipeline is not the sum of all job durations. It is the longest path through the graph — the critical path.

If build takes 90 seconds and three parallel test shards each take 45 seconds, the critical path runs checkout → install → build → slowest shard. The build-plus-test portion is 90 + 45 = 135 seconds, plus whatever checkout and install contribute. Adding a fourth shard does nothing to the wall time; shrinking the 90-second build does.

Understanding the graph lets you reason about where to invest optimisation effort.
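As a sketch of that reasoning (job names and durations here are illustrative, not from any real pipeline), the critical path is the longest path through the DAG, computable with a memoised traversal:

```python
# Sketch: a pipeline's wall-clock time is the longest path through its DAG.
# Job names and durations are illustrative.
from functools import lru_cache

durations = {"checkout": 3, "install": 30, "build": 90,
             "shard1": 45, "shard2": 45, "shard3": 45}
needs = {"install": ["checkout"], "build": ["install"],
         "shard1": ["build"], "shard2": ["build"], "shard3": ["build"]}

@lru_cache(maxsize=None)
def finish_time(job: str) -> int:
    """Earliest time `job` can finish: its duration plus the slowest upstream job."""
    upstream = needs.get(job, [])
    return durations[job] + max((finish_time(u) for u in upstream), default=0)

wall_time = max(finish_time(j) for j in durations)
print(wall_time)  # 3 + 30 + 90 + 45 = 168: adding a fourth shard changes nothing
```

Note that the three shards never appear in the total: only the slowest one sits on the critical path.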

Artifacts

Jobs communicate via artifacts: files persisted between stages. The build job produces a compiled bundle; the deploy job consumes it. Without explicit artifact upload and download steps, each job starts from a blank workspace and has no memory of upstream work.

Artifacts cost storage and time to transfer. Be intentional: upload what downstream jobs genuinely need, nothing else.
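A minimal GitHub Actions sketch of the hand-off (the artifact name, paths, and deploy script are illustrative; `actions/upload-artifact` and `actions/download-artifact` are the platform's real mechanisms):

```yaml
# Sketch: build uploads the bundle; deploy downloads it on a fresh runner.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pnpm install && pnpm build
      - uses: actions/upload-artifact@v4
        with:
          name: bundle            # illustrative artifact name
          path: dist/             # upload only what deploy genuinely needs

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: bundle
          path: dist              # restore into the same layout
      - run: ./deploy.sh dist/    # illustrative deploy script
```

Without the upload and download steps, deploy would start from a blank workspace and find nothing to ship.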

Failure propagation

When a job fails, all downstream dependents are cancelled or skipped. A checkout failure stops everything. A test failure typically stops deploy. This is correct behaviour: you want the pipeline to stop shipping broken code.

Some jobs are marked continue-on-error or allow_failure. Use this sparingly. A test job marked allow_failure is a test job that no longer enforces quality. The signal decays.
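In GitHub Actions terms, the escape hatch is a job-level `continue-on-error` (a sketch; the lint job and command are illustrative):

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    continue-on-error: true   # this job's failure will not fail the workflow
    steps:
      - uses: actions/checkout@v4
      - run: pnpm lint        # illustrative command
```

GitLab CI's equivalent is `allow_failure: true` on the job.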

YAML is configuration, not code

Pipeline definitions are declarative YAML (or similar). You describe what should happen, not how the runner executes it. The CI platform reads the YAML, constructs the graph, and schedules jobs onto available runners.

# GitHub Actions — minimal example (structure only)
jobs:
  install:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pnpm install

  build:
    needs: install          # edge: build waits for install
    runs-on: ubuntu-latest
    steps:
      - run: pnpm build     # in practice this job would also check out the
                            # repo and restore dependencies — each job
                            # starts on a fresh runner

  test:
    needs: build            # edge: test waits for build
    runs-on: ubuntu-latest
    steps:
      - run: pnpm test

The needs key is the edge in the graph. install has no needs, so it runs immediately. build waits for install. test waits for build.

The hidden job: waiting for a runner

Elapsed time on CI includes queue time — waiting for a runner to become available. This is invisible in most timing dashboards. On a heavily loaded shared runner pool, a 3-minute pipeline can take 12 minutes wall time. Keep this in mind when evaluating "our pipeline is slow."

What to build next

Once you understand the graph, two levers dominate optimisation:

  1. Parallelism — split work across jobs that can run concurrently.
  2. Caching — avoid redoing expensive work when inputs haven't changed.
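The first lever often takes the form of a job matrix in GitHub Actions — one job definition fans out into N concurrent shards (the shard count and the `--shard` flag are illustrative; check your test runner's sharding syntax):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3]    # fans out into three concurrent test jobs
    steps:
      - uses: actions/checkout@v4
      - run: pnpm test --shard=${{ matrix.shard }}/3   # illustrative flag
```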

The next level covers caching in depth.

Tools in the wild

  • GitHub Actions — YAML workflows triggered by repo events; the dominant CI for OSS and most startups.
  • GitLab CI/CD — first-class pipelines built into GitLab: DAG jobs, parent/child pipelines, manual gates.
  • CircleCI — mature CI with strong macOS + ARM runner support and reusable orbs.
  • Buildkite — SaaS control plane plus your own runners; popular at companies with custom hardware needs.
  • Jenkins — self-hosted CI workhorse; the Jenkinsfile DSL still runs huge enterprise pipelines.
  • Dagger — code your pipelines in Go/Python/TS and run them locally or in any CI provider.