git · level 8

Submodules & Subtrees

Vendoring git history — the submodule pain points and the subtree alternative.

When your repo needs another repo's code inside it, git gives you two options. Both have ardent users. Both have ardent detractors. Knowing when each is right (and when neither is right) saves a year of frustration.

Analogy

A submodule is a forwarding address. Your repo says "go to that other building, room 304, and read the lease that says what version we're pinned to." A subtree is bringing the other building's tenants into your living room — they're now part of your apartment, and the original building doesn't know or care. Submodule rent is cheap (your repo stays small) but every visitor needs the address card and a key to the other building. Subtree rent is steep (your apartment is now bigger) but every visitor walks in and finds everything in one place.

Submodules

A submodule is a pinned reference to another repo. It's stored in your repo as:

An entry in .gitmodules (a text file you commit) recording the path and URL:

[submodule "vendor/lib"]
    path = vendor/lib
    url = https://github.com/foo/lib.git

A gitlink entry in your tree at vendor/lib that records a commit SHA (file mode 160000). Not the files, just the SHA.
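To see the pointer for yourself, the sketch below builds a throwaway parent repo with one submodule and inspects the tree entry. The sandbox layout, the stand-in library, and the redirected HOME are assumptions made for the demo, not part of any real project.

```shell
set -eu
# Sandbox: a throwaway HOME keeps the demo away from your real git config.
work=$(mktemp -d) && export HOME="$work"
git config --global user.name "Demo"
git config --global user.email "demo@example.com"
# Recent git versions refuse local-path submodule clones unless this is allowed.
git config --global protocol.file.allow always

# A stand-in for the external library (hypothetical content).
git init -q -b main "$work/lib"
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "lib: initial commit"

# The parent repo registers the library as a submodule at vendor/lib.
git init -q -b main "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "parent: initial commit"
git submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "add vendor/lib submodule"

# The tree entry is a gitlink: mode 160000, type "commit", then the pinned SHA.
entry=$(git ls-tree HEAD vendor/lib)
pinned=$(git rev-parse HEAD:vendor/lib)
lib_head=$(git -C "$work/lib" rev-parse HEAD)
echo "$entry"
echo "pinned SHA matches the library's HEAD: $pinned"
```

The parent stores no blob or tree for vendor/lib, only that one commit ID, which is why a clone that never fetches the submodule has nothing to put in the directory.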

When you git clone a repo with submodules:

your-repo/
├── .gitmodules
├── src/
└── vendor/
    └── lib/             ← exists, but EMPTY by default

To populate, you need an extra step:

git submodule update --init --recursive
# or, if you remembered at clone time:
git clone --recurse-submodules <url>

This is the first big footgun: anyone who clones without --recurse-submodules (and skips the follow-up git submodule update --init) gets empty submodule directories. CI fails. Builds fail. Engineers get angry.
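The empty-directory behaviour is easy to reproduce locally. The following sketch, which assumes a throwaway sandbox and a hypothetical stand-in library, clones a parent repo without any submodule flags and then runs the populate step.

```shell
set -eu
# Sandbox: a throwaway HOME keeps the demo away from your real git config.
work=$(mktemp -d) && export HOME="$work"
git config --global user.name "Demo"
git config --global user.email "demo@example.com"
# Recent git versions refuse local-path submodule clones unless this is allowed.
git config --global protocol.file.allow always

# Stand-in library and a parent repo that pins it (hypothetical names).
git init -q -b main "$work/lib"
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "lib: initial commit"
git init -q -b main "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "parent: initial commit"
git submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "add vendor/lib submodule"

# Plain clone: vendor/lib exists but holds no files.
git clone -q "$work/parent" "$work/fresh"
cd "$work/fresh"
files_before=$(ls -A vendor/lib | wc -l)
status_before=$(git submodule status)   # leading '-' means not initialized

# The extra step checks out the pinned SHA into vendor/lib.
git submodule --quiet update --init --recursive
status_after=$(git submodule status)    # leading ' ' means in sync
echo "files before update: $files_before"
echo "$status_before"
echo "$status_after"
```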

How submodule updates work

A submodule is pinned to a specific SHA. To update it:

cd vendor/lib
git fetch
git checkout main
git pull
cd ../..
# the parent repo now sees that vendor/lib points at a different commit
git status
# modified:   vendor/lib (new commits)
git add vendor/lib       # adds the NEW SHA pointer
git commit -m "bump vendor/lib to <new-sha>"
git push

After your push, anyone else needs to:

git pull
git submodule update --init --recursive

If they skip the second command, their vendor/lib working tree stays at the old commit, git status reports it as modified, and a careless git add . commits the old SHA back. CI catches this; humans don't always.
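A small reproduction makes the stale state visible. This is a sketch under assumptions: the sandbox, the stand-in library, and the "teammate" clone are all hypothetical.

```shell
set -eu
work=$(mktemp -d) && export HOME="$work"         # throwaway HOME for the demo
git config --global user.name "Demo"
git config --global user.email "demo@example.com"
git config --global protocol.file.allow always   # allow local-path submodules

# Stand-in library, parent repo, and a teammate's fully populated clone.
git init -q -b main "$work/lib"
cd "$work/lib"
echo "v1" > README
git add README
git commit -q -m "lib: v1"
git init -q -b main "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "parent: initial commit"
git submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "add vendor/lib submodule"
git clone -q --recurse-submodules "$work/parent" "$work/teammate"

# Upstream moves on, and the parent bumps its pin.
cd "$work/lib"
echo "v2" > README
git commit -q -am "lib: v2"
cd "$work/parent"
git -C vendor/lib pull -q --ff-only
git add vendor/lib
git commit -q -m "bump vendor/lib"

# The teammate pulls but skips the submodule step: the working tree is stale.
cd "$work/teammate"
git pull -q --ff-only
stale=$(git submodule status)     # leading '+' means checkout differs from the pin
git submodule --quiet update --init --recursive
synced=$(git submodule status)    # leading ' ' means checkout matches the pin
echo "$stale"
echo "$synced"
```

If the second command keeps getting forgotten, setting git config submodule.recurse true in a clone makes git pull (and checkout, switch, and similar commands) update already-initialized submodules as well; it does not apply to git clone.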

Submodule pain points

The honest list:

Detached HEAD inside submodules. When git checks out the pinned SHA, you're in detached HEAD inside the submodule. Commits made there belong to no branch and are easy to lose, so switch to a branch first. People forget.
Shallow clone interactions. CI runs often shallow-clone (--depth 1) for speed. If the submodule is cloned shallowly too (--shallow-submodules), the fetch can fail when the pinned SHA is not the tip of the submodule's branch, because that commit falls outside the shallow history.
Permissions / auth duplication. You need credentials for the submodule's host as well as the parent's. SSH keys, deploy tokens, etc.
Two-step commits. Changing the dependency's code is now two commits across two repos, with the parent's commit only making sense after the child's is pushed.
Newcomer confusion. "Why is this directory empty?" is the most-asked question on every team using submodules. Forever.
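The detached-HEAD item is the one most easily checked from a script. A minimal sketch, assuming a throwaway sandbox and a hypothetical library whose default branch is main:

```shell
set -eu
work=$(mktemp -d) && export HOME="$work"         # throwaway HOME for the demo
git config --global user.name "Demo"
git config --global user.email "demo@example.com"
git config --global protocol.file.allow always   # allow local-path submodules

# Stand-in library and a parent repo that pins it.
git init -q -b main "$work/lib"
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "lib: initial commit"
git init -q -b main "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "parent: initial commit"
git submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "add vendor/lib submodule"

# A recursive clone checks out the pinned SHA, not a branch.
git clone -q --recurse-submodules "$work/parent" "$work/clone"
cd "$work/clone/vendor/lib"

# symbolic-ref exits non-zero when HEAD is detached.
if git symbolic-ref -q HEAD >/dev/null; then
    state_before="on a branch"
else
    state_before="detached"
fi

# Switch to a branch before committing so new work stays reachable.
git checkout -q main
branch_after=$(git symbolic-ref --short HEAD)
echo "before: $state_before, after: $branch_after"
```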

These are not deal-breakers: large projects such as Qt, Boost, and PyTorch use submodules effectively. But they impose an ongoing tax.

Subtrees

git subtree takes the opposite approach: bring the other repo's content (and optionally its history) into your repo, as files and commits.

# add vendor/lib by pulling in foo/lib's main branch
git remote add lib https://github.com/foo/lib.git
git subtree add --prefix=vendor/lib lib main --squash
# --squash collapses lib's history into ONE squashed commit, which a merge commit then attaches to your branch

After this:

Your repo has the actual files at vendor/lib.
Your history gains either one squashed commit plus a merge commit (with --squash) or all of lib's history plus a merge commit (without).
git clone is normal — no --recurse-submodules, no missing files.
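The effect on history can be checked by counting commits. The sketch below assumes git subtree is installed (it ships as a contrib script and is packaged separately on some systems) and uses a hypothetical two-commit library in a throwaway sandbox.

```shell
set -eu
work=$(mktemp -d) && export HOME="$work"   # throwaway HOME for the demo
git config --global user.name "Demo"
git config --global user.email "demo@example.com"

# Stand-in library with two commits of history.
git init -q -b main "$work/lib"
cd "$work/lib"
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "lib: v1"
echo "v2" > lib.txt
git commit -q -am "lib: v2"

# The consuming repo; subtree add needs an existing commit to merge into.
git init -q -b main "$work/app"
cd "$work/app"
git commit -q --allow-empty -m "app: initial commit"
git remote add lib "$work/lib"
git subtree add --prefix=vendor/lib lib main --squash

# Reachable commits: the initial one, the squashed import, and the merge.
commit_count=$(git rev-list --count HEAD)
content=$(cat vendor/lib/lib.txt)
echo "commits on HEAD: $commit_count"
echo "vendor/lib/lib.txt contains: $content"
```

Neither of the library's two original commits is reachable from the app's HEAD; only their combined content is, inside the squashed commit.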

To update:

git subtree pull --prefix=vendor/lib lib main --squash

To contribute changes back upstream (rare but possible):

git subtree push --prefix=vendor/lib lib feature-branch

When subtree is the right answer

Clone-and-go is critical. New engineers, CI bots, demo environments — anyone who runs git clone and expects everything to work.
The dependency is small. Adding a small library to your repo doesn't bloat noticeably; a 500MB vendor/ directory does.
You'll rarely modify the dependency separately. If you mostly just want the code at a known version with no fuss, subtree fits.

When subtree is wrong

The dependency changes constantly. Every update adds a merge commit (plus a squashed commit with --squash) to your repo. Noisy.
The dependency is huge. You don't want a 1GB dependency's files, and without --squash its full history, in every clone of your project.
Multiple projects share the same dependency. Subtree means each project has its own copy. Submodule (or a package manager) keeps one source of truth.

When neither is right — the third option

Most modern languages have a real package manager:

Language	Tool	What it does
Go	Go modules	Downloads dependencies into a local module cache; `go.mod` declares versions, `go.sum` records checksums
Rust	Cargo	`Cargo.toml` declares version requirements, `Cargo.lock` pins exact versions
Node	npm/pnpm/yarn	`package.json` + `package-lock.json`
Python	uv/pip	`pyproject.toml` + `uv.lock` / `requirements.txt`
Ruby	Bundler	`Gemfile` + `Gemfile.lock`

If your dependency is published as a package in one of these ecosystems, use the package manager. It handles versioning, transitive dependencies, and clone-and-go simplicity, and the surrounding tooling covers security advisories and license tracking. Submodules and subtrees are for when the dependency is NOT packaged (a private fork, a not-yet-released library, a tool with no package).

For monorepos that contain unpublished packages, a build system like Bazel, Buck, or Pants is the right escalation: it manages dependencies between internal packages at the build-tool level.

A decision matrix

Situation	Best tool
Dependency available through your language's package manager	Use the package manager
Internal library, frequent updates, multiple consumers	Bazel/Buck/Pants OR a private package registry
External repo, infrequent updates, one consumer	Subtree (with `--squash`)
External repo, frequent updates, you want explicit version pins	Submodule
Vendored fork of a public lib that you sometimes patch	Subtree
Independently-versioned third party (Chromium-style)	Submodule (with discipline)

Diagnostic commands

Submodule status:

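git submodule status
# (SHAs abbreviated here; git prints all 40 characters)
# +abc1234 vendor/lib (heads/main)        ← + means the checked-out commit differs from the pinned SHA
#  abc1234 vendor/lib (heads/main)        ← leading space means in sync
# -abc1234 vendor/lib                     ← - means not initialized
# Uabc1234 vendor/lib                     ← U means the submodule has merge conflicts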

CI sanity check — after clone, before build:

if git submodule status --recursive | grep -E '^[-+U]'; then echo "Submodules out of sync" >&2; exit 1; fi

This catches the "forgot to update submodules" bug early.
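For more readable CI logs, the same check can be wrapped in a small function that names each problem. The function name check_submodule_status is hypothetical, and the sample lines reuse the abbreviated SHAs from the status listing above.

```shell
# check_submodule_status: read `git submodule status` output on stdin and
# return non-zero if any line starts with '-', '+', or 'U'.
check_submodule_status() {
    bad=0
    while IFS= read -r line; do
        case "$line" in
            -*) echo "not initialized: ${line#-}" >&2; bad=1 ;;
            +*) echo "out of sync: ${line#+}" >&2; bad=1 ;;
            U*) echo "merge conflict: ${line#U}" >&2; bad=1 ;;
        esac
    done
    return "$bad"
}

# Usage with canned input; in CI the input would come from
#   git submodule status --recursive | check_submodule_status || exit 1
if printf '%s\n' ' abc1234 vendor/lib (heads/main)' | check_submodule_status; then
    clean_result="ok"
else
    clean_result="failed"
fi
if printf '%s\n' '+abc1234 vendor/lib (heads/main)' | check_submodule_status 2>/dev/null; then
    drift_result="ok"
else
    drift_result="failed"
fi
echo "in-sync input: $clean_result, drifted input: $drift_result"
```

Wrapping the test in an if statement keeps the step's exit status at zero when nothing matches, which a bare grep pipeline chained with && does not.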

A common pattern that works

Many production teams settle on:

Application code — single repo, no submodules.
Internal libraries — published to an internal package registry (npm, PyPI, RubyGems, or a Go module proxy), pinned by version in the consumer.
Vendored forks of external libs — subtree if small, submodule if large or independently versioned.
Truly massive shared assets — git LFS, not submodule.

Stick to packages for code, subtree as the rare exception. Submodules only when nothing else fits.

What to internalise

Submodule = SHA pointer to another repo's commit. Subtree = files merged into your repo.
--recurse-submodules is non-negotiable when working with submodule-bearing repos.
Most dependency problems are better solved by a real package manager than by either submodules or subtrees.
Subtree wins on clone-and-go; submodule wins on disk size and explicit version pinning.
"We use submodules" should always be a discussed decision, never a default.

Tools in the wild

git submodule (free tier, cli)
Built-in. The classic vendoring mechanism, footguns and all.

git subtree (free tier, cli)
Bundled with most git installs as a contrib script. The 'just merge it in' alternative.

Bazel (free tier, cli)
Build tool that handles vendored deps via http_archive, sidestepping git entirely.

go modules (free tier, spec)
Language-level dependency management: what most modern languages do instead of submodules.