Submodules & Subtrees
Vendoring git history — the submodule pain points and the subtree alternative.
When your repo needs another repo's code inside it, git gives you two options. Both have ardent users. Both have ardent detractors. Knowing when each is right (and when neither is right) saves a year of frustration.
Analogy
A submodule is a forwarding address. Your repo says "go to that other building, room 304, and read the lease that says what version we're pinned to." A subtree is bringing the other building's tenants into your living room — they're now part of your apartment, and the original building doesn't know or care. Submodule rent is cheap (your repo stays small) but every visitor needs the address card and a key to the other building. Subtree rent is steep (your apartment is now bigger) but every visitor walks in and finds everything in one place.
Submodules
A submodule is a pinned reference to another repo. It's stored in your repo as:
- An entry in .gitmodules (a text file you commit) describing the URL and path:
  [submodule "vendor/lib"]
      path = vendor/lib
      url = https://github.com/foo/lib.git
- A commit SHA pointer in your tree at vendor/lib. Not the files, just the SHA.
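Both pieces can be inspected directly. The following is a minimal sketch that builds two throwaway repositories in a temp directory; the repo names (lib, app), the file lib.txt, and the demo commit identity are all hypothetical, and GIT_ALLOW_PROTOCOL=file is set only because the demo clones from a local path rather than a real URL.

```shell
set -eu
work="$(mktemp -d)"
# Demo-only settings: a throwaway commit identity, and permission to clone from local paths.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_ALLOW_PROTOCOL=file

# Hypothetical upstream library with a single commit.
git init -q -b main "$work/lib"
echo "hello" > "$work/lib/lib.txt"
git -C "$work/lib" add lib.txt
git -C "$work/lib" commit -q -m "initial lib commit"

# Parent repo that pins the library at vendor/lib.
git init -q -b main "$work/app"
git -C "$work/app" submodule --quiet add "$work/lib" vendor/lib
git -C "$work/app" commit -q -m "add vendor/lib submodule"

# Piece 1: the committed text file with the path and URL.
cat "$work/app/.gitmodules"

# Piece 2: the tree entry. Mode 160000 and type "commit" mark a gitlink,
# which is a bare SHA rather than a directory of files.
git -C "$work/app" ls-tree HEAD vendor/lib
```

The ls-tree line starts with 160000 commit, followed by the SHA of the library commit the parent is pinned to.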
When you git clone a repo with submodules:
your-repo/
├── .gitmodules
├── src/
└── vendor/
    └── lib/   ← exists, but EMPTY by default
To populate, you need an extra step:
git submodule update --init --recursive
# or, if you remembered at clone time:
git clone --recurse-submodules <url>
This is the first big footgun: anyone who clones without the --recurse-submodules flag gets empty submodule directories. CI fails. Builds fail. Engineers get angry.
How submodule updates work
A submodule is pinned to a specific SHA. To update it:
cd vendor/lib
git fetch
git checkout main
git pull
cd ../..
# now the parent repo sees that the submodule points at a new commit
git status
# modified: vendor/lib (new commits)
git add vendor/lib # adds the NEW SHA pointer
git commit -m "bump vendor/lib to <new-sha>"
git push
After your push, anyone else needs to:
git pull
git submodule update --init --recursive
If they forget the second command, their vendor/lib is still pointing at the old SHA. CI catches this; humans don't always.
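One way to make the second command harder to forget is git's submodule.recurse setting, which turns on --recurse-submodules by default for commands such as pull and checkout (it does not apply to clone). The sketch below simulates the whole bump-and-pull cycle in a temp directory; the repo names (lib, app, teammate) and the demo identity are hypothetical, and GIT_ALLOW_PROTOCOL=file is there only so the demo can clone from local paths.

```shell
set -eu
work="$(mktemp -d)"
# Demo-only settings: a throwaway commit identity, and permission to clone from local paths.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_ALLOW_PROTOCOL=file

# Hypothetical upstream library and a parent repo that pins it.
git init -q -b main "$work/lib"
echo "v1" > "$work/lib/lib.txt"
git -C "$work/lib" add lib.txt
git -C "$work/lib" commit -q -m "lib v1"
git init -q -b main "$work/app"
git -C "$work/app" submodule --quiet add "$work/lib" vendor/lib
git -C "$work/app" commit -q -m "add vendor/lib"

# A teammate clones with submodules, then opts in to recursing by default.
git clone -q --recurse-submodules "$work/app" "$work/teammate"
git -C "$work/teammate" config submodule.recurse true

# Upstream moves on, and the parent repo bumps its pin.
echo "v2" > "$work/lib/lib.txt"
git -C "$work/lib" commit -q -am "lib v2"
git -C "$work/app/vendor/lib" pull -q origin main
git -C "$work/app" commit -q -am "bump vendor/lib"

# A plain pull now also moves the teammate's submodule checkout to the new pin.
git -C "$work/teammate" pull -q
git -C "$work/teammate" submodule status
```

The setting is per-clone here; whether to set it globally or in onboarding scripts is a team decision, and the explicit git submodule update --init --recursive is still needed for submodules that were never initialized.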
Submodule pain points
The honest list:
- Detached HEAD inside submodules. When git checks out the pinned SHA, the submodule is left in detached HEAD. Making changes there requires switching to a branch first, and people forget.
- Shallow clone interactions. CI runs often shallow-clone (--depth 1) for speed. A shallow submodule clone can fail when the pinned SHA isn't reachable in the truncated history.
- Permissions / auth duplication. You need credentials for the submodule's host as well as the parent's. SSH keys, deploy tokens, etc.
- Two-step commits. Updating a dependency is now two commits across two repos, with the parent's commit only making sense after the child's is pushed.
- Newcomer confusion. "Why is this directory empty?" is the most-asked question on every team using submodules. Forever.
These are not deal-breakers: many large projects (for example the Boost superproject, Qt's qt5 repository, and PyTorch's third_party directory) use submodules effectively. But they carry an ongoing tax.
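The detached HEAD item is easy to see for yourself. This is a small sketch, not a recommended workflow: it builds throwaway repos in a temp directory, and the names (lib, app, clone), the head_state helper, and the demo identity are hypothetical. GIT_ALLOW_PROTOCOL=file only permits the local-path clones the demo relies on.

```shell
set -eu
work="$(mktemp -d)"
# Demo-only settings: a throwaway commit identity, and permission to clone from local paths.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_ALLOW_PROTOCOL=file

# Hypothetical upstream library and a parent repo that pins it.
git init -q -b main "$work/lib"
echo "v1" > "$work/lib/lib.txt"
git -C "$work/lib" add lib.txt
git -C "$work/lib" commit -q -m "lib v1"
git init -q -b main "$work/app"
git -C "$work/app" submodule --quiet add "$work/lib" vendor/lib
git -C "$work/app" commit -q -m "add vendor/lib"

# A fresh clone: the submodule is checked out at the pinned SHA, not on a branch.
git clone -q --recurse-submodules "$work/app" "$work/clone"

# Hypothetical helper: report whether a repository is on a branch or detached.
head_state() {
  if git -C "$1" symbolic-ref -q HEAD >/dev/null; then
    echo "on branch $(git -C "$1" symbolic-ref --short HEAD)"
  else
    echo "detached"
  fi
}

before="$(head_state "$work/clone/vendor/lib")"
# Switch to a branch before committing anything inside the submodule.
git -C "$work/clone/vendor/lib" checkout -q main
after="$(head_state "$work/clone/vendor/lib")"
echo "before: $before"
echo "after:  $after"
```

Commits made while detached are not lost immediately, but no branch points at them, so the next submodule update can leave them reachable only through the reflog.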
Subtrees
git subtree takes the opposite approach: bring the other repo's content (and optionally its history) into your repo, as files and commits.
# add vendor/lib by pulling in foo/lib's main branch
git remote add lib https://github.com/foo/lib.git
git subtree add --prefix=vendor/lib lib main --squash
# the --squash flag collapses lib's history into ONE squashed commit, which is then merged into your branch
After this:
- Your repo has the actual files at vendor/lib.
- Your history has either one squash commit plus a merge commit (with --squash) or all of lib's history (without).
- git clone is normal: no --recurse-submodules, no missing files.
To update:
git subtree pull --prefix=vendor/lib lib main --squash
To contribute changes back upstream (rare but possible):
git subtree push --prefix=vendor/lib lib feature-branch
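The add and update commands can be tried end to end without a network. The sketch below assumes git subtree is available in your git installation (it lives in git's contrib area and is not present in every package), and uses hypothetical local repos (lib, app, fresh) in place of the foo/lib remote; the local path is passed where a remote name would normally go.

```shell
set -eu
work="$(mktemp -d)"
# Demo-only commit identity.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical upstream library.
git init -q -b main "$work/lib"
echo "v1" > "$work/lib/lib.txt"
git -C "$work/lib" add lib.txt
git -C "$work/lib" commit -q -m "lib v1"

# Parent repo; git subtree needs an existing commit and a clean working tree.
git init -q -b main "$work/app"
echo "app" > "$work/app/README"
git -C "$work/app" add README
git -C "$work/app" commit -q -m "initial app commit"

# Vendor the library as real files under vendor/lib.
git -C "$work/app" subtree add -q --prefix=vendor/lib "$work/lib" main --squash

# Upstream moves on; pull the update into the same prefix.
echo "v2" > "$work/lib/lib.txt"
git -C "$work/lib" commit -q -am "lib v2"
git -C "$work/app" subtree pull -q --prefix=vendor/lib "$work/lib" main --squash -m "bump vendor/lib"

# A plain clone now contains the files: no .gitmodules, no extra step.
git clone -q "$work/app" "$work/fresh"
cat "$work/fresh/vendor/lib/lib.txt"
```

Passing -m to subtree pull supplies the merge commit message up front, which keeps git from opening an editor when the command is run interactively.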
When subtree is the right answer
- Clone-and-go is critical. New engineers, CI bots, demo environments: anyone who runs git clone and expects everything to work.
- The dependency is small. Adding a small library to your repo doesn't bloat noticeably; a 500MB vendor/ directory does.
- You'll rarely modify the dependency separately. If you mostly just want the code at a known version with no fuss, subtree fits.
When subtree is wrong
- The dependency changes constantly. Every update is a merge commit (or rebase) into your repo. Noisy.
- The dependency is huge. You don't want a 1GB dependency's full history in every clone of your project.
- Multiple projects share the same dependency. Subtree means each project has its own copy. Submodule (or a package manager) keeps one source of truth.
When neither is right — the third option
Most modern languages have a real package manager:
| Language | Tool | What it does |
|---|---|---|
| Go | Go modules | Pulls dependency code into a local cache, pinned by version in go.mod |
| Rust | Cargo | Same idea: Cargo.toml declares, Cargo.lock pins |
| Node | npm/pnpm/yarn | package.json + a lockfile (package-lock.json, pnpm-lock.yaml, or yarn.lock) |
| Python | uv/pip | pyproject.toml + uv.lock / requirements.txt |
| Ruby | Bundler | Gemfile + Gemfile.lock |
If your dependency has a package manifest in any of these systems, use the package manager. It handles versioning, transitive dependencies, security advisories, license tracking, and clone-and-go simplicity. Submodules and subtrees are for when the dependency is NOT packaged (a private fork, a not-yet-released library, a tool with no package).
For monorepos that contain unpublished packages, internal tooling like Bazel, Buck, or Pants is the right escalation — they handle internal package versioning at the build-tool level.
A decision matrix
| Situation | Best tool |
|---|---|
| Library published for a language with a package manager | Use the package manager |
| Internal library, frequent updates, multiple consumers | Bazel/Buck/Pants OR a private package registry |
| External repo, infrequent updates, one consumer | Subtree (with --squash) |
| External repo, frequent updates, you want explicit version pins | Submodule |
| Vendored fork of a public lib that you sometimes patch | Subtree |
| Independently-versioned third party (Chromium-style) | Submodule (with discipline) |
Diagnostic commands
Submodule status:
git submodule status
# +abc1234 vendor/lib (heads/main) ← + means the checked-out commit differs from the pinned SHA
#  abc1234 vendor/lib (heads/main) ← leading space means in sync
# -abc1234 vendor/lib ← - means not initialized
CI sanity check — after clone, before build:
if git submodule status | grep -qE '^[+U-]'; then echo "Submodules out of sync" >&2; exit 1; fi
This catches the "forgot to update submodules" bug early.
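To watch the check flip from failing to passing, here is a sketch that wraps it in a function and runs it against a clone made without --recurse-submodules. The function name submodules_in_sync, the repo names (lib, app, ci), and the demo identity are hypothetical; GIT_ALLOW_PROTOCOL=file is needed only because the demo clones from local paths.

```shell
set -eu
work="$(mktemp -d)"
# Demo-only settings: a throwaway commit identity, and permission to clone from local paths.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_ALLOW_PROTOCOL=file

# Hypothetical upstream library and a parent repo that pins it.
git init -q -b main "$work/lib"
echo "v1" > "$work/lib/lib.txt"
git -C "$work/lib" add lib.txt
git -C "$work/lib" commit -q -m "lib v1"
git init -q -b main "$work/app"
git -C "$work/app" submodule --quiet add "$work/lib" vendor/lib
git -C "$work/app" commit -q -m "add vendor/lib"

# Succeeds only when no submodule is uninitialized (-), off its pinned SHA (+),
# or in a merge conflict (U).
submodules_in_sync() {
  ! git -C "$1" submodule status --recursive | grep -qE '^[+U-]'
}

# A CI-style clone that forgot --recurse-submodules.
git clone -q "$work/app" "$work/ci"
if submodules_in_sync "$work/ci"; then before="in sync"; else before="out of sync"; fi

# After the missing step, the same check passes.
git -C "$work/ci" submodule --quiet update --init --recursive
if submodules_in_sync "$work/ci"; then after="in sync"; else after="out of sync"; fi

echo "before update: $before"
echo "after update:  $after"
```

Wrapping the pipeline in an if (or a function used as a condition) matters in CI: grep exits non-zero when it finds nothing, so a bare grep-and-exit chain reports failure on exactly the runs where the submodules are fine.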
A common pattern that works
Many production teams settle on:
- Application code — single repo, no submodules.
- Internal libraries — published to an internal package registry (npm, PyPI, RubyGems, Go modules), pinned by version in the consumer.
- Vendored forks of external libs — subtree if small, submodule if large or independently versioned.
- Truly massive shared assets — git LFS, not submodule.
Stick to packages for code, subtree as the rare exception. Submodules only when nothing else fits.
What to internalise
- Submodule = SHA pointer to another repo's commit. Subtree = files merged into your repo.
- --recurse-submodules is non-negotiable when working with submodule-bearing repos.
- Most dependency problems are better solved by a real package manager than by either submodules or subtrees.
- Subtree wins on clone-and-go; submodule wins on disk size and explicit version pinning.
- "We use submodules" should always be a discussed decision, never a default.
Tools in the wild
- git submodule (CLI). Built-in. The classic vendoring mechanism, footguns and all.
- git subtree (CLI). Built-in. The 'just merge it in' alternative.
- Bazel (CLI). Build tool that handles vendored deps via http_archive, sidestepping git entirely.
- Go modules (spec). Language-level dependency vendoring: what most modern languages do instead of submodules.