JIT vs AOT
Tiered compilation, inline caches, deoptimization — and the cold-start vs throughput trade.
Two strategies for turning source into machine code, two different shapes of performance curve. AOT is fast from the first instruction and never gets faster. JIT starts slow, watches the program run, and can eventually beat AOT on code where runtime profile data lets it inline more aggressively. The right answer depends on whether your program runs for 50 milliseconds or 50 hours.
Analogy
Picture two pizza shops. The AOT shop bakes every pizza on the menu in advance and keeps them in a warming drawer. Walk in, point, leave with hot pizza in seconds — no surprises, identical pizza every time. The JIT shop bakes to order. The first customer waits twelve minutes. By customer 50, the oven knows exactly how long the dough takes, the toppings are pre-portioned, the cheese is at the right temperature, and orders come out in 90 seconds — better than the warming-drawer pizza. The catch: if you only ever serve five customers a day, the JIT shop never breaks even, and you'd rather just have a warming drawer.
What AOT does
Ahead-of-time compilation runs once, at build time. The compiler sees the whole program (or at least a whole translation unit) and has full type information. It can also spend far more time optimising than any runtime compiler could afford. It emits native machine code straight to disk:
```shell
gcc -O3 hello.c -o hello        # ~1 sec build, ~zero runtime overhead
go build .                      # same
rustc -C opt-level=3 main.rs    # same
```
What you gain:
- Near-zero startup cost. The OS loader maps the binary into memory, and your `main()` runs with no compilation step in between.
- Predictable peak throughput. Performance does not improve over time; whatever the binary did in the first second is what it does forever.
- Smaller runtime footprint. No bytecode interpreter, no JIT compiler in the process — sometimes a 5–20 MB win.
What you lose:
- No profile-guided optimisation by default. GCC supports PGO, but it's a separate workflow: build with `-fprofile-generate`, run a representative workload, then rebuild with `-fprofile-use`.
- Dynamic features. AOT struggles with `eval`, late binding, and runtime monkey-patching. Languages that lean on these ship JITs, not AOT compilers.
What JIT does
A just-in-time compiler runs inside your program. It interprets bytecode on the cold path and re-compiles the hot path to native machine code using actual runtime data — argument types, branch frequencies, hot loop bodies.
```text
source ─→ bytecode ─→ (interpret + profile) ─→ JIT compile ─→ native ─→ run
                                                              │
                                                              └── if assumptions break: deoptimize, back to bytecode
```
What you gain:
- Peak throughput rivals AOT and sometimes beats it on dynamic code where profile data unlocks inlining the AOT compiler couldn't prove safe.
- Adaptive specialization. A function called with `(number, number)` 99% of the time gets compiled assuming numbers; the rare string call deoptimises and falls back.
- No native build step. Source (or portable bytecode, such as Java `.class` files) loads, and the runtime compiles it on demand.
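To make "adaptive specialization" concrete, here is a minimal sketch of the bookkeeping side. It shows an interpreter counting which argument types a function actually receives, then deciding whether one combination dominates enough to specialize on. The `TypeFeedback` class and its 99% cutoff are hypothetical illustrations, not V8's or HotSpot's real data structures.

```javascript
// Hypothetical sketch: the kind of type feedback an interpreter gathers for one
// function, plus the decision "specialize on the dominant argument types?".
class TypeFeedback {
  constructor(minSharePercent = 99) {
    this.minSharePercent = minSharePercent; // how dominant one type pair must be
    this.counts = new Map();                // e.g. "number,number" -> 99
    this.total = 0;
  }

  // Called on every (cold) invocation with the actual arguments.
  record(...args) {
    const key = args.map((a) => typeof a).join(",");
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
    this.total++;
  }

  // Returns the type pair worth specializing on, or null if types are mixed.
  dominant() {
    let bestKey = null;
    let bestCount = 0;
    for (const [key, count] of this.counts) {
      if (count > bestCount) {
        bestKey = key;
        bestCount = count;
      }
    }
    // Integer comparison avoids floating-point rounding on the percentage.
    return bestCount * 100 >= this.total * this.minSharePercent ? bestKey : null;
  }
}

const fb = new TypeFeedback();
for (let i = 0; i < 99; i++) fb.record(i, i + 1); // 99 numeric calls
fb.record("a", "b");                              // 1 string call
```

With 99 numeric calls and one string call, `fb.dominant()` returns `"number,number"`. A JIT following this rule would compile a numeric fast path and leave the string call to its fallback path. Real engines record richer feedback per call site, including object shapes, call targets and branch counts.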
What you lose:
- Warm-up time. The first N invocations run on the interpreter. For a 50 ms Lambda invocation, this might be 100% of the work.
- Runtime overhead. The JIT compiler itself consumes RAM and CPU time inside your process. V8 and HotSpot both ship JITs in the tens of MB.
- Less determinism. Two identical processes with the same input may differ slightly in performance because they tier-up at different times.
Tiered compilation
Modern JITs (V8, HotSpot, .NET CoreCLR) don't have one compiler — they have tiers. A typical layout:
| Tier (example engine) | What it does | When it kicks in (roughly) |
|---|---|---|
| Interpreter (V8: Ignition) | Walks bytecode, collects type feedback | First invocations |
| Baseline compiler (V8: Sparkplug) | Quick compile, no inlining | After a few hundred invocations |
| Optimising compiler (V8: TurboFan; HotSpot: C2) | Heavy inlining, profile-guided | After thousands |
Why tiers exist: the interpreter is small and starts instantly, perfect for cold code. The optimising compiler is huge and slow but produces the fastest output, perfect for the 5% of code that runs 95% of the time. Tiering gives you both at once.
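The tier-up policy can be sketched as a wrapper that counts calls and promotes the function when it crosses a threshold. This is a toy. The thresholds (300 and 3000) are hypothetical values chosen to echo "a few hundred" and "thousands". `makeTieredFunction` only tracks which tier it would be in and does not compile anything.

```javascript
// Toy tier manager. Hypothetical thresholds, not V8's or HotSpot's real
// heuristics: promote a function through tiers as its call count grows.
const TIERS = [
  { name: "interpreter", minCalls: 0 },
  { name: "baseline", minCalls: 300 },    // "a few hundred invocations"
  { name: "optimizing", minCalls: 3000 }, // "thousands"
];

function makeTieredFunction(fn) {
  let calls = 0;
  let tierIndex = 0;
  const history = []; // each promotion as [tierName, callCount]

  function wrapped(...args) {
    calls++;
    const next = TIERS[tierIndex + 1];
    if (next !== undefined && calls >= next.minCalls) {
      tierIndex++;
      history.push([next.name, calls]);
    }
    // A real engine would now run code produced by TIERS[tierIndex];
    // here every tier simply calls the original function.
    return fn(...args);
  }

  wrapped.currentTier = () => TIERS[tierIndex].name;
  wrapped.history = history;
  return wrapped;
}

const square = makeTieredFunction((x) => x * x);
for (let i = 0; i < 5000; i++) square(i);
```

Real engines also count loop iterations, not just calls (HotSpot tracks loop back-edges, for example). They compile the next tier on background threads while the current tier keeps running.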
You can watch V8's tier-up:
```shell
node --allow-natives-syntax --trace-opt --trace-deopt script.js
```
Lines like `[marking 0x... <JSFunction hot> for optimized recompilation]` show TurboFan promoting a function. Lines like `[deoptimizing 0x...]` show it bailing back to bytecode. The exact wording varies between V8 versions.
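A hypothetical `script.js` to run under that command might look like the sketch below: a function kept monomorphic for a million calls, then handed a string. Whether and where V8 reports optimization and deoptimization depends on the Node/V8 version. For instance, the loop itself may be optimized with `add` inlined into it. Treat the trace as something to explore rather than a fixed expected output.

```javascript
// script.js: a hypothetical workload for watching V8 tier-up and deopt, e.g.
//   node --allow-natives-syntax --trace-opt --trace-deopt script.js
// Exact trace lines and their timing vary by Node/V8 version.
function add(a, b) {
  return a + b;
}

// Phase 1: monomorphic numeric calls, hot enough for V8 to optimize.
let total = 0;
for (let i = 0; i < 1e6; i++) {
  total = add(total, 1);
}

// Phase 2: a string argument breaks the "always numbers" assumption,
// which may show up in the trace as a deoptimization.
const label = add("total: ", total);
console.log(label); // prints "total: 1000000"
```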
Inline caches
When the JIT compiles `obj.x`, it doesn't compile a generic property lookup; that would be slow. It compiles a specialised access for the shape of object actually seen at this call site (V8 calls shapes "maps", also known as hidden classes). The mechanism is the inline cache (IC).
```javascript
function getName(o) { return o.name; } // IC at the dot is empty
getName({ name: "ada" });              // IC records: shape A → offset 0
getName({ name: "lin" });              // same shape, IC hits
getName({ name: "x", age: 30 });       // different shape → IC widens to PIC (polymorphic)
```
Stages:
- Monomorphic: one shape seen. Fastest: a single shape check plus a direct load.
- Polymorphic: 2–4 shapes. A small list of `(shape, offset)` pairs, checked in turn.
- Megamorphic: 5+ shapes. Falls back to a generic lookup; roughly 10× slower.
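The monomorphic → polymorphic → megamorphic progression can be modelled in plain JavaScript. This is a toy. `InlineCache` is a hypothetical class. A "shape" is approximated by the object's own key order, whereas real engines use hidden classes. The 4-entry limit mirrors the 2–4 / 5+ split above.

```javascript
// Toy inline cache for a single `obj.<prop>` access site. Hypothetical model:
// a "shape" is approximated by the object's own key order, whereas real
// engines use hidden classes. Own data properties only.
const MAX_POLYMORPHIC = 4; // up to 4 cached shapes; the 5th goes megamorphic

class InlineCache {
  constructor(prop) {
    this.prop = prop;
    this.entries = [];        // cached { shape, offset } pairs
    this.megamorphic = false;
  }

  state() {
    if (this.megamorphic) return "megamorphic";
    if (this.entries.length === 0) return "uninitialized";
    return this.entries.length === 1 ? "monomorphic" : "polymorphic";
  }

  load(obj) {
    const keys = Object.keys(obj);
    const shape = keys.join("|");
    if (!this.megamorphic) {
      const hit = this.entries.find((e) => e.shape === shape);
      if (hit) {
        // Fast path: reuse the slot index recorded for this shape.
        return hit.offset < 0 ? undefined : obj[keys[hit.offset]];
      }
      // Miss: remember this shape, or stop caching once there are too many.
      if (this.entries.length < MAX_POLYMORPHIC) {
        this.entries.push({ shape, offset: keys.indexOf(this.prop) });
      } else {
        this.megamorphic = true;
        this.entries = [];
      }
    }
    return obj[this.prop]; // slow path: generic lookup
  }
}

// Mirrors the getName example: two same-shape objects, then a new shape.
const site = new InlineCache("name");
site.load({ name: "ada" });        // first shape: monomorphic
site.load({ name: "lin" });        // same shape: cache hit, still monomorphic
site.load({ name: "x", age: 30 }); // second shape: polymorphic
```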
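Designing types that keep ICs monomorphic is a real performance lever in V8. It's also why TypeScript-style code tends to JIT well: objects built from one interface or class, with the same fields initialised in the same order, share a shape. TypeScript's annotations are erased before the code runs, so the benefit comes from that discipline, not from the types themselves.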
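One practical way to keep shapes stable is to always initialise every field in the same order, instead of attaching optional properties conditionally. The sketch below contrasts the two styles. `makeUserUnstable` and `makeUserStable` are hypothetical helpers. Comparing `Object.keys` order is only a rough proxy for V8's hidden classes, an assumption of this sketch.

```javascript
// Unstable: optional fields are attached conditionally and in varying order,
// so an access site reading these objects may see several shapes.
function makeUserUnstable(name, age, email) {
  const u = {};
  if (email !== undefined) u.email = email;
  u.name = name;
  if (age !== undefined) u.age = age;
  return u;
}

// Stable: every field is always present, in one fixed order,
// with null standing in for "absent".
function makeUserStable(name, age = null, email = null) {
  return { name, age, email };
}

// Rough proxy for "shape" (assumption): own keys in insertion order.
const shapeOf = (o) => Object.keys(o).join("|");

const unstable = [
  makeUserUnstable("ada"),
  makeUserUnstable("lin", 36),
  makeUserUnstable("grace", 85, "g@example.com"),
];
const stable = [
  makeUserStable("ada"),
  makeUserStable("lin", 36),
  makeUserStable("grace", 85, "g@example.com"),
];
```

The three unstable objects have three different key layouts, while the stable ones share one. Sharing a layout is what helps an access site that reads them stay monomorphic.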
Deoptimization
A JIT optimises by assuming. "I've only ever seen numbers here, so I'll inline numeric ops." When the assumption breaks, the JIT must:
- Stop the function mid-execution (at a safe point).
- Reconstruct the equivalent interpreter state.
- Resume in the interpreter.
- Mark the call site as polymorphic and recompile when warm again.
Deopts are expensive but rare. Visible deopts in profiles usually mean a hot function is being called with mixed types — a refactor target.
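The speculate → guard → deoptimize → recompile cycle can be sketched with closures. This is a hypothetical model: `makeSpeculativeAdd`, the `DEOPT` sentinel and the 1000-call threshold are all illustrative. Because the guard runs before any work is done, "reconstructing interpreter state" reduces to re-running the call with the same arguments. A real engine has to rebuild interpreter frames from the middle of optimized machine code.

```javascript
// Hypothetical model of speculate -> guard -> deoptimize -> recompile.
// Real engines do this on machine code and interpreter frames, not closures.
const DEOPT = Symbol("deopt"); // sentinel returned when a guard fails
const HOT = 1000;              // hypothetical calls before "optimizing"

function makeSpeculativeAdd() {
  let calls = 0;
  let optimized = null;        // current "optimized code", if any
  const seenTypes = new Set(); // type feedback, e.g. "number,number"
  const log = [];              // optimize / deopt events, in order

  // Interpreter: full JS semantics, plus feedback collection.
  function interpret(a, b) {
    seenTypes.add(`${typeof a},${typeof b}`);
    return a + b;
  }

  // "Compiler": speculate on numbers only if that is all we have seen.
  function compile() {
    if (seenTypes.size === 1 && seenTypes.has("number,number")) {
      log.push("optimize:numeric");
      return (a, b) =>
        typeof a === "number" && typeof b === "number" ? a + b : DEOPT; // guard
    }
    log.push("optimize:generic"); // mixed feedback: no numeric guard
    return (a, b) => a + b;
  }

  function add(a, b) {
    if (optimized !== null) {
      const result = optimized(a, b);
      if (result !== DEOPT) return result;
      // Guard failed: discard the optimized code, resume in the interpreter,
      // and make the function get warm again before recompiling.
      log.push("deopt");
      optimized = null;
      calls = 0;
      return interpret(a, b);
    }
    calls++;
    const result = interpret(a, b);
    if (calls >= HOT) optimized = compile();
    return result;
  }

  return { add, log };
}
```

After the string call, the numeric-only version is thrown away. Once the function is warm again, the mixed feedback produces a generic version with no numeric guard, so later string calls no longer deoptimize.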
Cold start vs peak throughput
This is the headline trade-off:
| Workload shape | Right tool |
|---|---|
| Lambda invocation, 50 ms, exits | AOT (Go, Rust, .NET native AOT, GraalVM native-image) |
| Long-lived web service, 12-hour pod uptime | JIT (V8, HotSpot, .NET) |
| CLI tool, runs once, exits | AOT |
| Build server, runs all day | Either JIT or AOT works |
| Mobile app, milliseconds matter on launch | AOT (iOS) or AOT-compiled DEX (Android) |
The JVM ecosystem flipped this on its head with GraalVM native-image, which AOT-compiles Java code for fast cold starts in serverless deployments. Quarkus, Micronaut, and Spring (originally via Spring Native, now built into Spring Boot 3) all support it.
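The trade-off in the table comes down to simple arithmetic: the JIT pays a one-off warm-up cost and then saves a little on every request. The sketch below computes the break-even point. It treats warm-up as a single lump of startup time, and every timing in it is a made-up input for illustration, not a measurement of any real runtime.

```javascript
// Back-of-the-envelope cold-start model. All timings are hypothetical inputs:
// each runtime pays a one-off startup cost, then a fixed cost per request.
function totalTimeMs({ startupMs, perRequestMs }, requests) {
  return startupMs + perRequestMs * requests;
}

// First request count at which the JIT's total time is strictly lower than
// AOT's, or Infinity if the JIT is never faster per request.
function breakEvenRequests(aot, jit) {
  const savingPerRequest = aot.perRequestMs - jit.perRequestMs;
  if (savingPerRequest <= 0) return Infinity;
  const extraStartupMs = jit.startupMs - aot.startupMs;
  return Math.max(0, Math.floor(extraStartupMs / savingPerRequest) + 1);
}

// Made-up numbers for illustration only.
const aot = { startupMs: 5, perRequestMs: 2.0 };   // no warm-up, steady speed
const jit = { startupMs: 400, perRequestMs: 1.5 }; // warm-up, then faster
const n = breakEvenRequests(aot, jit);
```

With these hypothetical numbers the JIT only comes out ahead from request 791 onwards. A short-lived invocation that serves one request never pays back its warm-up, while a long-lived service crosses the line early in its lifetime.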
When AOT and JIT coexist
C# / .NET ships ReadyToRun (R2R) — pre-compiled native code shipped alongside the IL, used as a starting point that the JIT can replace if it has better profile data. Best of both worlds: fast cold start (AOT pre-compiled) plus profile-guided peak (JIT).
Modern runtimes are also starting to share start-up artifacts across processes. V8's startup snapshots and code cache, and HotSpot's CDS / AppCDS archives, save start-up state to disk so the next process loads it instead of regenerating it.
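Node.js exposes one of these mechanisms, V8's code cache, through the `vm` module. A script compiled once can hand back its cached data, and a later process can pass that data in to skip part of the compile. The sketch below assumes a CommonJS Node.js script. The cache file name and the `fib` source are hypothetical.

```javascript
// Sketch: persist V8's code cache for a script and reuse it on the next run,
// via Node's vm module. The cache file name "app.cache" is hypothetical.
const vm = require("node:vm");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const source = "function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); } fib(20);";
const cacheFile = path.join(os.tmpdir(), "app.cache");

function loadScript() {
  // Reuse a previously saved cache if one exists.
  const cachedData = fs.existsSync(cacheFile) ? fs.readFileSync(cacheFile) : undefined;
  const script = new vm.Script(source, { filename: "app.js", cachedData });
  if (cachedData === undefined || script.cachedDataRejected) {
    // No usable cache: the script was compiled normally; save a cache for next time.
    fs.writeFileSync(cacheFile, script.createCachedData());
  }
  return script;
}

const result = loadScript().runInThisContext(); // value of fib(20)
```

On the first run the cache file does not exist, so the script compiles normally and writes the cache. On later runs, `cachedDataRejected` tells you whether V8 accepted the cache. V8 rejects caches produced by a different V8 version or for a changed source.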
How to tell which you're using
- AOT: there's a build step that produces a binary; `file ./mything` says ELF / Mach-O / PE.
- Pure interpreter: no native code emission. CPython and Ruby MRI in their default configuration, classic Perl.
- JIT: a runtime + bytecode + tier-up traces. V8, HotSpot, PyPy, .NET CLR, LuaJIT.
You can verify with the toolchain (`node --trace-opt`, `java -XX:+PrintCompilation`, `pypy --jit off`). You should also know your stack from its architecture, not just from telemetry.
Practical decisions
- For serverless: AOT or pre-warmed JIT (provisioned concurrency, snapshotting).
- For web services: JIT, no special configuration needed.
- For CLI tools and DevOps glue: AOT every time. A 200 ms `tsc` startup is a lot worse than a Go binary that starts in 30 ms.
- For mobile: respect platform guidance. Android Runtime (ART) AOT-compiles apps, guided by usage profiles. iOS apps ship as natively compiled code, and Apple has deprecated its Bitcode pipeline.