runtimes · level 9

Runtime Introspection

Profilers, debuggers, and a production-debugging toolkit that won't break what it inspects.

A live program is a black box until you attach a tool that can see inside it. Profilers, debuggers, tracers, and stack-sampling utilities are how you turn "the box is hot" into "this specific function takes 70% of the time." Done right, you can do it in production without restarting the process. Done wrong, you cause an outage trying to investigate one.

Analogy

Picture a hospital triage. You don't pop the patient open to find what hurts. You start with non-invasive instruments — pulse, temperature, blood pressure, X-ray — that read the patient at rest. Only when those don't answer the question do you escalate to invasive tools. Profilers and metrics are stethoscopes. Tracing tools are X-rays. A debugger that pauses the process is exploratory surgery — last resort, never in production unless you really know what you're doing.

Sampling vs tracing profilers

Two fundamentally different shapes.

Sampling. At a fixed interval (roughly every 10 ms at 99 Hz), the profiler interrupts the program briefly and records the current stack. After T seconds you have T × (samples per second) samples; 30 seconds at 99 Hz gives 2,970. The hottest function is whichever appears in the most samples. Overhead is typically 1–5%, low enough to use safely in production.

perf record -F 99 -p 12345 -g -- sleep 30   # 99 Hz, 30 seconds
perf report
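To make "whichever appears in the most samples" concrete, here is a minimal sketch of the counting a sampling profiler's report does. The names `hottestLeaves` and `expectedSamples` and the demo stacks are hypothetical, made up for illustration; real profilers also resolve symbols and handle threads, which this skips.

```typescript
// Each sample is one captured stack, outermost frame first, leaf last.
type SampledStack = string[];

interface LeafShare {
  frame: string;
  samples: number;
  percent: number; // share of all non-empty samples, 0..100
}

// Count how often each function is the leaf, i.e. the code actually on-CPU.
function hottestLeaves(samples: SampledStack[]): LeafShare[] {
  const counts = new Map<string, number>();
  let total = 0;
  for (const stack of samples) {
    if (stack.length === 0) continue; // skip empty captures
    total += 1;
    const leaf = stack[stack.length - 1];
    counts.set(leaf, (counts.get(leaf) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([frame, n]) => ({ frame, samples: n, percent: (100 * n) / total }))
    .sort((a, b) => b.samples - a.samples);
}

// Expected sample count for a run: rate (Hz) x duration (s).
function expectedSamples(hz: number, seconds: number): number {
  return hz * seconds;
}

// Hypothetical data: 10 samples, 7 of them inside parseJson.
const demoSamples: SampledStack[] = [];
for (let i = 0; i < 7; i++) demoSamples.push(["main", "handleRequest", "parseJson"]);
for (let i = 0; i < 2; i++) demoSamples.push(["main", "handleRequest", "writeResponse"]);
demoSamples.push(["main", "gcPause"]);

const ranking = hottestLeaves(demoSamples);
// ranking[0] is { frame: "parseJson", samples: 7, percent: 70 }
```

With only ten samples the percentages are noisy. That is why the command above samples for 30 seconds rather than one.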

Tracing. Every function entry and exit emits an event. You get exact call counts, exact durations, and the full call graph. Overhead: 30–500% — too much for production unless you're targeting one process for one minute.

node --prof index.js   # V8 sampling
node --trace-events-enabled index.js   # tracing

For 95% of "this is slow" investigations, sampling is the right choice. Save tracing for when you must answer "exactly how many times was X called and in what order?"
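The overhead figures translate directly into CPU headroom. A rough sketch of that arithmetic, assuming overhead scales linearly with the work the service is already doing (a simplification); `projectedCpu` and `fitsBudget` are hypothetical helper names:

```typescript
// Project CPU utilization once a profiler with a given overhead is attached.
// overheadPercent: 5 means 5% extra work, 200 means 200% extra work.
function projectedCpu(baselinePercent: number, overheadPercent: number): number {
  return (baselinePercent * (100 + overheadPercent)) / 100;
}

// True when the projected load still fits within the available CPU budget.
function fitsBudget(
  baselinePercent: number,
  overheadPercent: number,
  budgetPercent: number = 100,
): boolean {
  return projectedCpu(baselinePercent, overheadPercent) <= budgetPercent;
}

// A service running at 60% CPU:
const withSampling = projectedCpu(60, 5);  // 63: fits
const withTracing = projectedCpu(60, 200); // 180: does not fit on this box
```

The point of the check is to run it before attaching anything: if the projection exceeds the budget, pick a lower-overhead tool or a less loaded instance.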

Tools by ecosystem

Runtime	Sampling profiler	Trace / debugger	Notes
Linux native (Go, Rust, C)	`perf record`, `eBPF`, `bpftrace`	`gdb`, `lldb`, `strace`, `bpftrace`	Always available
Java / JVM	`async-profiler`	`JFR`, `jdb`, `arthas`	Production-safe
Python	`py-spy`, `scalene`, `pyinstrument`	`cProfile`, `tracemalloc`, `pdb`	py-spy attaches without restart
Node.js	`node --prof`, `0x`, `clinic.js`	`node --inspect` + DevTools	Built-in
.NET	`dotnet-trace`, `PerfView`	`dotnet-dump`, Visual Studio	Microsoft-blessed
Browser	DevTools Performance tab	DevTools Sources / Debugger	Always

The pattern across stacks: a sampling profiler plus a way to view its output as a flame graph is what answers most questions.

Reading a flame graph

Flame graphs are the de-facto visualization for stack samples. Read them like this:

x-axis is the sample population, with frames sorted alphabetically (no time order). Width = share of samples that include that frame, a proxy for total time spent in it.
y-axis is stack depth. The frame at the bottom is the entry point; frames stack upward.
A wide bar at the top is a leaf function eating CPU. That's your target.
A wide bar whose children together cover only a small part of its width means the function itself is slow, not its callees.
A narrow tower that's tall is a deep call chain that runs occasionally — usually not the bottleneck.

Brendan Gregg's flamegraph.pl (from the FlameGraph repository) is the canonical generator. Most modern profilers (async-profiler, py-spy, clinic) emit the flame graph directly as SVG or HTML.
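The reading rules come down to two numbers per frame: how many samples include it (bar width) and how many end in it (the part of the bar no child covers). The sketch below computes both and also produces the "folded" text lines that flamegraph.pl takes as input. It aggregates by function name only, which is a simplification: a real flame graph keeps one bar per call path. All names and data here are hypothetical.

```typescript
// Fold identical stacks into "frame;frame;frame count" lines,
// the text format that flamegraph.pl reads.
function foldStacks(stacks: string[][]): string[] {
  const counts = new Map<string, number>();
  for (const stack of stacks) {
    const key = stack.join(";");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // alphabetical, like the x-axis
    .map(([key, n]) => `${key} ${n}`);
}

interface FrameWidth {
  total: number; // samples where the frame appears anywhere (bar width)
  self: number;  // samples where the frame is the leaf (width not covered by children)
}

function frameWidths(stacks: string[][]): Map<string, FrameWidth> {
  const widths = new Map<string, FrameWidth>();
  for (const stack of stacks) {
    const seen = new Set<string>(); // count a recursive frame once per sample
    stack.forEach((frame, depth) => {
      const w = widths.get(frame) ?? { total: 0, self: 0 };
      if (!seen.has(frame)) {
        w.total += 1;
        seen.add(frame);
      }
      if (depth === stack.length - 1) w.self += 1;
      widths.set(frame, w);
    });
  }
  return widths;
}

// Hypothetical samples: render is wide, and its children cover little of it.
const demoStacks: string[][] = [];
for (let i = 0; i < 6; i++) demoStacks.push(["main", "render"]);
for (let i = 0; i < 2; i++) demoStacks.push(["main", "render", "escapeHtml"]);
for (let i = 0; i < 2; i++) demoStacks.push(["main", "render", "formatDate"]);

const folded = foldStacks(demoStacks);
// ["main;render 6", "main;render;escapeHtml 2", "main;render;formatDate 2"]
const renderWidth = frameWidths(demoStacks).get("render");
// { total: 10, self: 6 }: most of render's width is its own work
```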

Production-safe attaching

Three patterns to remember:

Sample over a fixed window. Run the profiler for 30 seconds, capture the data, exit. Don't leave it attached.

py-spy record -o profile.svg -d 30 --pid 12345
asprof -d 30 -f flamegraph.html 12345   # async-profiler 3.x; older releases ship ./profiler.sh with the same flags

Use eBPF for kernel-level safety. bpftrace runs scripts in the kernel's verified sandbox; can profile anything (syscalls, scheduler events, IO latency) without modifying the target process.

sudo bpftrace -e 'profile:hz:99 /pid == 12345/ { @[ustack] = count(); }'

Export-then-analyse. Capture the raw data on the production box, transfer the file, analyze on a workstation. Avoids interactive debugging on prod.

Stack traces and source maps

When an exception reaches your error tracker, the stack trace is what you have. For native binaries (Go, Rust), the trace already names your functions: line tables in the binary (DWARF debug info for Rust and C, the runtime's own pclntab for Go) map PC → file:line. For minified JS bundles, the trace reads bundle.min.js:1:42718, which is useless without the source map (.js.map) to translate it back to auth.ts:42.

Three rules for source maps in production:

Generate them on every build.
Upload them to your error tracker (Sentry, Datadog, Bugsnag) so backtraces unminify automatically.
Don't serve them publicly — source maps reveal your unminified code.
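What the error tracker does with an uploaded source map is essentially a lookup: find the last mapping at or before the minified column and report its original position. A minimal sketch, assuming the map has already been decoded into a sorted list (real source maps store this as base64 VLQ in the `mappings` field, and real tools use a library for it). The `Mapping` shape, function names, and the entries are all hypothetical.

```typescript
// One decoded mapping entry for line 1 of the bundle.
interface Mapping {
  generatedColumn: number; // column in bundle.min.js
  source: string;          // original file
  originalLine: number;    // line in the original file
}

// Binary search for the mapping with the greatest generatedColumn <= column.
// `mappings` must be sorted by generatedColumn in ascending order.
function originalPosition(mappings: Mapping[], column: number): Mapping | undefined {
  let lo = 0;
  let hi = mappings.length - 1;
  let best: Mapping | undefined;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (mappings[mid].generatedColumn <= column) {
      best = mappings[mid]; // candidate; keep looking to the right
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return best;
}

// Render a frame the way an unminified backtrace would show it.
function unminify(mappings: Mapping[], column: number): string {
  const hit = originalPosition(mappings, column);
  return hit ? `${hit.source}:${hit.originalLine}` : "<unmapped>";
}

// Hypothetical decoded entries for bundle.min.js, line 1.
const demoMappings: Mapping[] = [
  { generatedColumn: 0, source: "index.ts", originalLine: 1 },
  { generatedColumn: 42700, source: "auth.ts", originalLine: 42 },
  { generatedColumn: 42900, source: "auth.ts", originalLine: 57 },
];

const frame = unminify(demoMappings, 42718); // "auth.ts:42"
```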

Distributed traces

When the bottleneck is across services, single-process profilers don't help. Distributed tracing fills the gap. The unit is a span: a name, start, end, parent ID, and metadata. Spans form a tree per request, propagated via headers (traceparent from W3C Trace Context).

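import { trace, SpanStatusCode } from "@opentelemetry/api";

// The tracer name identifies this instrumentation scope in the trace backend.
const tracer = trace.getTracer("checkout");

// user, amount, and chargeCard come from the surrounding checkout handler.
await tracer.startActiveSpan("checkout.process", async (span) => {
  span.setAttribute("user_id", user.id);
  try {
    await chargeCard(user, amount);   // child span (auto-instrumented)
    span.setStatus({ code: SpanStatusCode.OK });
  } catch (e) {
    span.recordException(e as Error);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw e;
  } finally {
    span.end();   // always end the span, even on failure
  }
});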
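Propagation itself is a small string-handling job. Under W3C Trace Context version 00, the traceparent header is four dash-separated lowercase hex fields: version, a 32-character trace ID, a 16-character parent span ID, and 2 characters of flags. The sketch below parses an incoming header and builds the one to send downstream. The helper names are hypothetical, and in practice an OpenTelemetry SDK propagator does this for you. It only handles the version 00 layout.

```typescript
interface TraceContext {
  traceId: string;  // 32 hex chars, shared by every span in the request
  parentId: string; // 16 hex chars, the caller's span ID
  flags: string;    // 2 hex chars, e.g. "01" = sampled
}

const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns undefined for anything malformed, so the caller can start a new trace.
function parseTraceparent(header: string): TraceContext | undefined {
  const m = TRACEPARENT_V00.exec(header.trim());
  if (!m) return undefined;
  const [, traceId, parentId, flags] = m;
  // All-zero IDs are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return undefined;
  return { traceId, parentId, flags };
}

// Header for an outgoing call: same trace ID, this service's span as the parent.
function childTraceparent(incoming: TraceContext, currentSpanId: string): string {
  return `00-${incoming.traceId}-${currentSpanId}-${incoming.flags}`;
}

// Example values taken from the shape shown in the W3C spec.
const incoming = parseTraceparent(
  "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
);
const outgoing = incoming
  ? childTraceparent(incoming, "b7ad6b7169203331")
  : undefined;
// outgoing: "00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01"
```

If a service drops the header instead of forwarding it, the downstream service sees `undefined` here and starts a fresh trace, which is how a trace gets cut in two.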

Tools that consume the spans: Jaeger, Tempo, Honeycomb, Lightstep, Datadog APM, AWS X-Ray. They all show the same shape — a flamegraph-like view across services, with the long span being the bottleneck.
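"The long span" needs one refinement: a parent span is always at least as long as its children, so what you want is the span with the most time not explained by its children. A rough sketch of that calculation over exported span data, assuming each span's direct children run one after another (parallel children would need interval merging, which this skips). The `SpanRecord` shape and the demo trace are hypothetical.

```typescript
interface SpanRecord {
  id: string;
  parentId?: string;
  name: string;
  startMs: number;
  endMs: number;
}

// Self time = own duration minus the durations of direct children.
function selfTimes(spans: SpanRecord[]): Map<string, number> {
  const self = new Map<string, number>();
  for (const s of spans) self.set(s.id, s.endMs - s.startMs);
  for (const s of spans) {
    if (s.parentId !== undefined && self.has(s.parentId)) {
      const parentSelf = self.get(s.parentId) ?? 0;
      self.set(s.parentId, parentSelf - (s.endMs - s.startMs));
    }
  }
  return self;
}

// The span with the largest self time is the first place to look.
function bottleneck(spans: SpanRecord[]): SpanRecord | undefined {
  const self = selfTimes(spans);
  let worst: SpanRecord | undefined;
  for (const s of spans) {
    if (!worst || (self.get(s.id) ?? 0) > (self.get(worst.id) ?? 0)) worst = s;
  }
  return worst;
}

// Hypothetical trace for one checkout request.
const demoTrace: SpanRecord[] = [
  { id: "a", name: "checkout.process", startMs: 0, endMs: 500 },
  { id: "b", parentId: "a", name: "chargeCard", startMs: 20, endMs: 420 },
  { id: "c", parentId: "b", name: "HTTP POST payment-gateway", startMs: 30, endMs: 410 },
  { id: "d", parentId: "a", name: "db.saveOrder", startMs: 430, endMs: 480 },
];

const slowest = bottleneck(demoTrace);
// slowest.name is "HTTP POST payment-gateway" (380 ms of self time)
```

The root span is the longest at 500 ms, but only 50 ms of that is its own. The time is in the gateway call two levels down.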

Debuggers in production

Mostly: don't. Pausing a production process holds queues, drops connections, may trigger health-check failures, and one stuck breakpoint takes the service down. Use stack samples and structured logs. The exception is eBPF and pprof-style live introspection that doesn't pause the process.

When you must debug a real production bug:

Reproduce locally first if at all possible.
If not, attach to a single canary instance, draining traffic from it first.
Set conditional breakpoints, not unconditional ones, to limit pausing impact.
Detach the moment you have what you need.

Memory profilers

Different category, same idea. CPU is "what's running"; memory is "what's allocated."

Java: jmap, MAT, async-profiler's alloc mode, JFR allocation samples.
Python: tracemalloc, memray, objgraph. memray is the modern best-in-class.
Node: heap snapshot via --inspect + DevTools Memory tab.
Go: pprof heap profile (/debug/pprof/heap).

The pattern: snapshot the heap, find the allocation site of the object type that grew, fix the leak.
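Finding "the object type that grew" usually means comparing two snapshots rather than staring at one. A minimal sketch of that diff, assuming each snapshot has already been summarized into bytes per type (the tools above can each export something like this, though the exact format differs per tool). `HeapSummary`, `heapGrowth`, and the numbers are hypothetical.

```typescript
// Type name -> bytes held by live objects of that type.
type HeapSummary = Record<string, number>;

interface Growth {
  type: string;
  deltaBytes: number;
}

// Rank types by how much they grew between two snapshots, largest first.
function heapGrowth(before: HeapSummary, after: HeapSummary): Growth[] {
  const types = Array.from(
    new Set(Object.keys(before).concat(Object.keys(after))),
  );
  return types
    .map((type) => ({
      type,
      deltaBytes: (after[type] ?? 0) - (before[type] ?? 0),
    }))
    .sort((a, b) => b.deltaBytes - a.deltaBytes);
}

// Hypothetical summaries taken ten minutes apart under steady load.
const snapshotBefore: HeapSummary = { Buffer: 1000, Session: 2000, String: 5000 };
const snapshotAfter: HeapSummary = { Buffer: 1100, Session: 9000, String: 5050 };

const growth = heapGrowth(snapshotBefore, snapshotAfter);
// growth[0] is { type: "Session", deltaBytes: 7000 }
```

String is the largest type in both snapshots but barely moves. Session is the one growing, so its allocation site is where to look next.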

Common bugs

Profiler attached and forgotten. A profiler running for hours adds steady overhead and produces gigabytes of data. Always cap with -d 30 or --duration.

Tracing in production. A 200% overhead profiler on a service running at 60% CPU sends it to 100% and beyond. Sample, don't trace.

Reading flame graphs as time order. They're not — x-axis is alphabetical. Width is total time, not "what happened first."

Source maps in public. Serving .js.map publicly means your unminified code is one curl away. Upload to your error tracker, don't ship them.

Spans without context propagation. A trace stops at a service boundary when the upstream doesn't forward the traceparent header. That is worse than no tracing: you think you're tracing and you're not.

Practical checklist

For "what's slow on this box": sampling profiler, flame graph, read top-down for hot leaves.
For "what's slow across the system": distributed tracing, look for the long span.
For "what's leaking memory": heap snapshot, group by type, find the allocation site.
For "what just crashed": stack trace + source map. If you can't read the trace, fix the source-map upload first.
For "what's the program doing right now": py-spy top --pid PID, top -H -p PID, or DevTools' live stack.
For all of the above: prefer non-invasive (sampling, eBPF) over invasive (debugger, tracing) by default.