Backpressure & Load Shedding
How systems stay alive under stress.
When a system is overloaded, the worst thing it can do is keep accepting work. The naive instinct — "queue more, retry harder" — makes overloads worse. This lesson is about the patterns that keep systems alive when they can't keep up.
Analogy
A restaurant kitchen during a Friday night dinner rush. Tickets are coming in faster than the line cook can plate. The bad strategies are obvious: yell louder at the cook, accept more tickets, double the menu. They all break the kitchen.
The good strategies are: stop accepting reservations (backpressure), drop the desserts and stick to mains (load shedding), pause incoming orders entirely until the queue drains (circuit breaker), or rate-limit each table to one course at a time (token bucket).
A well-run kitchen runs all four at different times. So does a well-run service.
Naive retries amplify outages
This is the most common mistake. Downstream is slow because it's overloaded. So your service retries. Now downstream is getting 2× the load it can't handle. So everything fails harder. Customers retry. So your service retries each retry. So downstream gets 4×. By minute three of an outage, you have a retry storm with 10× the original load and zero requests succeeding.
Rule: retries must back off exponentially, must have a hard cap on count, and must respect a circuit breaker. If the breaker is open, don't retry — fail fast.
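That rule can be sketched in a few lines. This is a minimal Python illustration (the function names and defaults are ours, not any particular library's API): exponential backoff with full jitter, a hard attempt cap, and a fast-fail check against a breaker before every attempt.

```python
import random
import time

def call_with_retries(request_fn, breaker_is_open, max_attempts=4,
                      base_delay=0.1, max_delay=2.0):
    """Retry with exponential backoff + jitter, a hard cap on attempts,
    and a fast-fail path when the circuit breaker is open."""
    for attempt in range(max_attempts):
        if breaker_is_open():
            # Breaker open: don't add load downstream, fail fast.
            raise RuntimeError("circuit open: failing fast")
        try:
            return request_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # hard cap reached, give up
            # Full jitter: sleep a random amount in [0, base * 2^attempt],
            # so a fleet of clients doesn't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

The jitter matters as much as the backoff: without it, every client that failed at the same moment retries at the same moment, recreating the spike.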
Bounded queues
Every queue must have a size limit. Period. Unbounded queues turn transient overload into permanent OOM kills.
When the queue fills, you have four options for what to do with new arrivals:
- Drop newest — easiest. Returns 503 to new requests; existing queued work continues.
- Drop oldest — fairer (the people who waited longest probably gave up anyway).
- Drop random — sometimes used when fairness matters more than throughput.
- Refuse with a clear error — never put it in the queue.
There's no universal right answer. Pick based on what your users will do when they see the error.
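The policies above fit in one small class. A minimal Python sketch (the class and policy names are illustrative, not a standard API):

```python
from collections import deque

class BoundedQueue:
    """Bounded FIFO with a pluggable overflow policy:
    'drop_newest', 'drop_oldest', or 'refuse' (raise)."""

    def __init__(self, maxsize, policy="refuse"):
        self.items = deque()
        self.maxsize = maxsize
        self.policy = policy

    def put(self, item):
        if len(self.items) < self.maxsize:
            self.items.append(item)
            return True
        if self.policy == "drop_newest":
            return False                    # caller sees a 503-style rejection
        if self.policy == "drop_oldest":
            self.items.popleft()            # evict the stalest waiter
            self.items.append(item)
            return True
        raise OverflowError("queue full")   # 'refuse': explicit, immediate error
```

Note that all three policies return control to the producer immediately; none of them lets the queue grow.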
Backpressure
Backpressure is when a slow consumer signals upstream to slow down. The signal flows the opposite direction of the data: data goes producer → consumer, but pressure goes consumer → producer.
In code, backpressure usually shows up as:
- Bounded async channels that block the producer when full.
- HTTP 429 responses that tell the client "wait and retry."
- Stream pause()/resume() in Node; backpressure_ms in some queues.
The opposite — letting the producer fill an unbounded buffer — is back-overload, not backpressure. It's how systems die.
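The bounded-channel form of backpressure is easy to see in miniature. In this Python sketch (threads and timings are ours, for illustration), the producer can only run as fast as the consumer, because put() blocks once the 4-slot buffer is full:

```python
import queue
import threading
import time

# A bounded channel: put() blocks when the buffer is full.
# That block IS the backpressure signal, flowing consumer -> producer.
channel = queue.Queue(maxsize=4)

def slow_consumer():
    while True:
        item = channel.get()
        if item is None:
            return                # sentinel: stop consuming
        time.sleep(0.01)          # simulate slow downstream work

threading.Thread(target=slow_consumer, daemon=True).start()

start = time.time()
for i in range(20):
    channel.put(i)                # blocks once 4 items are queued
produce_time = time.time() - start
channel.put(None)                 # tell the consumer to stop
```

The producer "feels" the consumer's pace through the blocking put(), with memory bounded at four items the whole time. An unbounded queue would finish the loop instantly and grow without limit instead.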
Load shedding
When you can't handle every request, drop low-priority ones to keep high-priority ones working. This requires that every request carry a priority — a tier, a class, a header.
Auth API calls beat analytics events. Logged-in user requests beat anonymous crawlers. Premium customer requests beat free-tier ones. The shedding rule is simple: drop bottom-X% by priority when overload is detected.
Without priorities, load shedding becomes "drop random," which is fair but not optimal. Most well-engineered systems have at least 3 priority tiers wired through.
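With tiers wired through, the shedding rule itself is tiny. A hedged Python sketch (the function and tier numbers are illustrative; lower tier = more important):

```python
def shed(requests, keep_fraction):
    """Keep the top `keep_fraction` of requests by priority tier
    (lower tier number = more important); drop the rest."""
    keep = int(len(requests) * keep_fraction)
    by_priority = sorted(requests, key=lambda r: r[0])
    return by_priority[:keep]
```

At 50% capacity, auth and logged-in traffic survive while analytics and crawlers are dropped, which is the whole point of carrying a priority on every request.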
Circuit breakers
Three states:
- Closed — traffic flows. Default state.
- Open — traffic blocked, fast-failed. Triggered by N consecutive failures.
- Half-open — one request allowed through to test recovery. Transition to closed on success, back to open on failure.
The half-open state is the magic. Without it, the breaker is either always blocking or always allowing — useless. With it, the breaker auto-recovers when the downstream comes back.
Hystrix (Java, now in maintenance mode) popularized the pattern; resilience4j is its modern successor. In Node, opossum is the canonical choice.
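The three-state machine fits in one small class. A minimal Python sketch (class name, thresholds, and the injectable clock are ours, not any library's API):

```python
import time

class CircuitBreaker:
    """Minimal three-state breaker: closed -> open after `threshold`
    consecutive failures; open -> half-open after `reset_after` seconds;
    half-open -> closed on success, back to open on failure."""

    def __init__(self, threshold=3, reset_after=5.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_after:
                self.state = "half-open"     # let one probe through
            else:
                raise RuntimeError("circuit open")  # fast-fail
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.state = "closed"                # success closes the breaker
        return result
```

Injecting the clock makes the breaker testable without real sleeps; production libraries add sliding windows and failure-rate thresholds on top of this same skeleton.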
Token bucket vs leaky bucket
Two rate-limiting models with different traffic shapes:
Token bucket — bucket holds N tokens, refills at R tokens per second. Each request takes a token. Allows bursts up to bucket size, then throttles to R/sec.
Leaky bucket — requests fill the bucket; bucket "leaks" at R/sec. Smooths bursts to R/sec sustained, drops anything that overflows.
Pick token bucket when you want to allow legitimate bursts (e.g., a UI client that needs 10 fast requests then idles). Pick leaky bucket when you must protect downstream from any burst.
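The token-bucket variant is only a few lines. A Python sketch with an injectable clock (names are ours, for illustration): capacity sets the burst, rate sets the sustained throughput.

```python
class TokenBucket:
    """Token-bucket limiter: holds up to `capacity` tokens, refilling at
    `rate` tokens/sec. Bursts up to `capacity` pass; then rate/sec."""

    def __init__(self, capacity, rate, clock):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)  # start full: allow an initial burst
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Lazily refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A leaky bucket is the same bookkeeping viewed from the other side: instead of spending stored tokens, arrivals accumulate in the bucket and drain at the fixed rate, so no burst ever reaches downstream.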
Why long timeouts are a load-shedding signal
Timeouts that are "too generous" are a hidden form of load amplification. Setting a 30-second timeout means each slow downstream request occupies a slot in your service for 30 seconds. With 100 concurrent slots and a slow downstream, you can handle 100/30 = 3.3 RPS before saturating.
Tight timeouts (1-2 seconds for typical API calls) shed load implicitly. Better to fail fast and retry than to wait while your queue fills.
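The saturation math above is just Little's Law at the limit: throughput = concurrency / latency. A tiny Python sketch (the function name is ours):

```python
def max_sustainable_rps(slots, per_request_seconds):
    """Little's Law at saturation: throughput = concurrency / latency.
    If every request rides out its timeout, this is your ceiling."""
    return slots / per_request_seconds
```

Cutting the timeout from 30 s to 2 s raises the same 100-slot service's ceiling from ~3.3 RPS to 50 RPS, which is why timeout tuning is load shedding in disguise.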
In the playground
The Cascade gives you a 3-service chain (frontend → middle → database). Configure resilience settings on each. Three traffic events hit: gradual ramp, sudden spike, downstream outage. Win condition: frontend success rate ≥ 90% across all three.
Tools in the wild
- Envoy — service-mesh proxy with first-class circuit breakers, outlier detection, and rate limits.
- Resilience4j — modern JVM library: circuit breakers, bulkheads, retries, rate limiters — Hystrix's successor.
- opossum — battle-tested Node.js circuit breaker (Red Hat) with metrics + fallback hooks.
- Polly — .NET resilience library: retry, circuit breaker, bulkhead, timeout, fallback policies.
- Sentinel — Alibaba's flow control + circuit-breaking library, battle-tested at extreme scale.
- Linkerd — lightweight service mesh with automatic retries, timeouts, and load balancing.