sre · level 10

Backpressure & Load Shedding

How systems stay alive under stress.

250 XP

When a system is overloaded, the worst thing it can do is keep accepting work. The naive instinct — "queue more, retry harder" — makes overloads worse. This lesson is about the patterns that keep systems alive when they can't keep up.

Analogy

A restaurant kitchen during a Friday night dinner rush. Tickets are coming in faster than the line cook can plate. The bad strategies are obvious: yell louder at the cook, accept more tickets, double the menu. They all break the kitchen.

The good strategies are: stop accepting reservations (backpressure), drop the desserts and stick to mains (load shedding), pause incoming orders entirely until the queue drains (circuit breaker), or rate-limit each table to one course at a time (token bucket).

A well-run kitchen runs all four at different times. So does a well-run service.

Naive retries amplify outages

This is the most common mistake. Downstream is slow because it's overloaded. So your service retries. Now downstream is getting 2× the load it can't handle. So everything fails harder. Customers retry. So your service retries each retry. So downstream gets 4×. By minute three of an outage, you have a retry storm with 10× the original load and zero requests succeeding.

Rule: retries must back off exponentially, must have a hard cap on count, and must respect a circuit breaker. If the breaker is open, don't retry — fail fast.
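
A minimal sketch of that rule in TypeScript, assuming a hypothetical breaker object (stubbed here) rather than any specific library's API:

    // Stand-in for a real circuit breaker; wire in your own.
    const breaker = { isOpen: (): boolean => false };

    async function retryWithBackoff<T>(
      fn: () => Promise<T>,
      maxAttempts = 4,   // hard cap: never retry forever
      baseDelayMs = 100,
    ): Promise<T> {
      let lastErr: unknown;
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        // Respect the breaker: if it's open, fail fast instead of retrying.
        if (breaker.isOpen()) throw new Error("circuit open: failing fast");
        try {
          return await fn();
        } catch (err) {
          lastErr = err;
          if (attempt === maxAttempts - 1) break; // cap reached: give up
          // Full jitter: wait somewhere in [0, base * 2^attempt) so a
          // fleet of clients doesn't retry in lockstep.
          const delay = Math.random() * baseDelayMs * 2 ** attempt;
          await new Promise((r) => setTimeout(r, delay));
        }
      }
      throw lastErr;
    }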

Bounded queues

Every queue must have a size limit. Period. Unbounded queues turn transient overload into permanent OOM kills.

When the queue fills, you have four options for what to do with new arrivals:

  • Drop newest — easiest. Returns 503 to new requests; existing queued work continues.
  • Drop oldest — fairer (the people who waited longest probably gave up anyway).
  • Drop random — sometimes used when fairness matters more than throughput.
  • Refuse with a clear error — never put it in the queue.

There's no universal right answer. Pick based on what your users will do when they see the error.
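
A minimal sketch of the bound itself, using refuse-new-arrivals as the policy; drop-oldest would shift() out the head before pushing instead:

    // A queue that cannot grow past `capacity`. offer() returning false
    // is the caller's cue to return a 503 or a clear error.
    class BoundedQueue<T> {
      private items: T[] = [];
      constructor(private readonly capacity: number) {}

      offer(item: T): boolean {
        if (this.items.length >= this.capacity) return false; // refuse
        this.items.push(item);
        return true;
      }

      poll(): T | undefined {
        return this.items.shift(); // oldest first
      }
    }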

Backpressure

Backpressure is when a slow consumer signals upstream to slow down. The signal flows the opposite direction of the data: data goes producer → consumer, but pressure goes consumer → producer.

In code, backpressure usually shows up as:

  • Bounded async channels that block the producer when full.
  • HTTP 429 responses that tell the client "wait and retry."
  • Stream pause()/resume() in Node; backpressure_ms settings in some queues.

The opposite — letting the producer fill an unbounded buffer — is back-overload, not backpressure. It's how systems die.
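
Node's stream API makes the first mechanism concrete: write() returns false when the internal buffer is over its high-water mark, and the producer waits for 'drain' before continuing. A minimal sketch:

    import { Writable } from "node:stream";
    import { once } from "node:events";

    // Produce into a Writable without outrunning it. write() returning
    // false is the backpressure signal; pause until the consumer drains.
    async function produce(out: Writable, lines: string[]): Promise<void> {
      for (const line of lines) {
        if (!out.write(line + "\n")) {
          await once(out, "drain"); // consumer caught up; resume
        }
      }
      out.end();
    }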

Load shedding

When you can't handle every request, drop low-priority ones to keep high-priority ones working. This requires that every request carry a priority — a tier, a class, a header.

Auth API calls beat analytics events. Logged-in user requests beat anonymous crawlers. Premium customer requests beat free-tier ones. The shedding rule is simple: drop bottom-X% by priority when overload is detected.

Without priorities, load shedding degrades to "drop random": fair, but you're as likely to drop an auth call as an analytics ping. Well-engineered systems usually wire at least three priority tiers through every request path.
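
A sketch of what the shedding rule can look like in code; the tier values and load thresholds are illustrative assumptions, not universal constants:

    // 0 = critical (auth), 1 = normal (logged-in users),
    // 2 = best-effort (analytics, crawlers).
    type Tier = 0 | 1 | 2;

    function shouldShed(tier: Tier, load: number): boolean {
      if (load > 0.9) return tier >= 1; // keep only critical traffic
      if (load > 0.7) return tier >= 2; // shed best-effort first
      return false;                     // healthy: serve everything
    }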

Circuit breakers

Three states:

  • Closed — traffic flows. Default state.
  • Open — traffic blocked, fast-failed. Triggered by N consecutive failures.
  • Half-open — one request allowed through to test recovery. Transition to closed on success, back to open on failure.

The half-open state is the magic. Without it, an open breaker has no way to notice that the downstream has recovered, so someone has to reset it by hand. With it, the breaker auto-recovers when the downstream comes back.
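
A minimal sketch of the state machine; the thresholds are illustrative, and real libraries (resilience4j, opossum) layer rolling windows, metrics, and fallbacks on the same core:

    type State = "closed" | "open" | "half-open";

    class CircuitBreaker {
      private state: State = "closed";
      private consecutiveFailures = 0;
      private openedAt = 0;

      constructor(
        private readonly failureThreshold = 5,    // N consecutive failures trip it
        private readonly resetTimeoutMs = 30_000, // how long to stay open
      ) {}

      async call<T>(fn: () => Promise<T>): Promise<T> {
        if (this.state === "open") {
          if (Date.now() - this.openedAt < this.resetTimeoutMs) {
            throw new Error("circuit open: failing fast");
          }
          this.state = "half-open"; // let this one call through as a probe
        }
        try {
          const result = await fn();
          this.state = "closed"; // success: close and reset the count
          this.consecutiveFailures = 0;
          return result;
        } catch (err) {
          this.consecutiveFailures++;
          const tripped = this.consecutiveFailures >= this.failureThreshold;
          if (this.state === "half-open" || tripped) {
            this.state = "open"; // failed probe or threshold hit: block
            this.openedAt = Date.now();
          }
          throw err;
        }
      }
    }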

Hystrix (Java) popularized the pattern, resilience4j is its modern successor, and in Node, opossum is the canonical choice.

Token bucket vs leaky bucket

Two rate-limiting models with different traffic shapes:

Token bucket — bucket holds N tokens, refills at R tokens per second. Each request takes a token. Allows bursts up to bucket size, then throttles to R/sec.

Leaky bucket — requests fill the bucket; bucket "leaks" at R/sec. Smooths bursts to R/sec sustained, drops anything that overflows.

Pick token bucket when you want to allow legitimate bursts (e.g., a UI client that needs 10 fast requests then idles). Pick leaky bucket when you must protect downstream from any burst.
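
A token bucket is a few lines of bookkeeping. This sketch refills lazily on each check instead of running a timer:

    // Allows bursts up to `capacity`, then throttles to `refillPerSec`.
    class TokenBucket {
      private tokens: number;
      private last = Date.now();

      constructor(
        private readonly capacity: number,
        private readonly refillPerSec: number,
      ) {
        this.tokens = capacity; // start full so bursts work immediately
      }

      tryTake(): boolean {
        // Credit tokens for the time elapsed since the last call.
        const now = Date.now();
        this.tokens = Math.min(
          this.capacity,
          this.tokens + ((now - this.last) / 1000) * this.refillPerSec,
        );
        this.last = now;
        if (this.tokens < 1) return false; // throttled
        this.tokens -= 1;
        return true;
      }
    }

A leaky bucket inverts the bookkeeping: arrivals raise a level that drains at R per second, and anything that would push the level past capacity is dropped.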

Why tight timeouts are a form of load shedding

Timeouts that are "too generous" are a hidden form of load amplification. Setting a 30-second timeout means each slow downstream request occupies a slot in your service for 30 seconds. With 100 concurrent slots and a slow downstream, you can handle at most 100/30 ≈ 3.3 RPS before saturating; that's just Little's law (concurrency = throughput × latency) working against you.

Tight timeouts (1-2 seconds for typical API calls) shed load implicitly. Better to fail fast and retry than to wait while your queue fills.
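
In practice the fix is one line per call site. This sketch assumes a runtime with global fetch and AbortSignal.timeout(), e.g. Node 18+:

    // Fail fast: abandon the request after 2 seconds instead of holding
    // a slot for 30. The 2s value is illustrative; tune per dependency.
    async function fetchWithTimeout(url: string): Promise<Response> {
      return fetch(url, { signal: AbortSignal.timeout(2_000) });
    }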

In the playground

The Cascade gives you a 3-service chain (frontend → middle → database). Configure resilience settings on each. Three traffic events hit: gradual ramp, sudden spike, downstream outage. Win condition: frontend success rate ≥ 90% across all three.

Tools in the wild

  • Envoy: service-mesh proxy with first-class circuit breakers, outlier detection, and rate limits.
  • Resilience4j: modern JVM library with circuit breakers, bulkheads, retries, and rate limiters; Hystrix's successor.
  • opossum: battle-tested Node.js circuit breaker from Red Hat, with metrics and fallback hooks.
  • Polly: .NET resilience library with retry, circuit breaker, bulkhead, timeout, and fallback policies.
  • Sentinel: Alibaba's flow-control and circuit-breaking library, battle-tested at extreme scale.
  • Linkerd: lightweight service mesh with automatic retries, timeouts, and load balancing.