Webhooks
Outbound HTTP, signing, retries, and dead-letter handling.
A webhook is a regular HTTP request — except your service is the one making it, and the destination is a URL the customer gave you. Inversion of control. Five things to get right: delivery, signing, replay protection, retries, and dead-letter handling.
The flow
Your service                                          Customer's endpoint
──────────────                                        ───────────────────
[event happens]
      │
      │  POST https://customer.example.com/webhooks
      │  Content-Type: application/json
      │  Webhook-Id: evt_8a6f3c2b9d4e1f50
      │  Webhook-Timestamp: 1714323840
      │  Webhook-Signature: v1,7a3...
      │
      │  { "type": "payment.succeeded", ... }
      ▶───────────────────────────────────────────────▶
                                                          [200 OK]
      ◀───────────────────────────────────────────────◀
      │
[mark delivered]
Three headers carry the security envelope: an event ID, a timestamp, and a signature. The body is the event itself.
Signing — HMAC over (timestamp.body)
Webhooks must be signed. Without a signature, anyone who learns the URL can post fake events.
The standard pattern (Stripe, GitHub, Standard Webhooks) is HMAC-SHA256 over a string that includes the timestamp:
signed_payload = timestamp + "." + raw_body
signature = hex( HMAC-SHA256( secret, signed_payload ) )
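The receiver recomputes the same value and compares it with crypto.timingSafeEqual (a constant-time comparison, so response timing does not reveal how much of the signature matched):
import { createHmac, timingSafeEqual } from "node:crypto";
// ts, rawBody, and sig come from the request; secret is the shared signing secret.
const expected = createHmac("sha256", secret)
  .update(`${ts}.${rawBody}`)
  .digest();
const received = Buffer.from(sig, "hex");
// timingSafeEqual throws if the lengths differ, so check the length first.
if (received.length !== expected.length || !timingSafeEqual(received, expected)) {
  return res.status(400).json({ error: "bad_signature" });
}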
Two non-obvious gotchas:
- Verify the RAW body, not parsed JSON. Once you JSON.parse, key order and whitespace are lost. A re-serialized body will not match. Most frameworks let you grab the raw body before parsing.
- Timing-safe compare. A plain == returns as soon as the first byte differs, so response times can reveal the expected signature piece by piece. Use crypto.timingSafeEqual (Node) or hmac.compare_digest (Python).
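Both gotchas can be seen in a small round trip. The following is a minimal sketch, assuming the timestamp.body scheme described above with a hex signature; the function names, the secret, and the sample body are hypothetical, and any version prefix such as v1, is assumed to have been stripped from the header already.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

// Sender side: sign the exact bytes that will go on the wire.
function signWebhook(secret, timestamp, rawBody) {
  return createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");
}

// Receiver side: recompute over the raw body and compare in constant time.
function verifyWebhook(secret, timestamp, rawBody, signatureHex) {
  const expected = Buffer.from(signWebhook(secret, timestamp, rawBody), "hex");
  const received = Buffer.from(String(signatureHex), "hex");
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

const secret = "whsec_example"; // hypothetical shared secret
const rawBody = '{"type":"payment.succeeded","amount": 1200}'; // note the space
const ts = 1714323840;
const sig = signWebhook(secret, ts, rawBody);

// Parsing and re-serializing drops the space, so the bytes no longer match.
const reserialized = JSON.stringify(JSON.parse(rawBody));
console.log(verifyWebhook(secret, ts, rawBody, sig));      // raw body verifies
console.log(verifyWebhook(secret, ts, reserialized, sig)); // re-serialized body does not
```

The length check also covers malformed input: a header that is not valid hex decodes to a shorter buffer and is rejected without throwing.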
Replay protection — timestamps with tolerance
A captured signed payload is replayable forever unless you bound it in time. Senders include a Unix timestamp; receivers refuse anything outside (say) 5 minutes:
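const tolerance = 300; // 5 minutes, in seconds
const sentAt = Number(ts); // header values arrive as strings
if (!Number.isFinite(sentAt) || Math.abs(Date.now() / 1000 - sentAt) > tolerance) {
  return res.status(400).json({ error: "timestamp_out_of_tolerance" });
}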
The tolerance is a trade-off: too low and clock skew or slow networks break legitimate deliveries; too high and replay windows are wide. 5 minutes is the convention.
Retries with backoff
The receiver can be flaky: networks, deploys, transient bugs. The sender retries, but not bluntly. Retries that hammer a struggling endpoint at fixed intervals make outages worse.
The pattern is exponential backoff with jitter:
attempt 1: immediate
attempt 2: ~10s + jitter
attempt 3: ~1m + jitter
attempt 4: ~10m + jitter
attempt 5: ~1h + jitter
attempt 6: ~6h + jitter
attempt 7: ~24h + jitter
Stripe retries for up to 3 days before giving up. After that, the event lands in a dead-letter location.
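The schedule above could be computed along these lines. This is a sketch under stated assumptions: the base delays are copied from the list, while the jitter range (up to +20% of the base delay) and the function name are illustrative choices, not a documented policy.

```javascript
// Base delays in seconds before attempts 2..7, taken from the schedule above.
const BASE_DELAYS_S = [10, 60, 600, 3600, 21600, 86400];

// Returns the delay in seconds before `attempt` (1-based), or null once the
// retry budget is spent. `random` is injectable so tests can be deterministic.
function delayBeforeAttempt(attempt, random = Math.random) {
  if (attempt <= 1) return 0;              // first attempt is immediate
  const base = BASE_DELAYS_S[attempt - 2];
  if (base === undefined) return null;     // no attempts left: dead-letter
  const jitter = base * 0.2 * random();    // assumed: up to +20% jitter
  return Math.round(base + jitter);
}

for (let attempt = 1; attempt <= 8; attempt++) {
  console.log(attempt, delayBeforeAttempt(attempt));
}
```

Jitter spreads retries out so that many events that failed at the same moment do not all come back at the same moment.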
What status codes trigger retry?
| Code | Retry? | Reason |
|---|---|---|
| 2xx | No | Success — drop the schedule. |
| 3xx | No | Redirect: follow up to a fixed limit and judge the final response; fail if the limit is exceeded. |
| 4xx | No (except 408, 429) | Caller mistake — retry won't help. |
| 408 | Yes | Request timeout. |
| 429 | Yes | Rate-limited — honour Retry-After if present. |
| 5xx | Yes | Server error — almost always transient. |
| timeout/network | Yes | Connection failed before any response. |
The 4xx-no-retry rule keeps you from spamming a misconfigured receiver forever.
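The table translates fairly directly into a decision function. This is a minimal sketch: the function name and return shape are hypothetical, redirects are assumed to have been followed by the HTTP client before this point, and only the delay-in-seconds form of Retry-After is handled.

```javascript
// `status` is the HTTP status code, or null when the request timed out or
// the connection failed before any response arrived.
function retryDecision(status, retryAfterHeader) {
  if (status === null) return { retry: true }; // timeout or network failure
  if (status === 429) {
    const raw = String(retryAfterHeader ?? "").trim();
    // Honour Retry-After when it is a plain number of seconds.
    return /^\d+$/.test(raw)
      ? { retry: true, afterSeconds: Number(raw) }
      : { retry: true };
  }
  if (status === 408 || status >= 500) return { retry: true }; // timeout or server error
  return { retry: false }; // 2xx, unfollowed 3xx, and the remaining 4xx
}

console.log(retryDecision(503));        // retry
console.log(retryDecision(429, "120")); // retry, no sooner than 120 s
console.log(retryDecision(404));        // no retry
```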
Dead-letter handling
Eventually retries run out. Where does the event go?
The right answer: a dead-letter store the customer can inspect and replay from. SaaS webhook providers (Svix, Hookdeck) build a UI for this. Self-hosted: a Postgres table or an SQS DLQ works fine. Fields:
event_id pk
event_type
payload jsonb
last_attempt_at timestamptz
last_error_status int
attempt_count int
deadlettered_at timestamptz
The customer needs three things: a list view, a per-event detail page (request, response, error), and a "retry" button that re-enqueues with a fresh schedule.
The "retry" button is the difference between an integration that works and one that loses data.
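To make the dead-letter and replay steps concrete, here is a rough in-memory sketch. The Map and array stand in for the Postgres table and the delivery queue, and the function names are hypothetical; a real implementation would persist both.

```javascript
// In-memory stand-ins for the dead-letter table and the delivery queue.
const deadLetters = new Map();
const deliveryQueue = [];

// Called when the retry budget is spent. Field names follow the table above.
function deadLetter(event, lastErrorStatus, attemptCount, now = new Date()) {
  deadLetters.set(event.id, {
    event_id: event.id,
    event_type: event.type,
    payload: event,
    last_attempt_at: now,
    last_error_status: lastErrorStatus,
    attempt_count: attemptCount,
    deadlettered_at: now,
  });
}

// What the "retry" button does: re-enqueue with a fresh schedule.
// The event keeps its original ID so the receiver can still deduplicate.
function replay(eventId) {
  const row = deadLetters.get(eventId);
  if (!row) return false;
  deadLetters.delete(eventId);
  deliveryQueue.push({ event: row.payload, attempt: 1 });
  return true;
}
```

Keeping the original event ID on replay is the design choice that matters here: a replayed event that the receiver had in fact processed is then recognised as a duplicate rather than applied twice.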
The receiver's side — idempotency
Your receiver will see duplicates. Two reasons:
- The sender's retry won the race against the receiver's ACK.
- A "retry" button on the dashboard re-fired an already-processed event.
Solve once, at the front door:
CREATE TABLE webhook_events_seen (
event_id text PRIMARY KEY,
received_at timestamptz NOT NULL DEFAULT now()
);
async function handle(event) {
try {
await db.query(
"INSERT INTO webhook_events_seen (event_id) VALUES ($1)",
[event.id]
);
} catch (e) {
if (isUniqueViolation(e)) return; // already processed; ack and ignore
throw e;
}
await processEvent(event);
}
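The unique constraint on event_id means a duplicate INSERT throws; you swallow the duplicate and ACK, so the processing runs at most once. Where possible, run the INSERT and processEvent in the same transaction: otherwise a crash after the INSERT leaves the event marked as seen but unprocessed, and the sender's retry is silently dropped.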
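One way to keep the dedup record and the processing atomic is to wrap both in a single transaction. The sketch below assumes a Postgres client in the style of node-postgres, where query() resolves to a result with a rowCount, and uses ON CONFLICT DO NOTHING instead of catching the unique violation; handleOnce and the client parameter are hypothetical names.

```javascript
// Records the event ID and processes the event in one transaction, so a
// failure in processEvent also rolls back the "seen" row.
async function handleOnce(client, event, processEvent) {
  await client.query("BEGIN");
  try {
    const inserted = await client.query(
      "INSERT INTO webhook_events_seen (event_id) VALUES ($1) " +
        "ON CONFLICT (event_id) DO NOTHING",
      [event.id]
    );
    const isNew = inserted.rowCount === 1; // 0 rows means a duplicate
    if (isNew) await processEvent(event, client);
    await client.query("COMMIT");
    return isNew ? "processed" : "duplicate";
  } catch (e) {
    await client.query("ROLLBACK"); // event stays unseen; the retry will redo it
    throw e;
  }
}
```

This only protects work done through the same database transaction. Side effects outside it, such as sending an email, are not rolled back and need their own idempotency.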
Webhook receiver fatigue
Customers integrating with your webhooks suffer from "webhook fatigue":
- 14 different signature schemes if you've integrated 14 vendors.
- Surprise event types that don't appear in the docs.
- Surprise schema migrations: a field becomes optional, code that destructures it crashes.
- Replays that re-fire ancient events when a customer enables a new integration.
Mitigations:
- Use Standard Webhooks — a community spec for the headers, signing scheme, and retry policy. One verifier, many vendors.
- Document every event type up front — what triggers it, what fields it contains, examples.
- Version events: payment.succeeded.v2 is OK; silently changing payment.succeeded is not.
- Make replays explicit: never silently re-fire historical events on integration setup.
Summary
Webhooks are five problems wearing a trench coat:
- Sign with HMAC over (timestamp.body) — verify the raw body in constant time.
- Bound replay windows — reject timestamps outside ±5 minutes.
- Retry 5xx/timeout (plus 408 and 429) with exponential backoff + jitter, and don't retry other 4xx.
- Dead-letter after the budget — give the customer a list + replay button.
- Receiver dedups by event ID — process exactly once even with duplicates.
Build all five from day one; retrofitting any one of them is painful.
Tools in the wild
- Standard Webhooks (spec, free tier): An OSS spec + reference SDKs that consolidate the Stripe/GitHub/Slack webhook conventions.
- Service: Webhooks-as-a-service with signing, retries, DLQ, and replay UI built in.
- Service: Webhook event gateway with queueing, retries, and a verification dashboard.
- ngrok (CLI, free tier): Tunnel a local dev port to the public web so a webhook can reach localhost.