Error Shapes

Problem+JSON, validation envelopes, and retryable errors.

Errors are part of your API's surface area. They get parsed, logged, displayed in dashboards, alerted on, and triaged at 3am. Treat them with the same care as your success responses — pick one shape, document it, use it everywhere.

RFC 7807 / RFC 9457 — problem+json

The IETF settled this question in 2016. RFC 7807 (and its 2023 successor RFC 9457) defines application/problem+json — a tiny, structured error format that carries enough context for both humans and machines.

HTTP/1.1 422 Unprocessable Entity
Content-Type: application/problem+json

{
  "type": "https://api.example.com/errors/validation-failed",
  "title": "Validation failed",
  "status": 422,
  "detail": "One or more fields failed validation.",
  "instance": "/users/sign-up?req=8a6...",
  "errors": [
    { "field": "email", "rule": "format" },
    { "field": "age", "rule": "min" }
  ]
}

Five canonical fields:

Field     Purpose
type      URI identifying the problem class. Treat it as a primary key clients can branch on.
title     Short, human-readable summary. Static for a given type.
status    The HTTP status code, duplicated in the body for log convenience.
detail    Human-readable explanation specific to this request. Only safe to show end users if you write it with that in mind.
instance  URI identifying this specific occurrence, often carrying a request ID.

The format is open: add as many extension fields as you need (errors, request_id, trace_id, etc).
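
As a minimal sketch of the shape above, the following Python helper assembles a problem document as a plain dict. The function name problem, its parameter names, and the rule that extension fields may not shadow the standard members are illustrative assumptions; only the five member names and the media type come from the RFCs.

```python
import json
from typing import Any, Optional

PROBLEM_JSON = "application/problem+json"  # media type from RFC 7807 / RFC 9457


def problem(
    type_uri: str,
    title: str,
    status: int,
    detail: Optional[str] = None,
    instance: Optional[str] = None,
    **extensions: Any,
) -> dict:
    """Build a problem document; optional members that are None are omitted."""
    reserved = {"type", "title", "status", "detail", "instance"}
    clash = reserved.intersection(extensions)
    if clash:
        # Assumption of this sketch: extensions must not overwrite standard members.
        raise ValueError(f"extension fields shadow standard members: {sorted(clash)}")
    body: dict = {"type": type_uri, "title": title, "status": status}
    if detail is not None:
        body["detail"] = detail
    if instance is not None:
        body["instance"] = instance
    body.update(extensions)  # e.g. errors, request_id, trace_id
    return body


body = problem(
    "https://api.example.com/errors/validation-failed",
    "Validation failed",
    422,
    detail="One or more fields failed validation.",
    errors=[{"field": "email", "rule": "format"}],
)
payload = json.dumps(body)  # send with Content-Type: application/problem+json
```

Routing every error through one builder like this is one way to keep the envelope identical across endpoints.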

The 4xx vs 5xx distinction

The single most important error rule: 4xx means the caller did something wrong. 5xx means the server did.

Code  When
400   Body is malformed and cannot be parsed: invalid JSON, truncated payload.
401   Missing or invalid credentials. The client should re-authenticate.
403   Authenticated, but not authorised. Re-authenticating will not help.
404   Resource does not exist (or the caller is not allowed to know that it exists).
409   Conflict: concurrent edits, unique-constraint violation, idempotency-key reuse.
410   Gone: the resource existed and has been permanently removed.
422   Body parses fine, but fails validation rules, including missing or mistyped fields. The default for form errors.
429   Rate limit exceeded. Pair with Retry-After.
500   Unhandled exception. Should never appear in healthy production.
502   An upstream service returned an invalid response.
503   Service is temporarily unavailable. Pair with Retry-After.
504   An upstream service timed out.

400 vs 422 trips people up. The rule of thumb: 400 if you couldn't even parse it, 422 if you parsed it and it failed your rules. A missing closing brace is 400. An email that is not a valid email is 422.
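
The rule of thumb can be sketched as a two-stage check: parse first, validate second. The function name status_for_signup and the deliberately loose email pattern are assumptions made for illustration, not a recommended validator.

```python
import json
import re

# Loose pattern for illustration only; real email validation is more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def status_for_signup(raw_body: str) -> int:
    """Return 400 if the body cannot be parsed, 422 if it parses but breaks a rule."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400  # could not even parse it
    # From here on the body is valid JSON, so any failure is a rule failure.
    email = data.get("email") if isinstance(data, dict) else None
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        return 422
    return 200
```

A body with a missing closing brace never reaches the validation stage, which is why it maps to 400 rather than 422.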

Per-field validation errors

The hardest case is the form with seven things wrong with it. The wrong shape:

{ "error": "email is invalid" }

The right shape — return all errors at once, structured:

{
  "type": "https://api.example.com/errors/validation-failed",
  "title": "Validation failed",
  "status": 422,
  "errors": [
    { "field": "email",      "rule": "format",    "message": "Must be a valid email." },
    { "field": "age",        "rule": "min",       "message": "Must be at least 18.", "min": 18 },
    { "field": "password",   "rule": "minLength", "message": "At least 8 characters.", "minLength": 8 }
  ]
}

Three properties matter:

  1. field: the JSON path or form field name the client should highlight.
  2. rule: a machine-readable identifier that lets the client localise messages or branch on the rule type.
  3. message: fallback human-readable text.

Send every failing field in a single response. Returning one error at a time forces the user to play whack-a-mole.
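
One way to get every failure into a single response is to run each rule unconditionally and append to a list instead of raising on the first problem. The sketch below assumes a sign-up payload with the three fields from the example; validate_signup, validation_problem, and the constants are hypothetical names.

```python
import re
from typing import Any, Dict, List

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose check, illustration only
MIN_AGE = 18
MIN_PASSWORD_LENGTH = 8


def validate_signup(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run every rule and return every failure, rather than stopping at the first."""
    errors: List[Dict[str, Any]] = []

    email = data.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        errors.append({"field": "email", "rule": "format",
                       "message": "Must be a valid email."})

    # Simplification: a missing or non-integer age is reported under the "min" rule;
    # a fuller validator would likely use separate "required" and "type" rules.
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool) or age < MIN_AGE:
        errors.append({"field": "age", "rule": "min",
                       "message": f"Must be at least {MIN_AGE}.", "min": MIN_AGE})

    password = data.get("password")
    if not isinstance(password, str) or len(password) < MIN_PASSWORD_LENGTH:
        errors.append({"field": "password", "rule": "minLength",
                       "message": f"At least {MIN_PASSWORD_LENGTH} characters.",
                       "minLength": MIN_PASSWORD_LENGTH})
    return errors


def validation_problem(errors: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap the collected field errors in the validation envelope."""
    return {
        "type": "https://api.example.com/errors/validation-failed",
        "title": "Validation failed",
        "status": 422,
        "errors": errors,
    }
```

The client can then highlight all failing fields in one pass and use rule to pick a localised message.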

Transient vs permanent

Some 5xx errors are worth retrying. Some are not. The signal is Retry-After:

HTTP/1.1 503 Service Unavailable
Retry-After: 30
Content-Type: application/problem+json

{
  "type": "https://api.example.com/errors/service-unavailable",
  "title": "Service temporarily unavailable",
  "status": 503,
  "detail": "Database failover in progress."
}

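Retry-After accepts either a non-negative integer (seconds to wait) or an HTTP-date. Retry layers such as those in the Stripe SDK, the AWS SDK, and axios-retry can take it into account, but support and defaults vary by library and version, so confirm what your client does rather than assuming.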

500 without Retry-After says "I don't know if retrying will help, probably not". 503 with Retry-After: 30 says "yes, retry in 30 seconds, the same request will likely succeed".
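
On the client side, both forms of Retry-After can be reduced to a delay in seconds. The sketch below uses only the Python standard library; the function name retry_after_seconds and the choice to return None for unparseable values are assumptions of this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value to a delay in seconds, or None if unparseable."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # integer form: seconds to wait
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative delay.
    return max(0.0, (when - now).total_seconds())
```

A retry loop would typically also cap the returned delay and the number of attempts, so that a misbehaving server cannot stall the client indefinitely.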

What error responses should NOT include

  • Stack traces. Never. They leak file paths, package versions, sometimes secrets in arg values. Log internally, send a request ID externally so you can correlate.
  • Database error messages. "duplicate key value violates unique constraint users_email_key" reveals part of your schema to an attacker. Translate it to "email already in use" and return 409 Conflict.
  • Internal IDs of other users. "user with id 3471 already has this email" is a user enumeration vulnerability.

The right shape:

{
  "type": "https://api.example.com/errors/internal-error",
  "title": "Internal Server Error",
  "status": 500,
  "request_id": "req_8a6f3c2b9d4e1f50"
}

The user gets a request ID. Your support team can pull the full stack trace from logs.
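
A minimal sketch of that split, assuming a request ID has already been assigned to the request: the full exception goes to the internal log, and only the generic envelope plus the ID goes to the caller. The logger name and the function name internal_error_response are hypothetical.

```python
import logging

logger = logging.getLogger("api.errors")  # hypothetical logger name


def internal_error_response(exc: BaseException, request_id: str) -> dict:
    """Log the full exception internally; return only a sanitised body to the caller."""
    # exc_info attaches the traceback to the log record, keyed by the request ID.
    logger.error("unhandled exception request_id=%s", request_id, exc_info=exc)
    # Nothing derived from the exception itself is copied into the response.
    return {
        "type": "https://api.example.com/errors/internal-error",
        "title": "Internal Server Error",
        "status": 500,
        "request_id": request_id,
    }


try:
    raise RuntimeError("duplicate key value violates unique constraint users_email_key")
except RuntimeError as exc:
    response_body = internal_error_response(exc, "req_8a6f3c2b9d4e1f50")
```

Because the body is built from constants and the request ID alone, the exception message cannot leak into it by accident.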

One envelope, everywhere

Pick the envelope on day one, document it, lint for it. Mixing problem+json with {error: "..."} with {message: ..., code: ...} is the worst of all worlds — clients have to write three parsers.

If you're starting fresh, use RFC 9457. If you have a legacy {error, code} shape, document it and don't add a new one. Consistency beats correctness.
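
"Lint for it" can be as simple as a check run against recorded error responses in a test suite. The sketch below encodes one possible house rule; it is an assumption of this example that type, title, and status are mandatory (the RFC itself makes every member optional) and that error and message mark a legacy shape.

```python
from typing import Any, List


def envelope_violations(body: Any) -> List[str]:
    """Return a list of ways a response body departs from the house envelope."""
    if not isinstance(body, dict):
        return ["body is not a JSON object"]
    problems: List[str] = []
    for name in ("type", "title"):
        if not isinstance(body.get(name), str):
            problems.append(f"missing or non-string member: {name}")
    status = body.get("status")
    if not isinstance(status, int) or isinstance(status, bool) or not 400 <= status <= 599:
        problems.append("status must be an integer between 400 and 599")
    # House rule of this sketch: legacy top-level keys signal a second envelope creeping in.
    for legacy in ("error", "message"):
        if legacy in body:
            problems.append(f"legacy member present: {legacy}")
    return problems
```

Running a check like this over every error fixture makes a second envelope fail the build instead of reaching clients.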

When to add error codes

A code field is helpful when the URI in type is too noisy:

{
  "type": "https://api.example.com/errors/payment-failed",
  "code": "card_declined",
  "title": "Payment failed",
  "status": 402,
  "decline_code": "insufficient_funds"
}

The client branches on code (a stable string) without parsing the URI. Both fields identify the problem, so keep them in agreement and branch on whichever your tooling reads more easily.
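
On the client, branching on code might look like the sketch below. The function name next_step and the returned action names are hypothetical; only code, decline_code, and status come from the example above.

```python
def next_step(problem: dict) -> str:
    """Pick a client action from a parsed problem document."""
    code = problem.get("code")
    if code == "card_declined":
        # decline_code is an extension field carrying the finer-grained reason.
        if problem.get("decline_code") == "insufficient_funds":
            return "ask_for_another_card"
        return "show_decline_message"
    status = problem.get("status")
    if isinstance(status, int) and status >= 500:
        return "retry_later"
    # Unknown codes fall through to a generic path, so new codes do not break old clients.
    return "show_generic_error"
```

Keeping a default branch matters: the server can add codes over time, and existing clients should degrade to a generic message rather than fail.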

Summary

  • One envelope shape — RFC 9457 unless you have a strong reason otherwise.
  • 4xx for caller bugs, 5xx for your bugs.
  • Validation errors return ALL field errors at once, with field + rule + message.
  • 5xx + Retry-After for transient failures.
  • Never leak stack traces. Always include a request ID.