API Error Handling: HTTP Status Codes, Error Bodies, and Retry Logic
Errors are not edge cases in API development. They are a primary output. Every API call that can fail will fail — due to invalid input, authentication problems, resource conflicts, rate limits, or infrastructure issues — and how an API communicates those failures determines whether integrators can handle them gracefully or are left guessing. An API that returns clear, consistent, actionable errors is a well-designed API. Everything else is guesswork at scale.
HTTP Status Codes: Use Them Correctly
HTTP has a rich vocabulary of status codes. Most developers know 200 (OK) and 404 (Not Found). Effective API design requires fluency with the full relevant set, because the status code is the first signal a client uses to categorize a response and decide how to handle it.
2xx: Success. 200 (OK) for a successful request with a body. 201 (Created) for a successful POST that created a resource — include a Location header pointing to the new resource. 204 (No Content) for a successful DELETE or update that returns nothing. Do not return 200 with an error payload; that is one of the most common API design mistakes and makes error handling in clients unnecessarily complex.
4xx: Client error. The caller did something wrong. 400 (Bad Request) for malformed input — missing required fields, invalid values, schema violations. 401 (Unauthorized) for missing or invalid authentication. 403 (Forbidden) for valid authentication but insufficient permissions — the caller is known but not allowed. 404 (Not Found) for resources that do not exist. 409 (Conflict) for state conflicts — trying to create something that already exists, or a concurrent modification conflict. 422 (Unprocessable Entity) for input that is syntactically valid but semantically wrong — valid JSON, but the values make no sense in context. 429 (Too Many Requests) for rate limit violations.
5xx: Server error. The server did something wrong. 500 (Internal Server Error) for unexpected failures. 502 (Bad Gateway) for failures in upstream dependencies. 503 (Service Unavailable) for intentional downtime or overload. 504 (Gateway Timeout) for upstream dependency timeouts.
The critical distinction is 4xx vs 5xx. A 4xx error tells the client their request is the problem; retrying the same request unchanged will not help. The main exception is 429, where the same request may succeed after a wait. A 5xx error tells the client the server is the problem; retrying may succeed. Clients that understand this distinction can implement appropriate retry behavior, but only if the server classifies failures correctly: a client that receives 500 for a request that should have been rejected with 400 will keep retrying a request that can never succeed.
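As a rough illustration of how a client might act on this distinction, the sketch below maps a status code to a retry decision. The function name is_retryable is hypothetical, and treating 408 and 429 as retryable 4xx codes is an assumption of this sketch rather than a universal rule.

```python
# Assumption: these 4xx codes are worth retrying after a wait.
RETRYABLE_4XX = {408, 429}


def is_retryable(status: int) -> bool:
    """Return True if repeating the same request might succeed later."""
    if status in RETRYABLE_4XX:
        return True
    # 5xx: the server is at fault, so a later attempt may work.
    return 500 <= status < 600


for status in (200, 400, 404, 429, 500, 503):
    print(status, "retry" if is_retryable(status) else "do not retry")
```

A real client would likely refine this further, for example by declining to retry 501 (Not Implemented), but the basic split is the point.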
Error Response Bodies: Make Them Useful
The status code is classification. The response body is diagnosis. Both matter.
A minimal but useful error body includes a machine-readable error code, a human-readable message, and any context that helps the developer understand or fix the problem:
{
  "error": {
    "code": "validation_error",
    "message": "The request body contains invalid fields.",
    "details": [
      {
        "field": "email",
        "issue": "Must be a valid email address.",
        "value": "not-an-email"
      },
      {
        "field": "amount",
        "issue": "Must be a positive integer.",
        "value": -50
      }
    ]
  }
}
The code field is for code. A developer’s client can switch on validation_error vs insufficient_funds vs duplicate_record and handle each case appropriately. The message field is for humans — a developer reading logs or debugging interactively. The details array breaks down field-level validation failures rather than returning a single vague message, which means the developer does not need to fix one field, re-submit, find the next error, repeat.
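To make the "code field is for code" point concrete, here is a minimal sketch of a client branching on the machine-readable code from the body shown above. The function name handle_error and the returned action strings are hypothetical; a real client would trigger actual recovery logic instead of returning text.

```python
import json


def handle_error(body: str) -> str:
    """Pick a recovery action based on the machine-readable error code."""
    error = json.loads(body)["error"]
    code = error["code"]
    if code == "validation_error":
        # Field-level details tell us exactly what to fix.
        fields = ", ".join(d["field"] for d in error.get("details", []))
        return f"fix fields: {fields}"
    if code == "insufficient_funds":
        return "prompt for another payment method"
    if code == "duplicate_record":
        return "treat as already created"
    # Unknown codes fall back to the human-readable message.
    return f"unhandled error: {error['message']}"


sample = json.dumps({
    "error": {
        "code": "validation_error",
        "message": "The request body contains invalid fields.",
        "details": [
            {"field": "email", "issue": "Must be a valid email address."},
            {"field": "amount", "issue": "Must be a positive integer."},
        ],
    }
})
print(handle_error(sample))
```

Note that the client never parses the message text; it stays free to change without breaking integrations.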
Do not return different error schemas for different endpoints. Inconsistency in error format forces integrators to write different error handling for different parts of your API. Establish a standard error shape and use it everywhere, including edge cases and unexpected failures.
Include a request ID (correlation ID) in both successful and error responses. When a developer contacts support about a failed request, the request ID is what connects their report to your logs. Without it, debugging production issues becomes disproportionately difficult for everyone involved.
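One way to enforce both the single error shape and the request ID is to route every error through one helper. The sketch below is illustrative only: the helper name error_response, the request_id body field, and the X-Request-ID header name are assumptions (the header is a common convention, not a standard).

```python
import uuid


def error_response(code, message, details=None, request_id=None):
    """Build the one error shape used by every endpoint, plus headers."""
    # Reuse the caller's ID if one was supplied, otherwise mint one.
    request_id = request_id or str(uuid.uuid4())
    body = {
        "error": {
            "code": code,
            "message": message,
            "details": details or [],
        },
        "request_id": request_id,  # assumed field name
    }
    headers = {"X-Request-ID": request_id}  # assumed header name
    return body, headers


body, headers = error_response("duplicate_record", "A record with this key exists.")
print(body["error"]["code"], headers["X-Request-ID"])
```

Because unexpected failures go through the same helper, even a 500 response carries the same structure and an ID that support can look up.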
Validation Errors: All of Them at Once
Return all validation errors in a single response. If a request has five invalid fields, report all five in one 400 response rather than stopping at the first. Returning errors one at a time forces the integrator into a cycle of fix-resubmit-find-next-error that is irritating during development and unacceptable in automated workflows.
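A validator that collects errors instead of raising on the first one could look roughly like this. The function name validate_payment, the two fields checked, and the deliberately loose email pattern are assumptions chosen to match the example body earlier in this article.

```python
import re

# Deliberately loose pattern, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_payment(payload: dict) -> list:
    """Check every field and return all problems found, not just the first."""
    errors = []

    email = payload.get("email")
    if not isinstance(email, str) or not EMAIL_PATTERN.match(email):
        errors.append({"field": "email",
                       "issue": "Must be a valid email address.",
                       "value": email})

    amount = payload.get("amount")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        errors.append({"field": "amount",
                       "issue": "Must be a positive integer.",
                       "value": amount})

    return errors


problems = validate_payment({"email": "not-an-email", "amount": -50})
print([p["field"] for p in problems])
```

An empty list means the payload passed; a non-empty list becomes the details array of a single error response.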
Retry Logic: Client-Side Responsibility
Once a client knows which errors are retryable (5xx, 429, network timeouts) and which are not (the remaining 4xx codes), it needs a sensible retry strategy.
Immediate retry is almost always wrong. If your API returns a 503 because it is overloaded, clients that immediately retry all add to the overload. The correct behavior is exponential backoff: wait a short initial period (for example, 1 second), double it on each subsequent retry (2s, 4s, 8s, 16s), add jitter (a random offset to desynchronize multiple clients retrying simultaneously), and cap both the maximum wait time and the maximum number of attempts.
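The backoff recipe above can be sketched as follows. This is one possible variant, not the only correct one: it uses "full jitter" (a random wait between zero and the exponential ceiling) rather than a fixed delay plus a small offset. RetryableError, backoff_delays, and call_with_retry are hypothetical names, and the defaults are placeholders.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (5xx, 429, timeouts)."""


def backoff_delays(base=1.0, cap=30.0, max_attempts=5, rng=random.random):
    """Yield the wait before each retry: doubled each time, capped, jittered."""
    for attempt in range(max_attempts - 1):  # no wait after the final attempt
        ceiling = min(cap, base * 2 ** attempt)
        yield rng() * ceiling  # full jitter: anywhere in [0, ceiling)


def call_with_retry(operation, sleep=time.sleep, **backoff):
    """Run operation, retrying on RetryableError until attempts run out."""
    delays = backoff_delays(**backoff)
    while True:
        try:
            return operation()
        except RetryableError:
            delay = next(delays, None)
            if delay is None:
                raise  # attempts exhausted: surface the last error
            sleep(delay)


# Ceilings before jitter, with rng pinned to 1.0 to make them visible.
print(list(backoff_delays(cap=10.0, max_attempts=6, rng=lambda: 1.0)))
```

The sleep and rng parameters exist so the behavior can be tested without real waiting or randomness.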
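For 429 responses, use the Retry-After header if provided; a 503 response may carry one as well. The header value is either a number of seconds or an HTTP date. The API is explicitly telling the client when to try again; there is no reason to guess with backoff.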
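Since Retry-After can hold either a delay in seconds or an HTTP date, a client needs to handle both forms. The helper below is a minimal sketch using only the Python standard library; the name retry_after_seconds and the choice to return None for missing or unparseable values (so the caller can fall back to backoff) are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to seconds to wait, or None."""
    if value is None:
        return None
    try:
        # Form 1: delay in seconds, e.g. "120".
        return max(0.0, float(int(value)))
    except ValueError:
        pass
    try:
        # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable: let the caller fall back to backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("120"))
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))
```

A date in the past yields zero rather than a negative wait.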
Idempotency matters for retries involving write operations. If a POST request times out before you receive a response, you do not know whether the server processed it. Retrying a non-idempotent POST might create a duplicate record. Well-designed APIs provide idempotency keys — a client-generated token included in the request header that the server uses to deduplicate. If a request with the same idempotency key arrives twice, the server returns the result of the first successful execution rather than executing again.
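On the server side, deduplication by idempotency key can be as simple as remembering the result stored under each key. The sketch below is a toy, in-memory illustration: PaymentService, create_charge, and the charge ID format are hypothetical, and a production version would need persistent storage, key expiry, and protection against two concurrent requests with the same key.

```python
import uuid


class PaymentService:
    """Toy service that executes each idempotency key at most once."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored result
        self.charges = []   # every charge actually created

    def create_charge(self, idempotency_key: str, amount: int) -> dict:
        # Seen this key before: replay the stored result, do not charge again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        charge = {"id": f"ch_{len(self.charges) + 1}", "amount": amount}
        self.charges.append(charge)
        self._results[idempotency_key] = charge
        return charge


service = PaymentService()
key = str(uuid.uuid4())  # generated once by the client, reused on retry
first = service.create_charge(key, 500)
retry = service.create_charge(key, 500)  # e.g. after a client-side timeout
print(first == retry, len(service.charges))
```

The client generates the key once per logical operation and sends the same key on every retry of that operation; a new operation gets a new key.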
What Good Error Handling Signals
An API’s error behavior reveals its design maturity in a way that success paths do not. Happy paths are easy to get right. Errors require deliberate design: choosing the right status codes, standardizing the error schema, validating comprehensively, and thinking through what the client needs to know to handle each failure.
Developers remember how an API behaves when things go wrong long after they have forgotten how it behaves when things go right. Get the errors right and the rest of the API becomes significantly easier to integrate and trust.