Async Job APIs: Handling Long-Running Operations the Right Way
HTTP is a synchronous protocol. A client sends a request and waits for a response. This works well for operations that complete in milliseconds — a database read, a record update, a validation check. It breaks down for operations that take seconds, minutes, or longer: video transcoding, report generation, bulk data exports, machine learning inference on large inputs, email campaigns to millions of recipients.
Holding an HTTP connection open for a minute or more is technically possible and practically wrong. Clients time out. Load balancers have maximum request durations. Users cannot tell the difference between a server still working and a server that silently failed. Long synchronous operations produce brittle integrations and poor user experiences.
The correct pattern for long-running operations is asynchronous job processing: accept the request immediately, return a job reference, and let the client check status or be notified when the work is done.
The Core Pattern
The client submits the operation as a POST request. The server creates a job, enqueues it for processing, and immediately returns a job representation with a 202 Accepted status:
POST /exports HTTP/1.1
Content-Type: application/json
{"format": "csv", "date_range": "2026-Q1"}
HTTP/1.1 202 Accepted
Location: /jobs/job_f4e8b2d1
Content-Type: application/json
{
"id": "job_f4e8b2d1",
"status": "queued",
"created_at": "2026-05-02T10:30:00Z",
"estimated_duration_seconds": 45
}
202 Accepted means “the request has been accepted for processing, but processing has not completed.” The Location header points to the job resource where the client can check status. The response body includes the job ID and any immediately available metadata.
The client polls the job endpoint:
GET /jobs/job_f4e8b2d1
{
"id": "job_f4e8b2d1",
"status": "processing",
"progress": 0.42,
"created_at": "2026-05-02T10:30:00Z",
"updated_at": "2026-05-02T10:30:18Z"
}
When complete:
{
"id": "job_f4e8b2d1",
"status": "completed",
"progress": 1.0,
"created_at": "2026-05-02T10:30:00Z",
"completed_at": "2026-05-02T10:30:51Z",
"result": {
"download_url": "https://storage.example.com/exports/export_f4e8b2d1.csv",
"expires_at": "2026-05-03T10:30:51Z",
"row_count": 84231
}
}
Job Status Design
Define a clear, finite set of job statuses and document the transitions between them. A minimal set: queued, processing, completed, failed. For jobs that can be cancelled: add cancelling and cancelled. For jobs with multiple distinct phases: consider phase-specific statuses or a phase field alongside the primary status.
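Documented transitions can also be enforced in code. The sketch below models the minimal status set plus the optional cancellation states as a state machine; the transition table and function name are illustrative, not a prescribed API:

```python
# Allowed transitions between job statuses. Terminal states have no
# outgoing transitions, which makes "explicit and stable" checkable.
VALID_TRANSITIONS = {
    "queued": {"processing", "cancelling", "cancelled"},
    "processing": {"completed", "failed", "cancelling"},
    "cancelling": {"cancelled", "completed", "failed"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}

def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in VALID_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

Rejecting illegal transitions at a single choke point keeps the documented state diagram and the implementation from drifting apart.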
The terminal states — completed and failed — must be explicit and stable. A completed job should remain accessible long enough for the client to retrieve its result, even if the underlying computation artifacts are cleaned up. A failed job must include error information that explains why it failed and whether the operation can be retried.
{
"id": "job_a3c7e9b2",
"status": "failed",
"error": {
"code": "source_unavailable",
"message": "The requested data source could not be accessed.",
"retryable": true
},
"failed_at": "2026-05-02T10:31:02Z"
}
retryable is a useful field: it tells the client whether submitting a new job with the same parameters is likely to succeed, or whether the failure reflects a problem with the input that a retry will not fix.
Polling Strategy and Backoff
Tell clients how to poll. Include a Retry-After header on 202 responses and job status responses indicating the earliest time the client should check again:
Retry-After: 5
This prevents clients from hammering the status endpoint at maximum frequency, which generates unnecessary load and rarely produces faster results. Suggest intervals calibrated to the expected job duration: short intervals for jobs that typically complete in seconds, longer intervals for jobs that take minutes.
Recommend exponential backoff in your documentation. A client that polls every second for a five-minute job makes 300 requests for information that will not change between polls. A client that starts at two-second intervals, doubles up to a cap of 30 seconds, and then polls at 30-second intervals makes a fraction of those requests with negligible impact on perceived latency.
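The schedule described above is easy to sketch. The functions below are illustrative; a real client should let a server-supplied Retry-After header override the computed interval:

```python
# Exponential backoff polling schedule: start at two seconds, double on
# each poll, cap at 30 seconds.
def poll_intervals(initial: float = 2.0, cap: float = 30.0):
    """Yield successive polling delays in seconds."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)

def polls_needed(job_duration: float, initial: float = 2.0, cap: float = 30.0) -> int:
    """Count status requests this schedule makes before a job of the
    given duration (in seconds) is observed complete."""
    elapsed, count = 0.0, 0
    for delay in poll_intervals(initial, cap):
        elapsed += delay
        count += 1
        if elapsed >= job_duration:
            return count
```

Under this schedule, the five-minute job that cost a once-per-second poller 300 requests is covered in 13, with at most 30 seconds of added latency in noticing completion.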
Webhook Notification as an Alternative
For clients that can receive webhooks, notification is superior to polling. When the job completes or fails, the server sends a POST to the client’s webhook endpoint with the job result. The client does not need to poll at all — it receives the result the moment it is available.
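A completion notification might look like the following; the endpoint path, event naming, and payload shape are illustrative, not a prescribed format:

POST /integrations/job-hooks HTTP/1.1
Content-Type: application/json

{
  "event": "job.completed",
  "job": {
    "id": "job_f4e8b2d1",
    "status": "completed",
    "result": {
      "download_url": "https://storage.example.com/exports/export_f4e8b2d1.csv",
      "expires_at": "2026-05-03T10:30:51Z"
    }
  }
}

Carrying the same job representation as the status endpoint means webhook consumers and pollers can share parsing code.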
Offer both. Polling and webhooks serve different integration architectures. Server-side applications that can register webhook endpoints benefit from notifications. Serverless functions or batch jobs that run on a schedule and check status periodically are better served by polling.
Job Metadata and Management
Expose job management endpoints appropriate for your use case. At minimum: retrieve a specific job by ID and list recent jobs for an account. For long queues: cancel a queued or processing job. For administrative needs: retry a failed job.
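One possible endpoint layout for these operations, using POST sub-resource actions for state changes (an assumption; action-naming conventions vary between APIs):

GET  /jobs/job_f4e8b2d1        — retrieve a specific job
GET  /jobs?status=failed       — list recent jobs for the account
POST /jobs/job_f4e8b2d1/cancel — request cancellation
POST /jobs/job_f4e8b2d1/retry  — resubmit a failed job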
Job retention policy matters. Jobs and their result metadata should be retained long enough for integrators to retrieve results and build reliable automated workflows around them. If a job completes at midnight and the integrator’s system processes it at 6am, a six-hour retention window breaks the integration. Twenty-four hours is a reasonable default minimum for result metadata; longer for expensive operations whose results should not require immediate retrieval.
Separate job metadata retention from result artifact retention. The metadata (status, timing, error details) can be stored cheaply for weeks. The actual result artifact (a large CSV export, a rendered video) may be expensive to store and can be cleaned up sooner.
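The split policy can be modeled as two independent clocks. The durations below are illustrative defaults consistent with the guidance above, not recommendations from any specific system:

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention windows: large result artifacts are short-lived,
# cheap status/timing/error metadata is kept much longer.
ARTIFACT_RETENTION = timedelta(hours=24)
METADATA_RETENTION = timedelta(days=30)

def cleanup_decision(completed_at: datetime, now: datetime) -> dict:
    """Report which parts of a finished job are past their retention window."""
    age = now - completed_at
    return {
        "delete_artifact": age > ARTIFACT_RETENTION,
        "delete_metadata": age > METADATA_RETENTION,
    }
```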
Concurrency Limits
Most job processing systems have finite capacity. Define and communicate concurrency limits — how many jobs can run simultaneously per account — and what happens when the limit is reached. A queued job that cannot start because the account is at its concurrency limit should say so in the status response, not leave the client to infer why the job has been queued for an unusually long time.
Communicate queue depth when relevant. An estimated_wait_seconds field in the queued status is useful context for integrators deciding whether to wait or take alternative action. It does not need to be precise — an order-of-magnitude estimate is more useful than silence.
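A queued status response that surfaces both signals might look like this; status_detail is a hypothetical field name for explaining why the job has not started, while estimated_wait_seconds is the field suggested above:

{
  "id": "job_c9d2e7a4",
  "status": "queued",
  "status_detail": "concurrency_limit_reached",
  "estimated_wait_seconds": 120,
  "created_at": "2026-05-02T10:30:00Z"
}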
Long-running operation APIs are a specific design discipline within the broader REST surface. They are worth getting right because errors in their design — ambiguous status transitions, no polling guidance, missing error information, inadequate retention — produce integrations that fail subtly and are difficult to debug. The pattern is not complex; the implementation discipline is.