API Testing Strategies: Unit, Integration, Contract, and Load Testing
Testing an API is not the same as testing a library or a UI. An API is a contract — a public interface that other systems depend on — and the testing strategy must reflect that. Unit tests tell you whether individual functions work. Integration tests tell you whether the system behaves correctly end to end. Contract tests tell you whether the API still honors its published interface. Load tests tell you when it breaks. All four serve distinct purposes and none is a substitute for the others.
Unit Testing: The Foundation
Unit tests for APIs focus on the logic that does not require HTTP or a database. Validation functions, business logic, transformation code, utility functions — everything that can be extracted from handler context and tested in isolation. Pure functions are the easiest code to test, so structure API code to maximize the proportion of logic that is pure.
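A minimal sketch of the pattern, using a hypothetical pagination validator; note that the test needs no framework, no mocks, and no setup:

```python
# Hypothetical pure validation function: no HTTP, no database, no framework.
def validate_pagination(limit: int, offset: int, max_limit: int = 100) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if limit < 1 or limit > max_limit:
        errors.append(f"limit must be between 1 and {max_limit}")
    if offset < 0:
        errors.append("offset must be non-negative")
    return errors


def test_rejects_limit_above_maximum():
    assert validate_pagination(limit=500, offset=0) == ["limit must be between 1 and 100"]


def test_accepts_valid_parameters():
    assert validate_pagination(limit=20, offset=40) == []
```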
For handler logic that is difficult to test without HTTP infrastructure, use your framework’s test utilities to construct requests and assert on responses without starting a real server. Most modern frameworks — Express, FastAPI, Django REST Framework, Go’s net/http/httptest — provide request construction utilities that allow handlers to be called directly in tests. This is faster than full integration tests and still exercises the handler boundary.
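For example, FastAPI's TestClient dispatches requests to handlers in-process, with no socket and no running server; the endpoint here is a hypothetical stand-in:

```python
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()


@app.get("/health")
def health():
    return {"status": "ok"}


client = TestClient(app)


def test_health_endpoint():
    # The request goes directly to the handler, not over the network.
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json() == {"status": "ok"}
```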
Unit tests should run in milliseconds and should not require a database, external services, or network access. Any test that requires these is an integration test, and should be labeled and managed as such.
Integration Testing: Testing the Real System
Integration tests exercise the full request path: HTTP in, handler logic, database operations, and HTTP response out. They catch the class of bugs that unit tests cannot — incorrect SQL queries, unexpected ORM behavior, schema mismatches between what the handler expects and what the database returns, middleware interactions that only appear in the full stack.
Integration tests require a real database — typically a test database that is seeded and reset between tests — and a running application. They run slower than unit tests and are more expensive to maintain. The tradeoff is that they catch real failures. A unit test that mocks the database can pass while the actual database interaction is broken; an integration test catches this immediately.
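The sketch below compresses the idea into something self-contained: an in-memory SQLite database stands in for the seeded test database, but the test still exercises HTTP in, real SQL, and HTTP out. In practice the database would be a real Postgres or MySQL test instance.

```python
import sqlite3

from fastapi import FastAPI, HTTPException
from fastapi.testclient import TestClient

app = FastAPI()
# SQLite stands in for a real seeded test database to keep the sketch runnable.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")


@app.get("/users/{user_id}")
def get_user(user_id: int):
    row = db.execute("SELECT id, email FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        raise HTTPException(status_code=404, detail="user not found")
    return {"id": row[0], "email": row[1]}


def test_get_user_round_trips_through_the_database():
    db.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
    response = TestClient(app).get("/users/1")
    assert response.status_code == 200
    assert response.json() == {"id": 1, "email": "a@example.com"}
```

A broken SQL query or a renamed column fails this test immediately; a unit test with a mocked repository would keep passing.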
Test factories and fixtures make integration tests maintainable. Instead of setting up database state from scratch in every test, define factory functions that create valid test objects with sensible defaults and override specific fields per test. The goal is tests that express what they are testing, not tests that spend 40 lines constructing prerequisite state.
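A minimal hand-rolled factory, with hypothetical user fields; libraries like factory_boy formalize the same pattern:

```python
import itertools

_seq = itertools.count(1)


def create_user(**overrides):
    """Build a valid user dict with sensible defaults; override per test."""
    n = next(_seq)
    defaults = {
        "id": n,
        "email": f"user{n}@example.com",  # unique per call
        "is_active": True,
        "role": "member",
    }
    return {**defaults, **overrides}


def test_admin_sees_audit_log():
    admin = create_user(role="admin")   # only the relevant field is explicit
    assert admin["role"] == "admin"
    assert admin["is_active"] is True   # everything else is a valid default
```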
Test isolation matters. Each test should start from a clean, known state and not depend on the output of previous tests. Tests that pass in isolation but fail when run as a suite have hidden dependencies. Use database transactions rolled back after each test, or truncate-and-reseed between tests, to enforce isolation.
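One common implementation of the transaction-per-test approach, sketched with pytest and SQLAlchemy (the engine URL is a placeholder):

```python
import pytest
from sqlalchemy import create_engine

engine = create_engine("postgresql://localhost/app_test")  # placeholder URL


@pytest.fixture
def db_session():
    connection = engine.connect()
    transaction = connection.begin()
    try:
        yield connection          # the test issues its queries on this connection
    finally:
        transaction.rollback()    # discard every write the test made
        connection.close()
```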
Contract Testing: Did the API Change?
Contract tests verify that the API still conforms to its published specification. They catch the gap between what the OpenAPI spec says an endpoint returns and what it actually returns — a gap that grows silently over time as implementation drifts from documentation.
Tools like Dredd and Schemathesis read an OpenAPI spec and generate or execute test cases against a running API. Schemathesis in particular does property-based testing: it generates a large variety of valid and invalid inputs, sends them to the API, and verifies that responses conform to the spec. It discovers edge cases that hand-written tests miss.
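A sketch of the pytest integration, using the Schemathesis 3.x entry point; the spec URL is a placeholder, and exact entry points vary across Schemathesis versions:

```python
import schemathesis

# Load the contract from a running instance of the API.
schema = schemathesis.from_uri("http://localhost:8000/openapi.json")


@schema.parametrize()
def test_api_conforms_to_spec(case):
    # Generates inputs from the spec, calls the endpoint, and validates the
    # response (status code, headers, body) against the declared schema.
    case.call_and_validate()
```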
Contract tests at the provider level (your API tests its own spec) catch implementation drift. Consumer-driven contract tests add another dimension: consumers of your API publish their expectations, and your CI pipeline verifies that your implementation still satisfies every consumer’s contract before shipping changes. Pact is the primary framework for this. It catches breaking changes before they ship, rather than after consumers discover them in production.
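A consumer-side sketch using pact-python's classic 1.x API; the service names and the invoice endpoint are hypothetical. The consumer records its expectation here, and the provider's CI later replays the generated pact file against the real implementation:

```python
import atexit

import requests
from pact import Consumer, Provider

pact = Consumer("billing-ui").has_pact_with(Provider("billing-api"))
pact.start_service()                # spins up the Pact mock service
atexit.register(pact.stop_service)


def test_get_invoice_contract():
    (
        pact.given("invoice 42 exists")
        .upon_receiving("a request for invoice 42")
        .with_request(method="GET", path="/invoices/42")
        .will_respond_with(200, body={"id": 42, "status": "paid"})
    )
    with pact:  # verifies the interaction and writes the pact file
        result = requests.get(f"{pact.uri}/invoices/42").json()
    assert result["status"] == "paid"
```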
For public APIs, contract tests are the minimum viable protection against accidental breaking changes. No manual review process catches schema drift as reliably as automated tests running on every commit.
Load Testing: When Does It Break?
Load testing answers a different category of question than correctness testing. Correctness tests ask “does this work?” Load tests ask “at what scale does this stop working, and how does it fail?”
The goal is not to verify that the API handles normal load — that is what integration tests verify. The goal is to find the failure boundary: the request rate or concurrency level at which response times degrade unacceptably, error rates spike, or the system becomes unavailable. Finding this boundary in a test environment is significantly better than finding it when production traffic exceeds it.
Tools like k6, Locust, and Gatling script realistic traffic patterns — a mix of endpoint types in proportions matching real usage — and ramp up concurrency until the system degrades. The output is not a pass/fail; it is a performance profile: response time percentiles (p50, p95, p99) at various load levels, error rates, and the approximate ceiling before degradation begins.
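A sketch of a Locust script with a hypothetical read-heavy traffic mix; the task weights encode the endpoint proportions described above:

```python
from locust import HttpUser, between, task


class ApiUser(HttpUser):
    wait_time = between(0.5, 2)   # seconds of think time between requests

    @task(8)                      # reads dominate the mix
    def list_orders(self):
        self.client.get("/orders?limit=20")

    @task(2)
    def get_order(self):
        self.client.get("/orders/123")

    @task(1)                      # writes are rare relative to reads
    def create_order(self):
        self.client.post("/orders", json={"sku": "ABC-1", "qty": 2})
```

Run with `locust -f locustfile.py` and ramp concurrency until the p95/p99 curves bend.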
Load tests reveal architectural problems that correctness tests cannot: N+1 query patterns that are fast with small datasets and catastrophic with large ones, connection pool exhaustion under concurrency, memory leaks that accumulate over time, rate limits in upstream dependencies that are invisible at low load and hit constantly at production scale.
Running load tests in CI on every merge is rarely practical — they take too long and require realistic infrastructure. Running them regularly (weekly, pre-release) and before known traffic events (launches, campaigns) is a reasonable cadence.
Testing Error Paths
Most testing effort goes to happy paths. Error paths — what happens when authentication fails, when the database is unavailable, when an upstream service times out, when request validation rejects input — are where reliability is actually determined.
Test every 4xx your API can return with a request that should produce it. A 404 handler that returns a 500 because it fails to serialize the error response is a real bug. A validation error that returns 200 with an error body (instead of 400) is a real bug. These are caught by tests targeting error conditions, not by tests that only exercise successful requests.
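A sketch of this in pytest; the endpoints, payloads, and the `client` fixture are assumptions about a hypothetical app:

```python
import pytest


@pytest.mark.parametrize(
    "method, path, kwargs, expected_status",
    [
        ("GET", "/orders/999999", {}, 404),                 # missing resource
        ("POST", "/orders", {"json": {"qty": -1}}, 400),    # invalid payload
        ("GET", "/orders", {"headers": {}}, 401),           # no auth token
        ("DELETE", "/orders/1", {"headers": {"Authorization": "Bearer user"}}, 403),
    ],
)
def test_error_paths(client, method, path, kwargs, expected_status):
    response = client.request(method, path, **kwargs)
    assert response.status_code == expected_status
    assert "error" in response.json()   # the error body itself serializes correctly
```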
For infrastructure failure scenarios — database unavailable, upstream service returning 500 — use dependency injection or mock libraries to simulate failures at the integration boundary and verify that your API responds with appropriate error codes and does not surface internal implementation details in error messages.
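A sketch of this with unittest.mock; the patched repository path, the 503 choice, and the error body are assumptions about a hypothetical app:

```python
from unittest import mock


def test_database_outage_returns_503_without_leaking_internals(client):
    # Force the dependency at the integration boundary to fail.
    with mock.patch(
        "app.repositories.get_user_repository",
        side_effect=ConnectionError("connection refused"),
    ):
        response = client.get("/users/1")

    assert response.status_code == 503
    assert response.json() == {"error": "service temporarily unavailable"}
    assert "connection refused" not in response.text   # no internals leaked
```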
Test Coverage as a Signal
Test coverage metrics tell you which lines were executed during tests. They do not tell you whether the tests were meaningful or whether the important branches were covered. A handler with 100% line coverage can still be missing tests for every error condition if the tests only exercise the success path.
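A small illustration of the gap, with hypothetical names: one happy-path test reaches 100% line coverage of this handler because both branches live on a single line, yet the "pending" behavior is never asserted. Branch coverage would flag it; line coverage does not.

```python
def job_status(job):
    return {"status": "done" if job.completed else "pending"}  # one line, two branches


def test_job_status_done(make_job):   # make_job: hypothetical test factory
    assert job_status(make_job(completed=True)) == {"status": "done"}
    # Line coverage: 100%. The completed=False branch is never tested.
```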
Use coverage as a diagnostic rather than a target. Lines with zero coverage are worth investigating. High coverage with poor error path testing gives false confidence. The question to ask about any test suite is: if I introduced a subtle bug in this handler, would a test catch it before it reached production? If the answer is often no, the test suite needs more targeted tests, not a higher coverage percentage.
The combination of unit tests for logic, integration tests for full-stack correctness, contract tests for specification conformance, and load tests for performance boundaries gives an API a well-rounded test suite. Each layer catches what the others miss. The investment in building this infrastructure compounds over the life of the API.