API Performance Optimization: Compression, Connection Pooling, and N+1 Queries
API performance problems fall into two categories: the ones that appear immediately during development and are fixed before they matter, and the ones that are invisible during development and catastrophic at production scale. The second category is where the interesting work is. Understanding where APIs slow down under load — and the specific techniques that address each cause — is the difference between an API that performs at scale and one that requires emergency remediation after launch.
Compression: The Free Lunch
Response compression is one of the highest-leverage performance improvements available to an API, and one of the most commonly skipped. Compressing a JSON response with gzip reduces its size by 60-80% for typical payloads. Brotli compression achieves even better ratios, particularly for structured text like JSON and XML.
Enable compression at the application or reverse proxy level for all text-based responses: application/json, text/csv, application/xml, and any other text format. Binary formats (images, already-compressed archives, protobuf) do not benefit meaningfully from compression and should be excluded.
The client signals its compression support via Accept-Encoding: gzip, br. The server compresses the response and includes Content-Encoding: gzip or Content-Encoding: br to indicate the applied encoding. Most HTTP client libraries handle decompression automatically, making this transparent to application code.
The computational cost of compression is small compared to the network cost of transferring large uncompressed payloads, particularly for clients on mobile networks or at a geographic distance from the API server. At scale, reduced transfer size also reduces bandwidth costs. There is almost no downside to enabling compression on text API responses.
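The effect is easy to demonstrate with the standard library. The sketch below compresses a hypothetical JSON payload of repetitive records, the shape typical of API list responses; in a real service, compression would be enabled in the framework or reverse proxy rather than applied by hand like this.

```python
import gzip
import json

# Hypothetical payload: repetitive JSON records, typical of an API list response
payload = json.dumps(
    [{"id": i, "status": "shipped", "currency": "USD"} for i in range(500)]
).encode("utf-8")

compressed = gzip.compress(payload)

print(f"uncompressed: {len(payload)} bytes")
print(f"gzip:         {len(compressed)} bytes")
print(f"reduction:    {1 - len(compressed) / len(payload):.0%}")
```

Structured, repetitive JSON compresses especially well because gzip's dictionary coding exploits the repeated keys and values.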
HTTP/2 and Connection Multiplexing
HTTP/1.1 handles one request per connection at a time. Clients work around this by opening multiple parallel connections (browsers open up to six per host), but this has costs: TCP and TLS handshake overhead for each connection, and head-of-line blocking within each one.
HTTP/2 multiplexes multiple requests over a single connection. Dozens of requests can be in flight simultaneously without the overhead of multiple connections, and header compression (HPACK) reduces the overhead of repetitive headers across requests. HTTP/2 also defined server push, though it saw little adoption and major browsers have since removed support for it.
Most API clients and servers support HTTP/2. Enabling it at the server or load balancer level is typically a configuration change rather than a code change. The improvement in scenarios with many concurrent small requests, common in microservice architectures and in APIs called frequently by web frontends, is significant.
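As an illustration of how small that configuration change is, here is a hedged nginx sketch (server name and certificate paths are placeholders; nginx 1.25.1 and later prefer a separate `http2 on;` directive over the `listen` flag shown here):

```nginx
server {
    # HTTP/2 in browsers effectively requires TLS
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/api.example.com.pem;   # illustrative paths
    ssl_certificate_key /etc/ssl/api.example.com.key;

    location / {
        proxy_pass http://app_backend;  # hypothetical upstream
    }
}
```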
Connection Pooling
Every database query in an API requires a database connection. Establishing a connection involves a TCP handshake, authentication, and negotiation — overhead that is negligible for a single query and catastrophic when repeated for every query in a high-traffic API. Connection pooling maintains a set of established database connections that are reused across requests.
Without connection pooling, an API handling 1,000 requests per second that each make three database queries attempts to establish 3,000 new database connections per second. Database servers have connection limits — PostgreSQL defaults to 100 — and connection establishment is expensive. The API will either exhaust the database’s connection limit or spend more time connecting than querying.
With connection pooling, a fixed pool of connections (sized appropriately for the expected concurrency) is established at startup and reused. Query time is query time; connection overhead is amortized over thousands of queries per connection.
Configure pool size based on concurrency: a pool too small causes requests to wait for a connection; a pool too large exhausts database resources. A common starting point is (number of CPU cores × 2) + effective spindle count, a formula from the PostgreSQL wiki popularized by HikariCP's pool-sizing guidance. Monitor pool utilization and adjust based on observed wait times.
For serverless environments where connection pooling within a process is insufficient — many ephemeral instances each with their own pools can overwhelm the database — use an external connection pooler like PgBouncer or RDS Proxy that manages the database-side connection count regardless of the number of application instances.
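The core mechanic, pay the connection cost once at startup and check connections in and out per request, can be sketched in a few lines. This is a minimal illustration, not a production pool (real pools come from the database driver, e.g. psycopg's pool or HikariCP, and add health checks, timeouts, and reconnection); sqlite3 stands in for a real database driver here.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Minimal fixed-size pool: connections are created once and reused."""

    def __init__(self, size, connect):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())  # pay connection cost once, at startup

    @contextmanager
    def connection(self):
        conn = self._pool.get()  # blocks if all connections are checked out
        try:
            yield conn
        finally:
            self._pool.put(conn)  # return to the pool instead of closing

# sqlite3 stands in for a real network database driver
pool = ConnectionPool(
    size=5,
    connect=lambda: sqlite3.connect(":memory:", check_same_thread=False),
)

with pool.connection() as conn:
    result = conn.execute("SELECT 1").fetchone()[0]
print(result)
```

Blocking on `get()` when the pool is exhausted is exactly the "requests wait for a connection" behavior described above, which is why pool utilization and wait times are the metrics to monitor.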
The N+1 Query Problem
The N+1 query problem is the most common performance failure in database-backed APIs, and it is entirely invisible in low-traffic development environments.
The pattern: a request for a list of orders fetches N orders in one query. Then, for each order, a separate query fetches the customer details. Result: 1 query for the list, N queries for customer data — N+1 total. At N=10, this is 11 queries. At N=100 (a paginated list of 100 orders), it is 101 queries. Under concurrent load, this multiplies.
The fix is eager loading (also called batching or join loading): load the related data in a single query alongside the primary data, rather than one query per item.
-- N+1 (bad): one customer query per order
SELECT * FROM orders WHERE ...;
-- for each order:
SELECT * FROM customers WHERE id = ?;
-- Eager load (good): one query for all related customers
SELECT orders.*, customers.name, customers.email
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE ...;
ORM-based applications often introduce N+1 queries invisibly through lazy loading — the ORM fetches related records on access, which looks harmless in code but generates a query per access in a loop. Most ORMs provide eager loading directives (include, with, preload depending on the ORM) that prevent this.
Identify N+1 problems by logging query counts per request. Any request generating more than 5-10 queries is a candidate for investigation. Development tools like Django Debug Toolbar, Laravel Debugbar, or query log analysis can surface N+1 patterns before they cause problems in production.
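Query counting is simple to wire up even without a framework tool. The sketch below wraps a DB-API connection with a counter (a hypothetical helper, mimicking what Django Debug Toolbar and similar tools do per request) and shows the N+1 pattern generating 101 queries where the eager-loading JOIN needs one:

```python
import sqlite3

class CountingConnection:
    """Wraps a connection to count queries, mimicking per-request query logging."""
    def __init__(self, conn):
        self.conn = conn
        self.query_count = 0

    def execute(self, sql, params=()):
        self.query_count += 1
        return self.conn.execute(sql, params)

db = CountingConnection(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
for i in range(1, 101):
    db.execute("INSERT INTO customers VALUES (?, ?)", (i, f"customer-{i}"))
    db.execute("INSERT INTO orders VALUES (?, ?)", (i, i))

# N+1: one list query, then one customer query per order
db.query_count = 0
orders = db.execute("SELECT id, customer_id FROM orders").fetchall()
for _, customer_id in orders:
    db.execute("SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()
naive_queries = db.query_count

# Eager load: one JOIN fetches orders and customer data together
db.query_count = 0
db.execute(
    "SELECT orders.id, customers.name FROM orders "
    "JOIN customers ON orders.customer_id = customers.id"
).fetchall()
eager_queries = db.query_count

print(naive_queries, eager_queries)  # 101 1
```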
Response Shape Optimization
Returning more data than the client needs wastes serialization time, bandwidth, and client-side parsing work. Sparse fieldsets — the ability to request specific fields rather than complete resource representations — reduce response size and processing overhead for clients that need only a subset of a resource’s fields.
GET /users?fields=id,email,name
Field filtering is particularly valuable for collection endpoints where each item has many fields but the client only needs a few for list display. Returning 50 fields per item for a 100-item list is 5,000 field serializations; returning 3 fields is 300.
GraphQL’s field selection is the most complete implementation of this principle, but REST APIs can approximate it with query parameter field filtering for clients with known data needs.
Database Indexing for API Query Patterns
Database indexes must be tuned to query patterns. If your API exposes filtering by status, created_at, and user_id, the database needs indexes on those columns (and likely compound indexes covering common multi-field filter combinations). An API without indexes matched to its query patterns will be fast with small datasets and increasingly slow as data grows.
Identify the query patterns the API actually generates in production — not the queries a developer thinks it generates, but the queries observed in the database slow query log — and ensure those patterns are covered by appropriate indexes. The most common miss is a compound query that is technically covered by two single-column indexes but would be dramatically faster with a compound index.
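Most databases expose the planner's decision directly, which makes index coverage verifiable rather than assumed. The sketch below uses sqlite3's EXPLAIN QUERY PLAN (PostgreSQL's EXPLAIN plays the same role) to show a hypothetical multi-field filter going from a table scan to an index search once a compound index exists; table and index names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT, created_at TEXT)"
)

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether a query scans the table or uses an index
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE user_id = 42 AND status = 'shipped'"
before = plan(query)  # full table scan

# Compound index matching the API's observed filter combination
conn.execute("CREATE INDEX idx_orders_user_status ON orders (user_id, status)")
after = plan(query)   # planner now searches the index instead

print("USING INDEX" in before, "USING INDEX" in after)  # False True
```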
Query performance problems that are acceptable at 10,000 rows become critical at 10,000,000 rows. Load testing with realistic data volumes (not just realistic request rates) catches index gaps before they become production incidents.
Measuring Before Optimizing
The necessary caveat: performance optimization without measurement is guesswork. Profile before optimizing. Measure response times by endpoint in production. Identify which endpoints are slow, not which ones seem like they might be slow. Check query counts per request. Look at p99 latency, not just mean.
The payoff from compressing responses, pooling connections, and eliminating N+1 queries is typically much larger than the payoff from micro-optimizing code that was not actually a bottleneck. Fix the structural problems first; micro-optimize last if the measurements say it matters.