JSON Schema and API Validation: Defining and Enforcing Your Data Contracts
An API’s data contract — what it accepts as input, what it returns as output — exists whether you define it formally or not. Leaving it informal means the contract lives only in documentation prose and developer intuition, is inconsistently enforced, and drifts between what the documentation says and what the code actually handles. JSON Schema provides a standard, machine-readable format for expressing data contracts that can drive validation, documentation, and testing from a single source of truth.
What JSON Schema Is
JSON Schema is a declarative language for describing the structure of JSON data. A JSON Schema document specifies the expected type, format, allowed values, required fields, and constraints for a JSON value. Any JSON data can be validated against a schema to determine whether it conforms.
A minimal schema:
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["email", "name"],
  "properties": {
    "email": {
      "type": "string",
      "format": "email",
      "maxLength": 254
    },
    "name": {
      "type": "string",
      "minLength": 1,
      "maxLength": 100
    },
    "age": {
      "type": "integer",
      "minimum": 0,
      "maximum": 150
    },
    "role": {
      "type": "string",
      "enum": ["user", "admin", "viewer"]
    }
  },
  "additionalProperties": false
}
```
This schema defines an object with two required properties (email and name), two optional properties (age and role), and explicit constraints on each. Setting additionalProperties to false rejects any property not listed. That matters for security, because it blocks mass assignment (a client slipping in an unlisted field such as is_admin), and it communicates exactly what the API accepts.
Core Validation Keywords
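Type constraints define what kind of value is valid: string, number, integer, boolean, array, object, or null. Multiple types can be expressed as an array: "type": ["string", "null"] accepts either a string or null, which is the standard way to make a field nullable. Nullable is not the same as optional. A property may be omitted only if it is absent from required, regardless of whether null is an allowed type.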
String constraints: minLength, maxLength, pattern (a regular expression the string must match), and format (a semantic annotation such as email, date-time, uuid, or uri). Under Draft 2020-12, format is only an annotation by default. Whether a validator actually enforces it depends on the library and its configuration.
Number constraints: minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf.
Array constraints: minItems, maxItems, uniqueItems (all items must be distinct), items (the schema each array item must conform to).
Object constraints: required (array of property names that must be present), properties (schemas for named properties), additionalProperties (schema or boolean controlling whether properties not listed in properties are permitted), minProperties, maxProperties.
Composition keywords enable complex schemas: allOf (must conform to all listed schemas), anyOf (must conform to at least one), oneOf (must conform to exactly one), and not (must not conform). Combined with const or enum, they express union types and discriminated unions. Draft 7 and later also provide if, then, and else for conditional schemas.
Validation at the API Boundary
The practical value of JSON Schema is runtime validation: validating every incoming request body against the schema before the request reaches business logic. Invalid requests are rejected with a 400 response describing exactly what failed; valid requests proceed.
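Most languages have JSON Schema validator libraries, and many web frameworks wrap them in request-validation middleware. In Python, the jsonschema package can be called directly from a handler:
```python
from jsonschema import FormatChecker, ValidationError, validate

USER_SCHEMA = {
    "type": "object",
    "required": ["email", "name"],
    "properties": {
        "email": {"type": "string", "format": "email"},
        "name": {"type": "string", "minLength": 1},
    },
    "additionalProperties": False,
}

def create_user(request_body):
    try:
        # format_checker opts in to enforcing "format": "email";
        # without it, jsonschema treats format as an annotation only.
        validate(instance=request_body, schema=USER_SCHEMA,
                 format_checker=FormatChecker())
    except ValidationError as e:
        # error_response is a placeholder for your framework's error helper.
        return error_response(400, "validation_error", e.message)
    # proceed with valid data
```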
Centralized schema validation at the request boundary means business logic never receives malformed input. Handlers no longer need scattered defensive type checks and None-handling for fields the schema already guarantees. Error responses are consistent because every validation failure comes from the same layer.
Schema Reuse and Composition
Defining the same schema in multiple places creates drift — one copy gets updated, others do not. JSON Schema’s $ref keyword references a schema defined elsewhere:
```json
{
  "properties": {
    "user": { "$ref": "#/$defs/User" },
    "created_by": { "$ref": "#/$defs/User" }
  },
  "$defs": {
    "User": {
      "type": "object",
      "required": ["id", "email"],
      "properties": {
        "id": {"type": "string"},
        "email": {"type": "string", "format": "email"}
      }
    }
  }
}
```
$defs (called definitions before Draft 2019-09) is the conventional location for reusable schemas. A library of well-defined component schemas, such as User, Address, Money, Timestamp, and Pagination, referenced throughout the API schema eliminates duplication and keeps every use consistent.
This maps directly to OpenAPI’s components/schemas section, which is JSON Schema (with some extensions) and uses the same $ref reference mechanism. An API designed around a strong schema component library has an OpenAPI spec that is clean, reusable, and aligned with the runtime validation.
Output Validation
Schemas are typically applied to inputs, but validating outputs — verifying that what the API returns matches its documented schema — catches a different category of bug. An API that returns a field as null when the schema says it will always be a string will cause client-side type errors in strictly-typed clients that trusted the schema.
Contract-testing tools such as Schemathesis generate requests from an OpenAPI spec and check that the API's responses conform to the documented schemas. Running these checks in CI catches schema drift before it reaches production. Once the spec exists, the setup cost is modest, and the checks surface undocumented response changes that code review can easily miss.
$schema and Draft Versions
JSON Schema has evolved through several draft versions (Draft 4, 6, 7, 2019-09, 2020-12) with changes in keyword names and behavior across drafts. Always specify the $schema keyword identifying which draft your schema uses. Validation libraries need to know which draft to use, and the drafts are not fully backwards compatible.
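OpenAPI 3.0 uses its own dialect of JSON Schema, defined as an extended subset of JSON Schema Wright Draft 00 (close to Draft 4). For example, it does not allow an array for type and instead marks nullable fields with nullable: true. OpenAPI 3.1 aligns with JSON Schema 2020-12, so type arrays and $defs work as standard JSON Schema. If you are writing schemas for OpenAPI, check which version's dialect applies.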
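Nullability is the difference teams hit most often when moving from OpenAPI 3.0 to 3.1. The function below is a hypothetical, deliberately simplified migration helper, not part of any OpenAPI tool. It rewrites {"type": T, "nullable": true} into the 2020-12 form {"type": [T, "null"]}, recursing only into properties and items. It leaves schemas that use enum or lack a single string type untouched, because those need null handled in other ways.

```python
import copy


def nullable_to_type_array(schema):
    """Hypothetical sketch: convert OpenAPI 3.0 'nullable' to a 2020-12 type array.

    Handles only the simple case of a single string "type" without "enum";
    $ref, composition keywords, and enums are left as they are.
    """
    out = copy.deepcopy(schema)
    if (out.get("nullable") is True
            and isinstance(out.get("type"), str)
            and "enum" not in out):
        del out["nullable"]
        out["type"] = [out["type"], "null"]
    if isinstance(out.get("properties"), dict):
        out["properties"] = {
            name: nullable_to_type_array(sub) for name, sub in out["properties"].items()
        }
    if isinstance(out.get("items"), dict):
        out["items"] = nullable_to_type_array(out["items"])
    return out


openapi_30 = {
    "type": "object",
    "properties": {"nickname": {"type": "string", "nullable": True}},
}
print(nullable_to_type_array(openapi_30))
# {'type': 'object', 'properties': {'nickname': {'type': ['string', 'null']}}}
```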
Generating Schemas from Types
In typed languages, schemas can be generated from type definitions rather than maintained separately. In TypeScript, tools such as typescript-json-schema generate JSON Schema from type declarations. zod lets you define a schema once and derive both a TypeScript type and JSON Schema from it, either directly in recent versions or through the separate zod-to-json-schema package. Python's Pydantic generates JSON Schema from class definitions. This keeps types and schemas in sync mechanically.
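As a rough Pydantic equivalent of the user schema from the start of this article, assuming Pydantic v2 (model_json_schema; v1 used .schema()), consider the sketch below. The email field is a plain str here, because Pydantic's EmailStr needs the optional email-validator package. extra="forbid" is what produces additionalProperties: false in the generated schema, and the same model also validates input at runtime.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class User(BaseModel):
    # Reject unknown fields; this also emits "additionalProperties": false.
    model_config = ConfigDict(extra="forbid")

    # Field constraints map to JSON Schema keywords (maxLength, minLength, minimum, maximum).
    email: str = Field(max_length=254)
    name: str = Field(min_length=1, max_length=100)
    age: Optional[int] = Field(default=None, ge=0, le=150)
    role: Literal["user", "admin", "viewer"] = "user"


schema = User.model_json_schema()
print(schema["required"])            # ['email', 'name']
print(schema["properties"]["name"])

# The same model enforces the contract at runtime and reports every failure.
try:
    User.model_validate({"email": "ada@example.com", "name": "", "is_admin": True})
except ValidationError as exc:
    print(exc.error_count(), "validation errors")
```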
The generated schema becomes the canonical validation artifact, and the same schema is used in OpenAPI documentation. When the type changes, the schema changes, the documentation changes, and the runtime validation changes — all from one source. This is the architecture that eliminates the drift between what the API says it accepts and what it actually validates.
Schema-first API design produces cleaner, more consistent, more reliable APIs. The discipline of defining every input and output formally, before writing handler code, forces design decisions to happen at the right time and produces a machine-readable contract that tools can act on throughout the API lifecycle.