Contract-first API testing: How your OpenAPI spec becomes the test suite

Diagram showing how an OpenAPI spec feeds into automated test layers that guarantee schema compliance, auth enforcement, and documentation governance

4/14/2026

I built a Symfony API experiment to validate one idea: What if the OpenAPI spec is not just documentation, but the source of truth for your entire test suite?

The result is a testing concept where adding a new endpoint to the OpenAPI file, plus a few lines of test metadata, is enough to get schema validation, auth checks, error format verification, and guardrail tests. No hand-written test boilerplate needed.

This post explains the testing architecture, the metadata system that drives it, and how everything fits together in CI.

The full experiment is open source: symfony-use-case-driven-api on GitHub

The problem: OpenAPI specs that lie

Most teams write OpenAPI specs after the code is done. Or they generate them from annotations. Both approaches share the same weakness: the spec drifts from reality over time.

A field gets renamed in the code but not in the spec. A new error response is added but never documented. A required parameter becomes optional, but the spec still says required.

Nobody notices until a client breaks in production. Or until a frontend team builds against the spec and discovers it does not match the actual API.

The root cause is simple. The spec and the tests are separate artifacts. They evolve independently. There is no mechanism to keep them in sync.

The idea: make the spec executable

The experiment flips this around. The OpenAPI spec is written first, by hand. Then the test infrastructure reads the spec and generates test scenarios from it.

This means:

Every documented response is validated against its JSON schema
Every secured endpoint is tested for missing and wrong-audience auth
Every documented error case is exercised
Every documented example is verified against its schema
Every route in the Symfony router must exist in the OpenAPI spec (and vice versa)

If the spec says it, the tests prove it. If the code does not match, the tests fail.

flowchart LR
  subgraph spec [Public Spec]
    paths[paths + schemas]
  end
  subgraph sidecar [Sidecar Metadata]
    meta[x-test-fixture]
    neg[x-test-negative]
    ex[x-test-examples]
  end
  spec --> build[build-openapi-test-spec]
  sidecar --> build
  build --> testjson[openapi.test.json]
  subgraph infra [Test Infrastructure]
    reader[OpenApiDocument]
    factory[Request Factory]
    resolver[Token Resolver]
    validator[Schema Validator]
  end
  testjson --> reader
  reader --> factory
  reader --> validator
  factory --> resolver
  resolver -->|ContractRequest| tests[PHPUnit Tests]
  tests -->|response payload| validator
  validator --> result{Schema match?}
  result -->|Yes| pass[CI green]
  result -->|No| fail[CI red]

The x-test metadata system

The key innovation is keeping test instructions next to the OpenAPI operations they describe, but separate from the public contract. Three custom extensions carry everything the test infrastructure needs to build executable requests. They live in sidecar files under openapi/src/test-metadata/paths/, one per path file.

A build script merges them with the public spec into openapi.test.json at build time. The test harness reads this internal artifact. The published spec that consumers and documentation tools see stays clean.

openapi/src/
  paths/
    api_store_carts_{cart_id}_items.json   # public contract
  test-metadata/paths/
    api_store_carts_{cart_id}_items.json   # sidecar with x-test-*

openapi/build/
  openapi.json        # public bundle (no x-test-*)
  openapi.test.json   # merged bundle (public + x-test-*)
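The merge step itself can be small. The sketch below is written in PHP to match the rest of this post, whereas the experiment's actual build script is build-openapi-test-spec.mjs. The function name mergeTestMetadata, the operationId addCartItem, and the rule that only x-test-* keys are copied onto already documented operations are assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Merges one decoded sidecar file into one decoded public path file.
 * Hypothetical helper: only keys starting with "x-test-" are copied, and only
 * onto operations that exist in the public contract.
 *
 * @param array<string, array<string, mixed>> $publicPathItem e.g. ['post' => [...]]
 * @param array<string, array<string, mixed>> $sidecar        e.g. ['post' => ['x-test-fixture' => [...]]]
 *
 * @return array<string, array<string, mixed>>
 */
function mergeTestMetadata(array $publicPathItem, array $sidecar): array
{
    $merged = $publicPathItem;

    foreach ($sidecar as $method => $extensions) {
        if (!isset($merged[$method])) {
            throw new InvalidArgumentException(
                sprintf('Sidecar references undocumented operation "%s".', $method),
            );
        }

        foreach ($extensions as $key => $value) {
            if (str_starts_with((string) $key, 'x-test-')) {
                $merged[$method][$key] = $value;
            }
        }
    }

    return $merged;
}

// Illustrative input; the operationId is made up.
$public = ['post' => ['operationId' => 'addCartItem']];
$sidecar = ['post' => ['x-test-fixture' => ['path' => ['cart_id' => 'existing_cart']]]];

$testPathItem = mergeTestMetadata($public, $sidecar);
```

Because the function returns a new array, the public path item is never mutated, which mirrors the idea that the published bundle stays free of x-test-* keys.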

x-test-fixture

Tells the test factory how to build a valid request for the happy path. Each value is a token that gets resolved at runtime. The sidecar file mirrors the operation key from the public path file.

{
  "post": {
    "x-test-fixture": {
      "path": {
        "cart_id": "existing_cart"
      },
      "headers": {
        "Idempotency-Key": "generated_idempotency_key"
      },
      "body": {
        "product_id": "existing_product",
        "quantity": "valid_quantity"
      }
    }
  }
}

The token existing_cart does not contain a literal cart ID. It tells the test infrastructure: “Create a cart first, then use its ID here.” The token existing_product means: “Look up a real product from the fixtures.” The token generated_idempotency_key means: “Generate a random hex string.”

These tokens are baked into PHP provider classes. A registry of TestValueProvider implementations resolves them at runtime. Each provider knows a fixed set of tokens and how to produce real values for them.

flowchart LR
  sidecar["sidecar file\ncart_id: existing_cart"] --> build["build-openapi-test-spec.mjs"]
  build --> testjson["openapi.test.json"]
  testjson --> registry[TestValueResolverRegistry]
  registry --> cart[CartTestValueProvider]
  cart -->|POST /api/store/carts| real_cart["cart_abc123"]
  real_cart --> request[ContractRequest]

Provider	Tokens
`CatalogTestValueProvider`	`existing_product`, `category_with_products`, `unique_product_name`
`CartTestValueProvider`	`existing_cart`, `unknown_cart`
`CollectionPaginationTestValueProvider`	`products_sort_name_second_page_cursor`
`PrimitiveTestValueProvider`	`valid_quantity`, `below_minimum`, `generated_idempotency_key`
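A minimal sketch of how such a registry could dispatch tokens to providers follows. The interface shape (supports and resolve), the concrete values returned for valid_quantity and below_minimum, and the omission of the request context that the real resolver receives are all assumptions of this sketch, not the experiment's actual code.

```php
<?php

declare(strict_types=1);

/** Hypothetical contract; the experiment's real TestValueProvider may differ. */
interface TestValueProvider
{
    public function supports(string $token): bool;

    public function resolve(string $token): mixed;
}

/** Resolves tokens that need no application state. */
final class PrimitiveTestValueProvider implements TestValueProvider
{
    public function supports(string $token): bool
    {
        return in_array($token, ['valid_quantity', 'below_minimum', 'generated_idempotency_key'], true);
    }

    public function resolve(string $token): mixed
    {
        return match ($token) {
            'valid_quantity' => 1, // assumed: smallest valid quantity
            'below_minimum' => 0,  // assumed: the documented minimum is 1
            'generated_idempotency_key' => bin2hex(random_bytes(16)),
            default => throw new InvalidArgumentException("Unsupported token: {$token}"),
        };
    }
}

/** Asks each provider in turn and fails loudly on unknown tokens. */
final class TestValueResolverRegistry
{
    /** @param list<TestValueProvider> $providers */
    public function __construct(private array $providers)
    {
    }

    public function resolve(string $token): mixed
    {
        foreach ($this->providers as $provider) {
            if ($provider->supports($token)) {
                return $provider->resolve($token);
            }
        }

        throw new InvalidArgumentException("No provider knows the token: {$token}");
    }
}

$registry = new TestValueResolverRegistry([new PrimitiveTestValueProvider()]);
$key = $registry->resolve('generated_idempotency_key'); // 32 hex characters
```

Failing on an unknown token, rather than passing the literal string through, keeps a typo in a sidecar file from silently turning into a request with the wrong value.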

Adding a new token means writing PHP code. The sidecar file references the token by name, but the resolution logic lives in the provider. A possible next step would be to move the resolution instructions into the metadata itself, for example by describing which endpoint to call and which response field to extract. That would make the system fully self-describing without any token-specific PHP code. The current experiment does not go that far.

x-test-negative

Defines how to build requests that should fail. Each scenario maps to a specific error case.

{
  "x-test-negative": {
    "not_found": {
      "cart_id": "unknown_cart"
    },
    "invalid_body": {
      "quantity": "below_minimum"
    },
    "invalid_query": {
      "sort": "invalid_enum"
    }
  }
}

The test infrastructure takes the happy-path request and swaps individual values. not_found replaces the cart ID with a non-existent one. invalid_body sets the quantity below the documented minimum. invalid_query provides an enum value that is not in the allowed list.
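The swap described above can be sketched as a pure function over an already resolved request. The mapping from scenario name to request section, the function name applyNegativeScenario, and the sample IDs are assumptions for illustration; tokens are assumed to be resolved to real values before this step.

```php
<?php

declare(strict_types=1);

/**
 * Applies one x-test-negative scenario to a resolved happy-path request.
 * Hypothetical helper: which request section a scenario touches is assumed.
 *
 * @param array{path: array<string, mixed>, query: array<string, mixed>, body: array<string, mixed>} $request
 * @param array<string, mixed> $overrides resolved values, e.g. ['quantity' => 0]
 *
 * @return array{path: array<string, mixed>, query: array<string, mixed>, body: array<string, mixed>}
 */
function applyNegativeScenario(array $request, string $scenario, array $overrides): array
{
    $section = match ($scenario) {
        'not_found' => 'path',
        'invalid_body' => 'body',
        'invalid_query' => 'query',
        default => throw new InvalidArgumentException("Unknown scenario: {$scenario}"),
    };

    // array_replace keeps every untouched happy-path value and swaps only the overrides.
    $request[$section] = array_replace($request[$section], $overrides);

    return $request;
}

// Illustrative values; the IDs are made up.
$happyPath = [
    'path' => ['cart_id' => 'cart_abc123'],
    'query' => [],
    'body' => ['product_id' => 'prod_1', 'quantity' => 2],
];

$invalidBody = applyNegativeScenario($happyPath, 'invalid_body', ['quantity' => 0]);
```

Changing exactly one value per scenario is what makes the resulting failure attributable: if the response is not the documented 400, the quantity check is the only candidate.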

x-test-examples

Maps documented response examples to the request parameters that produce them. This is how the test infrastructure can execute each example through HTTP and verify the response still matches the schema.

{
  "x-test-examples": {
    "200": {
      "default": {
        "query": {
          "sort": "sort_name",
          "limit": "limit_one"
        }
      },
      "secondPage": {
        "query": {
          "sort": "sort_name",
          "limit": "limit_one",
          "cursor": "products_sort_name_second_page_cursor"
        }
      }
    }
  }
}

All three extensions live in sidecar files next to the operations they describe. The public spec stays consumer-facing. The test harness reads the merged internal artifact.

Five layers of automated tests

The test suite is organized into distinct layers. Each layer tests a different aspect of the contract. All of them read from the same OpenApiDocument class.

flowchart TD
  subgraph contract [Contract Tests - fully automated]
    schema["Schema Validation\n401 auth + 2xx success + response shape"]
    guardrail["Guardrail Tests\n400 headers + 404 not found + enum + body + idempotency"]
    docs["Documentation Governance\nrouter sync + operationId + examples + versioned specs"]
    examples["Example Verification\nrequest examples + response examples + HTTP execution"]
  end
  subgraph behavior [Behavior Tests]
    auto["Automated Behavior\nsort order + filters + search + pagination + limits"]
    manual["Manual Behavior\nbusiness semantics like cart merge"]
  end
  schema --> guardrail --> docs --> examples --> auto --> manual

Layer 1: Schema Validation

OpenApiSchemaValidationTest is the core. It iterates over every operation in the spec and runs three checks.

Every secured operation must reject missing auth:

public function testEverySecuredOperationRejectsMissingAuthenticationWithDocumentedProblemSchema(): void
{
    $openApi = $this->document();

    foreach ($openApi->operations() as $definition) {
        $path = $definition['path'];
        $method = $definition['method'];

        if (!$openApi->requiresSecurity($path, $method)) {
            continue;
        }

        $client = self::createClient();
        $request = $this->requestFactory()->withoutAuthentication($client, $path, $method);
        $this->requestFactory()->send($client, $request);

        self::assertResponseStatusCodeSame(401);

        $this->schemaValidator()->assertResponseMatchesSchema(
            $path, $method, '401',
            $this->decode($client->getResponse()->getContent()),
            'application/problem+json',
        );
    }
}

Every secured operation must reject wrong-audience auth. A store token must not work on admin endpoints. An admin token must not work on store endpoints.
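One simple way to pick the token that must be rejected, assuming the audience follows from the /api/store or /api/admin path prefix, is sketched below. The function names audienceFor and wrongAudienceFor are hypothetical, and the experiment may derive the audience differently.

```php
<?php

declare(strict_types=1);

/** Derives the audience from the path prefix (an assumption of this sketch). */
function audienceFor(string $path): string
{
    return match (true) {
        str_starts_with($path, '/api/store') => 'store',
        str_starts_with($path, '/api/admin') => 'admin',
        default => throw new InvalidArgumentException("No audience for path: {$path}"),
    };
}

/** Returns the audience whose token must NOT be accepted on this path. */
function wrongAudienceFor(string $path): string
{
    return 'store' === audienceFor($path) ? 'admin' : 'store';
}

$rejected = wrongAudienceFor('/api/store/carts'); // "admin"
```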

Every operation must have an executable success case that matches its schema:

public function testEveryDocumentedOperationHasExecutableSuccessCaseThatMatchesSchema(): void
{
    $openApi = $this->document();

    foreach ($openApi->operations() as $definition) {
        $path = $definition['path'];
        $method = $definition['method'];
        $successStatus = $openApi->successStatus($path, $method);

        $client = self::createClient();
        $request = $this->requestFactory()->success($client, $path, $method);
        $this->requestFactory()->send($client, $request);

        self::assertResponseStatusCodeSame((int) $successStatus);

        $this->schemaValidator()->assertResponseMatchesSchema(
            $path, $method, $successStatus,
            $this->decode($client->getResponse()->getContent()),
        );
    }
}

This one test method validates every endpoint in the entire API. Add a new route to the OpenAPI spec, give it x-test-fixture metadata in its sidecar file, and this test covers it.

Layer 2: Guardrail Tests

OpenApiGuardrailTest tests edge cases that every well-behaved API must handle.

Test	What it does
Required header enforcement	Drops each required header one by one, expects 400
Not-found responses	Sends unknown resource IDs, expects 404
Enum validation	Sends invalid enum values, expects 400
Request body validation	Sends invalid payloads, expects 400
Idempotency replay	Sends the same request twice, expects identical responses

All error responses are validated against the application/problem+json schema. This does not prove full RFC 9457 compliance on its own, but it guarantees that every error response follows a consistent Problem Details payload shape across the entire API.
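The "drops each required header one by one" check from the table can be sketched as a generator that yields one request variant per missing header. The helper name withoutEachRequiredHeader and the sample header values are assumptions; each variant would then be sent and checked for a 400 response.

```php
<?php

declare(strict_types=1);

/**
 * Yields one variant of the server array per required header, with exactly
 * that header removed. Hypothetical helper; names are illustrative.
 *
 * @param array<string, string> $server          complete happy-path server array
 * @param list<string>          $requiredHeaders server keys of the required headers
 *
 * @return iterable<string, array<string, string>> missing key => server array without it
 */
function withoutEachRequiredHeader(array $server, array $requiredHeaders): iterable
{
    foreach ($requiredHeaders as $missing) {
        $variant = $server;
        unset($variant[$missing]);

        yield $missing => $variant;
    }
}

// Illustrative values only.
$server = [
    'HTTP_AUTHORIZATION' => 'Bearer test-token',
    'HTTP_IDEMPOTENCY_KEY' => 'a1b2c3',
    'HTTP_CONTRACT_VERSION' => '2026-04-01',
];

$variants = iterator_to_array(
    withoutEachRequiredHeader($server, ['HTTP_IDEMPOTENCY_KEY', 'HTTP_CONTRACT_VERSION']),
);
```

Removing one header at a time, instead of all at once, ties each 400 to a specific missing header.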

Layer 3: Documentation Governance

OpenApiDocumentationTest ensures the spec itself is complete and consistent.

public function testEveryContractApiRouteIsDocumentedInOpenApi(): void
{
    $router = self::getContainer()->get(RouterInterface::class);
    $documented = [];

    foreach ($this->openApi()->operations() as $definition) {
        $documented[] = strtoupper($definition['method']) . ' ' . $definition['path'];
    }

    $actual = [];

    foreach ($router->getRouteCollection()->all() as $route) {
        $path = $route->getPath();

        if (!str_starts_with($path, '/api/store')
            && !str_starts_with($path, '/api/admin')) {
            continue;
        }

        foreach ($route->getMethods() as $method) {
            $actual[] = strtoupper($method) . ' ' . $path;
        }
    }

    sort($documented);
    sort($actual);

    self::assertSame($documented, $actual);
}

Both arrays are sorted before comparison so that ordering differences do not cause false failures. The comparison is exact in both directions: if a route exists in code but not in the spec, it fails. If the spec documents a route that does not exist, it fails. The two lists must be identical.
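When the two lists differ, it helps to know which side is out of date. A small helper along these lines could feed the failure message. This is a sketch, not part of the experiment: routeDrift is a hypothetical name, and the admin and DELETE routes in the example are made up.

```php
<?php

declare(strict_types=1);

/**
 * Splits the difference between documented and actual routes into two lists,
 * so a failure message can say which side is out of date.
 *
 * @param list<string> $documented e.g. "GET /api/store/products"
 * @param list<string> $actual
 *
 * @return array{undocumented: list<string>, stale: list<string>}
 */
function routeDrift(array $documented, array $actual): array
{
    // In the router but missing from the spec.
    $undocumented = array_values(array_diff($actual, $documented));
    // In the spec but missing from the router.
    $stale = array_values(array_diff($documented, $actual));

    sort($undocumented);
    sort($stale);

    return ['undocumented' => $undocumented, 'stale' => $stale];
}

$drift = routeDrift(
    ['GET /api/store/products', 'POST /api/store/carts', 'GET /api/admin/reports'],
    ['GET /api/store/products', 'POST /api/store/carts', 'DELETE /api/store/carts/{cart_id}'],
);
```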

Additional governance checks:

Every operation must have a summary and operationId
Every operation must have a success response schema
Every secured operation must document a 401 response
Every parameter must have a schema
Every request body must have at least one example
Frozen versioned specs must exist on disk
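The first of these checks can be sketched against a decoded OpenAPI document as follows. The function name missingOperationMetadata and the sample operations are assumptions. The experiment reads the spec through its OpenApiDocument class, which is not reproduced here.

```php
<?php

declare(strict_types=1);

/**
 * Returns one human-readable violation per operation that lacks a summary or
 * an operationId. Works on a decoded OpenAPI document.
 *
 * @param array<string, mixed> $spec
 *
 * @return list<string>
 */
function missingOperationMetadata(array $spec): array
{
    $httpMethods = ['get', 'put', 'post', 'delete', 'options', 'head', 'patch', 'trace'];
    $violations = [];

    foreach ($spec['paths'] ?? [] as $path => $pathItem) {
        foreach ($pathItem as $method => $operation) {
            // Path items also carry non-operation keys such as "parameters".
            if (!in_array($method, $httpMethods, true)) {
                continue;
            }

            foreach (['summary', 'operationId'] as $field) {
                if ('' === trim((string) ($operation[$field] ?? ''))) {
                    $violations[] = sprintf('%s %s is missing "%s"', strtoupper($method), $path, $field);
                }
            }
        }
    }

    return $violations;
}

// Illustrative spec fragment; summaries and operationIds are made up.
$spec = [
    'paths' => [
        '/api/store/products' => [
            'get' => ['summary' => 'List products', 'operationId' => 'listProducts'],
        ],
        '/api/store/carts' => [
            'post' => ['summary' => 'Create a cart'],
        ],
    ],
];

$violations = missingOperationMetadata($spec);
```

Collecting all violations before asserting means one test run reports every gap, not just the first one.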

Layer 4: Example Verification

OpenApiExamplesTest validates that documented examples stay schema-valid.

Every request example must match its request schema
Every response example must match its response schema
Every GET success example must execute through HTTP and return a schema-valid response

This catches examples that fall out of sync with the schema. It does not compare the live response to the documented example value. A stale example can still pass if it has the right shape. But it guarantees that no example in your docs violates its own schema.

Layer 5: Automated Behavior Tests

AutomatedApiBehaviorTest tests collection behavior by reading query parameter definitions from the spec.

Sortable collections must return data in the requested order
Category filters must return only matching items
Search queries must find relevant products
Paginated collections must return consistent cursor metadata
Invalid limits must return 400

These tests discover their scenarios from the OpenAPI spec. If a collection endpoint documents a sort enum parameter, the test automatically exercises every sort option.
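A rough sketch of that discovery step, plus an order check, is shown below. The helper names, the "-name" convention for descending order, and the use of PHP's default string comparison are assumptions. The real API may sort with different rules, in which case the comparison has to match them.

```php
<?php

declare(strict_types=1);

/**
 * Collects the documented enum values of one query parameter.
 *
 * @param list<array<string, mixed>> $parameters decoded "parameters" of an operation
 *
 * @return list<string>
 */
function documentedEnumValues(array $parameters, string $name): array
{
    foreach ($parameters as $parameter) {
        if ('query' === ($parameter['in'] ?? null) && $name === ($parameter['name'] ?? null)) {
            return array_values($parameter['schema']['enum'] ?? []);
        }
    }

    return [];
}

/**
 * Checks that rows are ordered by the given field. A leading "-" means
 * descending; that naming convention is an assumption of this sketch.
 *
 * @param list<array<string, mixed>> $rows
 */
function isSortedBy(array $rows, string $sort): bool
{
    $descending = str_starts_with($sort, '-');
    $field = ltrim($sort, '-');
    $values = array_column($rows, $field);

    $expected = $values;
    sort($expected);
    if ($descending) {
        $expected = array_reverse($expected);
    }

    return $values === $expected;
}

// Illustrative parameter definitions.
$parameters = [
    ['name' => 'limit', 'in' => 'query', 'schema' => ['type' => 'integer']],
    ['name' => 'sort', 'in' => 'query', 'schema' => ['type' => 'string', 'enum' => ['name', '-name']]],
];

$sortOptions = documentedEnumValues($parameters, 'sort'); // one test scenario per value
```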

A note on payload size drift

The five layers above cover contract correctness. They do not catch a different kind of rot: fields quietly accumulate on every response, until a “small” list endpoint ships three times the payload it did a year ago.

You can handle this in two places.

Option 1: enforce it in CI. Pick a small set of canonical requests, pin each to a specific Contract-Version, measure the raw uncompressed body, and fail the build if bytes exceed a committed budget.

The experiment does this in PayloadSizeBudgetTest. The budgets are checked into git next to the test:

return [
    'store_products_list_limit_1' => [
        'description' => 'GET /api/store/products?sort=name&limit=1 on contract 2026-04-01',
        'contract_version' => '2026-04-01',
        'expected_status' => 200,
        'baseline_bytes' => 1522,
        'max_bytes' => 1700,
    ],
    'store_product_detail' => [
        'description' => 'GET /api/store/products/{product_id} on contract 2026-04-01',
        'contract_version' => '2026-04-01',
        'expected_status' => 200,
        'baseline_bytes' => 725,
        'max_bytes' => 800,
    ],
    // ...
];

The test is a single data-provider method. Each scenario sends one request and asserts the raw response byte count:

#[DataProvider('payloadBudgets')]
public function testCanonicalResponsesStayWithinCommittedPayloadBudget(
    string $scenario,
    array $budget,
): void {
    $client = self::createClient();
    $request = $this->requestForScenario($client, $scenario, $budget['contract_version']);

    $client->request($request->method, $request->uri, server: $request->server);

    self::assertResponseStatusCodeSame($budget['expected_status']);

    $actualBytes = \strlen($client->getResponse()->getContent());

    self::assertLessThanOrEqual(
        $budget['max_bytes'],
        $actualBytes,
        \sprintf(
            'Payload budget exceeded for %s. Baseline: %d bytes. Budget: %d bytes. Actual: %d bytes.',
            $budget['description'],
            $budget['baseline_bytes'],
            $budget['max_bytes'],
            $actualBytes,
        ),
    );
}

Four rules keep the check useful instead of noisy:

Measure raw uncompressed JSON bytes. Gzip would hide field growth.
Pin every request to a specific Contract-Version like 2026-04-01. A breaking version is expected to change the payload size, so the budget belongs to one contract.
Keep the scenario set small. Six canonical requests in this experiment. Not every query permutation.
Commit the baseline and the budget. The baseline records where you were. The budget is the ceiling you refuse to cross. When someone adds a field, the diff shows both the new response and the new budget.

Option 2: watch it in production. Response byte size per route is a standard observability metric. Grafana, Datadog, and most APM tools already expose it. Alerting on growth over seven or thirty days works without committing canonical scenarios.

Neither option is strictly better. CI budgets catch drift before merge and keep the conversation on the pull request. Observability catches drift across real traffic and does not need you to pick scenarios. Pick one. Do not ignore both.

How schema validation works under the hood

The OpenApiSchemaValidator class is intentionally simple. It uses opis/json-schema for the heavy lifting.

use Opis\JsonSchema\Validator;

final readonly class OpenApiSchemaValidator
{
    private Validator $validator;

    public function __construct(private OpenApiDocument $document)
    {
        $this->validator = new Validator();
    }

    public function assertResponseMatchesSchema(
        string $path, string $method, string $status,
        array $payload, string $contentType = 'application/json',
    ): void {
        // Look up the documented schema for this operation, status, and media type.
        $schema = $this->document->responseSchema($path, $method, $status, $contentType);

        // toJsonValue() is not shown in this excerpt; it presumably converts the
        // PHP arrays into the decoded-JSON form that opis/json-schema validates.
        $result = $this->validator->validate(
            $this->toJsonValue($payload),
            $this->toJsonValue($schema),
        );

        if (!$result->isValid()) {
            throw new \RuntimeException(sprintf(
                '%s %s response %s does not match the OpenAPI schema.',
                strtoupper($method), $path, $status,
            ));
        }
    }
}

The OpenApiDocument class handles all the complexity: reading the bundled JSON, resolving $ref references recursively, and extracting schemas, parameters, and metadata. The validator just compares payloads against schemas.
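Recursive $ref resolution is the least obvious of those tasks. Below is a minimal sketch for local references inside one bundled document. The function name resolveRefs is hypothetical, external files are not handled, and a circular reference raises an exception. The experiment's OpenApiDocument may treat recursion differently.

```php
<?php

declare(strict_types=1);

/**
 * Recursively replaces local "$ref" pointers (e.g. "#/components/schemas/Problem")
 * with the node they point at. Sketch only: no external files, and a circular
 * reference raises an exception instead of being expanded forever.
 *
 * @param array<string, mixed> $document the whole decoded bundle
 * @param list<string>         $seen     pointers on the current resolution path
 */
function resolveRefs(mixed $node, array $document, array $seen = []): mixed
{
    if (!is_array($node)) {
        return $node;
    }

    if (isset($node['$ref']) && is_string($node['$ref'])) {
        $pointer = $node['$ref'];

        if (in_array($pointer, $seen, true)) {
            throw new RuntimeException("Circular reference: {$pointer}");
        }

        $target = $document;
        foreach (explode('/', ltrim($pointer, '#/')) as $segment) {
            // JSON Pointer escapes: "~1" is "/", "~0" is "~".
            $segment = str_replace(['~1', '~0'], ['/', '~'], $segment);
            $target = $target[$segment] ?? throw new RuntimeException("Unresolvable reference: {$pointer}");
        }

        return resolveRefs($target, $document, [...$seen, $pointer]);
    }

    // Plain node: resolve every child and keep the keys.
    return array_map(static fn (mixed $child): mixed => resolveRefs($child, $document, $seen), $node);
}

// Illustrative document fragment; the schema names are made up.
$document = [
    'components' => ['schemas' => [
        'Problem' => ['type' => 'object', 'properties' => ['title' => ['$ref' => '#/components/schemas/Title']]],
        'Title' => ['type' => 'string'],
    ]],
];

$schema = resolveRefs(['$ref' => '#/components/schemas/Problem'], $document);
```

Inlining every reference up front keeps the validator's input self-contained, at the cost of not supporting recursive schemas in this simplified form.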

The request factory

The OpenApiOperationRequestFactory is the most complex piece. It builds executable HTTP requests from the OpenAPI spec and metadata.

For a success case, it:

Reads the audience adapter for the endpoint (store or admin)
Sets auth headers based on the audience
Resolves path parameters using x-test-fixture tokens
Resolves required query parameters
Builds the request body from the request schema and fixture tokens
Returns a ContractRequest ready to send

public function success(KernelBrowser $client, string $path, string $method): ContractRequest
{
    $context = $this->context($client, $path, $method);
    $server = $context->audience->baseServer($context);

    foreach ($this->document->parameters($path, $method) as $parameter) {
        if ('header' !== ($parameter['in'] ?? null)
            || !((bool) ($parameter['required'] ?? false))) {
            continue;
        }

        $name = (string) $parameter['name'];
        $resolver = $this->fixtureResolver($path, $method, 'headers', $name);
        $server[$this->headerServerKey($name)] = null !== $resolver
            ? (string) $this->values->resolve($resolver, $context)
            : $context->audience->defaultHeaderValue($name, $context);
    }

    $uri = $this->resolveUri($context, unknownIdentifiers: false);
    $json = $this->buildRequestBody($path, $method);

    return new ContractRequest(strtoupper($method), $uri, $server, $json);
}

For negative cases, it starts with the success request and mutates it. Drop auth headers for 401 tests. Swap path IDs for 404 tests. Break the body for 400 tests.
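The success() method above calls headerServerKey(), which is not shown. A helper like it has to translate a header name into the key Symfony reads from the server array of a test request. The version below is a sketch of that convention, not the experiment's implementation.

```php
<?php

declare(strict_types=1);

/**
 * Converts an HTTP header name into the key Symfony expects in the server
 * array of a test request, e.g. "Idempotency-Key" to "HTTP_IDEMPOTENCY_KEY".
 */
function headerServerKey(string $headerName): string
{
    $key = strtoupper(str_replace('-', '_', $headerName));

    // Symfony reads these from the server array without the HTTP_ prefix.
    if (in_array($key, ['CONTENT_TYPE', 'CONTENT_LENGTH', 'CONTENT_MD5'], true)) {
        return $key;
    }

    return 'HTTP_' . $key;
}

$idempotencyKey = headerServerKey('Idempotency-Key'); // "HTTP_IDEMPOTENCY_KEY"
$contentType = headerServerKey('Content-Type');       // "CONTENT_TYPE"
```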

The CI pipeline

Everything runs in a single GitHub Actions workflow. The pipeline validates the spec, the code, and the contract in sequence.

steps:
  # OpenAPI validation
  - name: Lint OpenAPI
    run: npm run openapi:lint

  - name: Check OpenAPI structure
    run: npm run openapi:check:structure

  - name: Bundle OpenAPI JSON
    run: npm run openapi:bundle:json

  - name: Detect breaking OpenAPI changes
    run: npm run openapi:diff

  - name: Prove machine-consumability with api-gen
    run: npm run openapi:proof:api-gen

  # PHP quality
  - name: Run PHPStan
    run: composer phpstan

  - name: Run PHPUnit
    run: php bin/phpunit

The key steps before PHPUnit even runs:

openapi:lint validates the spec against OpenAPI rules using Redocly CLI
openapi:check:structure prevents duplicate schemas and inline complex schemas
openapi:diff detects breaking changes against the committed baseline using OASDiff
openapi:proof:api-gen proves the spec is machine-consumable by generating TypeScript types with @shopware/api-gen

On pull requests, the OASDiff step also posts its findings as annotations, so reviewers see breaking changes directly in the diff.

What you get for free

When you add a new endpoint to this system, the only thing you write is:

The OpenAPI path file, plus its sidecar file with x-test-fixture and x-test-negative metadata
The controller and use-case handler
The response DTO

You do not write:

Auth tests (the schema validation layer handles them)
Schema compliance tests (automatic for every operation)
Error format tests (the guardrail layer validates all error responses)
Documentation completeness tests (the governance layer catches missing docs)
Example validation tests (automatic for all documented examples)

Manual behavior tests are only needed for business semantics that cannot be inferred from the spec. For example: “Adding the same product to a cart twice should merge the quantity instead of creating a second line item.” That is business logic. The spec cannot express it.

The experiment also enforces architecture rules through PHPat in the PHPStan run. Controllers never access Doctrine repositories directly. Use-case handlers do not depend on HTTP layer classes. DTOs do not reference entities. Contract tests protect the external surface. Architecture tests protect the internal structure.

Takeaways

Treat OpenAPI as a real contract. Not as documentation you update when you remember. Write the spec first. Test against it automatically.

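Keep test metadata next to the spec. The x-test-* extensions live in sidecar files that mirror the path files, so test instructions stay close to the operation they describe while the published spec stays clean. No hand-maintained mapping between tests and endpoints. No guessing.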

Automate the boring parts. Auth checks, schema validation, error format compliance, documentation completeness. None of these need hand-written tests per endpoint.

Keep manual tests for business logic. The contract tells you the shape. The behavior tests tell you the semantics. Both are needed. But only the semantics require hand-written tests.

Run it all in CI. Lint the spec, check for breaking changes, validate schemas, run the tests. Every push. Every pull request. No exceptions.

The API surface is intentionally kept small. The depth around those endpoints is not. The goal is to prove that contract-first, use-case-driven API development produces real confidence with minimal test boilerplate.

The bigger picture

This post focused on the testing concept. But the experiment covers more ground than that. The repository also includes:

Date-based request versioning with frozen per-version specs and backward compatibility tests
RFC 9457 Problem Details for all error responses with a full error catalog
Cursor pagination inspired by RFC 9865 (SCIM cursor pagination) instead of offset/limit
Idempotency on write endpoints with replay detection
Deprecation signaling via Deprecation, Sunset, and Link headers
k6 load tests for all major flows, phpbench benchmarks for hot paths, and optional payload-size budgets on canonical HTTP responses
Machine-consumability proofs generating TypeScript types with @shopware/api-gen
Architecture Decision Records documenting every major design choice
A Shopware transfer playbook with class-by-class mapping and a phase-by-phase adoption path

The goal was never to build a small demo. The goal was to prove that contract-first, use-case-driven API development works with the same discipline across testing, versioning, documentation, performance, and architecture. At the time of writing, the 120+ source files, 40+ documentation pages, and 28 test files are the actual experiment.

The testing concept described in this post is stable. The next steps are about breadth: more use cases to test if the module structure holds up when the surface grows, write-heavy flows to stress-test idempotency, a real authentication layer, and more audiences beyond store and admin.

The repository is open source. If the testing concept or any other part of the experiment is useful for your own API work, take what you need: symfony-use-case-driven-api on GitHub

References

Standards and RFCs: OpenAPI Specification | RFC 9457 - Problem Details | RFC 9865 - Cursor Pagination | RFC 7234 - HTTP Caching

API design inspiration: Stripe API | Stripe API Versioning | Zalando API Guidelines | Microsoft API Guidelines

Tools: opis/json-schema | Redocly CLI | OASDiff | PHPat | PHPStan
