Data contracts

This page is the canonical reference for the data contracts used across the suite.

Who this is for

  • Developers and data engineers implementing logging, pipelines, and data validation
  • Analysts and recommendation engineers running recsys-eval
  • Operators who need to reason about “what was served” vs “what was clicked” vs “what artifact version is live”

What you will get

  • The contract types used across the suite (serving, evaluation, pipelines)
  • Minimal examples you can copy/paste
  • Where the canonical schemas live and how they are versioned

Overview: three contract families

  • Evaluation events (for recsys-eval)
      • Purpose: measure quality (offline regression, experiments).
      • Join key: request_id (exposures ↔ outcomes ↔ assignments).
      • Details + examples: Eval events
      • Join semantics: Event join logic
      • Minimum instrumentation spec: Minimum instrumentation

  • Serving logs (what the service emitted)
      • Purpose: auditable “what was served” record.
      • Canonical schema: Exposure schema (JSON)

  • Pipelines + artifacts (what pipelines consume/publish)
      • Purpose: convert interactions into versioned artifacts and a manifest pointer.
      • Interaction schema: Interactions schema (JSON)
      • Manifest schema: Manifest schema (JSON)

Evaluation events (recsys-eval): what you must be able to produce

If your goal is “measure lift” or “decide what to ship”, implement these:

  • exposure.v1 (what you showed)
  • outcome.v1 (what the user did)
  • assignment.v1 (optional; experiment bucket)

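Minimal JSONL examples, one object per line. In order: an exposure, the matching outcome, and an assignment. In practice each type goes in its own file (exposures.jsonl, outcomes.jsonl, assignments.jsonl):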

{"request_id":"req-1","user_id":"u_1","ts":"2026-02-05T10:00:00Z","items":[{"item_id":"item_1","rank":1},{"item_id":"item_2","rank":2}],"context":{"tenant_id":"demo","surface":"home"}}
{"request_id":"req-1","user_id":"u_1","item_id":"item_2","event_type":"click","ts":"2026-02-05T10:00:02Z"}
{"experiment_id":"exp-1","variant":"A","request_id":"req-1","user_id":"u_1","ts":"2026-02-05T10:00:00Z","context":{"tenant_id":"demo","surface":"home"}}
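
To make the request_id join concrete, here is a minimal Python sketch that groups the three sample records above into one row per served item. It is an illustration only, not the recsys-eval implementation: the function names are hypothetical, and the authoritative rules (deduplication, time windows, and so on) are the ones described in Event join logic.

```python
import json
from collections import defaultdict

# Sample records copied from the minimal JSONL examples above.
exposures_jsonl = '{"request_id":"req-1","user_id":"u_1","ts":"2026-02-05T10:00:00Z","items":[{"item_id":"item_1","rank":1},{"item_id":"item_2","rank":2}],"context":{"tenant_id":"demo","surface":"home"}}'
outcomes_jsonl = '{"request_id":"req-1","user_id":"u_1","item_id":"item_2","event_type":"click","ts":"2026-02-05T10:00:02Z"}'
assignments_jsonl = '{"experiment_id":"exp-1","variant":"A","request_id":"req-1","user_id":"u_1","ts":"2026-02-05T10:00:00Z","context":{"tenant_id":"demo","surface":"home"}}'


def parse_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def join_by_request_id(exposures, outcomes, assignments=()):
    """Hypothetical helper: attach outcomes and variants to exposed items."""
    outcomes_by_req = defaultdict(list)
    for outcome in outcomes:
        outcomes_by_req[outcome["request_id"]].append(outcome)
    # Assignments are optional; requests without one get variant=None.
    variant_by_req = {a["request_id"]: a["variant"] for a in assignments}

    rows = []
    for exposure in exposures:
        req = exposure["request_id"]
        clicked = {
            o["item_id"]
            for o in outcomes_by_req.get(req, [])
            if o["event_type"] == "click"
        }
        for item in exposure["items"]:
            rows.append({
                "request_id": req,
                "variant": variant_by_req.get(req),
                "item_id": item["item_id"],
                "rank": item["rank"],
                "clicked": item["item_id"] in clicked,
            })
    return rows


rows = join_by_request_id(
    parse_jsonl(exposures_jsonl),
    parse_jsonl(outcomes_jsonl),
    parse_jsonl(assignments_jsonl),
)
for row in rows:
    print(row)
```

An outcome whose request_id matches no exposure is silently dropped by this sketch, which is why the join key has to be logged consistently on both sides.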

Validation:

recsys-eval validate --schema exposure.v1 --input exposures.jsonl
recsys-eval validate --schema outcome.v1 --input outcomes.jsonl
recsys-eval validate --schema assignment.v1 --input assignments.jsonl
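
For a quick pre-flight check inside a producer's unit tests, the idea of strict validation (reject missing required fields and unexpected keys) can be sketched in plain Python. This is not how recsys-eval validates, and the field set below is an assumption inferred from the minimal outcome example only; the canonical schema remains authoritative.

```python
# Assumed field set, inferred from the minimal outcome example above.
# The real outcome.v1 schema may allow or require other fields.
OUTCOME_REQUIRED = {"request_id", "user_id", "item_id", "event_type", "ts"}
OUTCOME_ALLOWED = set(OUTCOME_REQUIRED)


def strict_problems(record, required, allowed):
    """Hypothetical helper: list strictness violations for one record."""
    keys = set(record)
    problems = []
    for name in sorted(required - keys):
        problems.append(f"missing required field: {name}")
    for name in sorted(keys - allowed):
        problems.append(f"unexpected key: {name}")
    return problems


record = {
    "request_id": "req-1",
    "item_id": "item_2",
    "event_type": "click",
    "ts": "2026-02-05T10:00:02Z",
    "debug": True,  # not part of the assumed field set
}
for problem in strict_problems(record, OUTCOME_REQUIRED, OUTCOME_ALLOWED):
    print(problem)
```

Running the recsys-eval validate commands above is still the definitive check; a helper like this only catches obvious shape mistakes earlier.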

Tip: recsys-service can emit eval-compatible exposures directly. See “Service exposure logs vs eval schema” in Eval events.

Serving exposure events (service-native): what was actually served

This event shape is useful for auditability, debugging, and building derived datasets. It is not the same as the recsys-eval exposure schema (which is stricter and optimized for evaluation).

Canonical schema: Exposure schema (JSON)

Minimal example:

{
  "schema_version": "exposure.v1",
  "occurred_at": "2026-02-05T10:00:00Z",
  "tenant_id": "demo",
  "request_id": "00000000-0000-0000-0000-000000000000",
  "surface": "home",
  "segment": "default",
  "served": [{ "item_id": "item_1", "rank": 1, "score": 0.12 }]
}

Interaction events (pipelines): what happened in the product

This is the minimal “something happened” record used by pipelines.

Canonical schema: Interactions schema (JSON)

Minimal example:

{
  "schema_version": "interaction.v1",
  "occurred_at": "2026-02-05T10:00:02Z",
  "tenant_id": "demo",
  "event_type": "click",
  "item_id": "item_2"
}

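The minimal record above carries no request_id or user_id, so it cannot be joined back to a specific exposure. If you need reliable evaluation joins, produce outcome.v1 for recsys-eval as well (it requires request_id and user_id).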
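
One way to produce both records from the same product event is to build the outcome at logging time, while the request_id of the response that showed the item is still at hand. The sketch below is a hypothetical helper; the field mapping (occurred_at to ts) is inferred from the two minimal examples on this page and should be confirmed against the canonical schemas.

```python
def to_outcome(interaction, request_id, user_id):
    """Hypothetical helper: build an outcome-shaped record from an interaction.

    request_id must be the ID of the exposure that showed the item, and
    user_id should already be pseudonymous.
    """
    if not request_id or not user_id:
        raise ValueError("outcome records need both request_id and user_id")
    return {
        "request_id": request_id,
        "user_id": user_id,
        "item_id": interaction["item_id"],
        "event_type": interaction["event_type"],
        "ts": interaction["occurred_at"],  # assumed mapping, see lead-in
    }


interaction = {
    "schema_version": "interaction.v1",
    "occurred_at": "2026-02-05T10:00:02Z",
    "tenant_id": "demo",
    "event_type": "click",
    "item_id": "item_2",
}
outcome = to_outcome(interaction, request_id="req-1", user_id="u_1")
print(outcome)
```

Failing loudly when a join key is missing is deliberate here: an outcome without request_id cannot be matched to an exposure later.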

Artifact manifest (pipelines → service): what version is live

In artifact/manifest mode, pipelines publish artifacts and update a manifest pointer. The service reads the current manifest and fetches referenced blobs.

Canonical schema: Manifest schema (JSON)

Minimal example:

{
  "schema_version": "manifest.v1",
  "tenant_id": "demo",
  "created_at": "2026-02-05T10:05:00Z",
  "version": "2026-02-05T10:05:00Z",
  "artifacts": {}
}
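
The publish-then-point flow can be sketched as follows, assuming a local directory stands in for the artifact store and a file named current.json stands in for the manifest pointer. Both names, and the helper functions, are hypothetical; the shape of entries under "artifacts" is defined by the Manifest schema and is passed through untouched here.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

POINTER_NAME = "current.json"  # hypothetical pointer file name


def publish_manifest(directory, tenant_id, artifacts, now=None):
    """Write a manifest.v1 document and swap it in atomically."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    manifest = {
        "schema_version": "manifest.v1",
        "tenant_id": tenant_id,
        "created_at": stamp,
        "version": stamp,  # mirrors the minimal example: version = timestamp
        "artifacts": artifacts,
    }
    # Write to a temp file in the same directory, then rename over the
    # pointer so readers never observe a half-written manifest.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(manifest, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, os.path.join(directory, POINTER_NAME))
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return manifest


def read_manifest(directory):
    """Read the manifest the pointer currently refers to."""
    with open(os.path.join(directory, POINTER_NAME), encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store:
        publish_manifest(store, "demo", artifacts={})
        print(read_manifest(store)["version"])
```

Artifacts should be fully uploaded before the manifest is published, so that every blob a reader finds referenced already exists. On object storage the same idea usually maps to writing the manifest last; the rename trick is specific to filesystems.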

Versioning rules (practical)

  • Never change the meaning of an existing version.
      • Add a new version instead (for example: interaction.v2), and keep transforms explicit.
  • Treat schemas as strict by default.
      • recsys-eval validate checks records against JSON Schema strictly: it rejects missing required fields and unexpected keys.
  • Keep IDs stable and privacy-safe.
      • Use pseudonymous user IDs; do not log raw PII.
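
One common way to get user IDs that are both stable and pseudonymous is a keyed hash of the raw identifier. The sketch below uses HMAC-SHA256 from the Python standard library; it is an illustration rather than something the suite prescribes, and the "u_" prefix, the truncation length, and the helper name are all assumptions.

```python
import hashlib
import hmac


def pseudonymize_user_id(raw_user_id, secret_key):
    """Hypothetical helper: map a raw identifier to a stable pseudonym.

    The same (raw_user_id, secret_key) pair always yields the same output,
    so joins keep working, while the raw value never reaches the logs.
    """
    digest = hmac.new(
        secret_key, raw_user_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return "u_" + digest[:32]


# The key must come from a secret store in practice; this literal is a
# placeholder for the example only.
key = b"example-only-key"
print(pseudonymize_user_id("alice@example.com", key))
```

Rotating the key changes every pseudonym, which breaks joins against older logs, so treat a key rotation like a schema version change and plan it explicitly.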