Data contracts
This page is the canonical reference for Data contracts.
Who this is for
- Developers and data engineers implementing logging, pipelines, and data validation
- Analysts and recommendation engineers running recsys-eval
- Operators who need to reason about “what was served” vs “what was clicked” vs “what artifact version is live”
What you will get
- The contract types used across the suite (serving, evaluation, pipelines)
- Minimal examples you can copy/paste
- Where the canonical schemas live and how they are versioned
Overview: three contract families
- Evaluation events (for recsys-eval)
  - Purpose: measure quality (offline regression, experiments).
  - Join key: request_id (exposures ↔ outcomes ↔ assignments).
  - Details + examples: Eval events
  - Join semantics: Event join logic
  - Minimum instrumentation spec: Minimum instrumentation
- Serving logs (what the service emitted)
  - Purpose: auditable “what was served” record.
  - Canonical schema: Exposure schema (JSON)
- Pipelines + artifacts (what pipelines consume/publish)
  - Purpose: convert interactions into versioned artifacts and a manifest pointer.
  - Interaction schema: Interactions schema (JSON)
  - Manifest schema: Manifest schema (JSON)
Evaluation events (recsys-eval): what you must be able to produce
If your goal is “measure lift” or “decide what to ship”, implement these:
- exposure.v1 (what you showed)
- outcome.v1 (what the user did)
- assignment.v1 (optional; experiment bucket)
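Minimal JSONL examples (one object per line). The three lines below are one exposure, one outcome, and one assignment for the same request; in practice each event type goes in its own file: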
{"request_id":"req-1","user_id":"u_1","ts":"2026-02-05T10:00:00Z","items":[{"item_id":"item_1","rank":1},{"item_id":"item_2","rank":2}],"context":{"tenant_id":"demo","surface":"home"}}
{"request_id":"req-1","user_id":"u_1","item_id":"item_2","event_type":"click","ts":"2026-02-05T10:00:02Z"}
{"experiment_id":"exp-1","variant":"A","request_id":"req-1","user_id":"u_1","ts":"2026-02-05T10:00:00Z","context":{"tenant_id":"demo","surface":"home"}}
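Because all three event types share request_id, a consumer can join them directly. The sketch below is a simplified illustration, not the recsys-eval join logic (see Event join logic for the real semantics). It computes request-level click-through per variant, assuming a request counts as clicked when a click outcome names an item that appeared in that request's exposure. The function names load_jsonl and ctr_by_variant are hypothetical.

```python
import json
from collections import defaultdict


def load_jsonl(lines):
    """Parse JSONL lines (any iterable of str) into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def ctr_by_variant(exposures, outcomes, assignments):
    """Request-level click-through per experiment variant, joined on request_id.

    Simplified sketch (assumption): a request counts as clicked if any
    outcome with event_type == "click" and the same request_id names an
    item that was in the exposure. Exposures with no assignment are skipped.
    """
    variant_of = {a["request_id"]: a["variant"] for a in assignments}

    clicked_items = defaultdict(set)  # request_id -> item_ids clicked
    for o in outcomes:
        if o.get("event_type") == "click":
            clicked_items[o["request_id"]].add(o["item_id"])

    shown = defaultdict(int)    # variant -> number of exposed requests
    clicked = defaultdict(int)  # variant -> requests with at least one click
    for e in exposures:
        variant = variant_of.get(e["request_id"])
        if variant is None:
            continue  # not part of an experiment in this sketch
        served = {item["item_id"] for item in e["items"]}
        shown[variant] += 1
        if clicked_items.get(e["request_id"], set()) & served:
            clicked[variant] += 1

    return {v: clicked[v] / shown[v] for v in shown}


# Usage (hypothetical file names):
# with open("exposures.jsonl") as f:
#     exposures = load_jsonl(f)
```

Requiring the clicked item to be in the served list guards against counting clicks on items the request never showed; the real join rules may differ.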
Validation:
recsys-eval validate --schema exposure.v1 --input exposures.jsonl
recsys-eval validate --schema outcome.v1 --input outcomes.jsonl
recsys-eval validate --schema assignment.v1 --input assignments.jsonl
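For faster feedback before running the CLI (for example in a producer's unit tests), a local pre-check can catch the two failure modes that strict validation reports: missing keys and unexpected keys. This is a minimal sketch only. The FIELDS sets are guesses read off the examples above, not the canonical schemas, and treating every example key as required is an assumption that is likely stricter than the real schemas. recsys-eval validate remains the authority.

```python
import json

# Hypothetical field sets inferred from the minimal examples on this page.
# NOT the canonical schemas; `recsys-eval validate` is authoritative.
FIELDS = {
    "exposure.v1": {"request_id", "user_id", "ts", "items", "context"},
    "outcome.v1": {"request_id", "user_id", "item_id", "event_type", "ts"},
    "assignment.v1": {"experiment_id", "variant", "request_id", "user_id", "ts", "context"},
}


def precheck(lines, schema):
    """Return a list of (line_number, message) problems for a JSONL stream.

    Mirrors strict validation at the top level only: missing keys and
    unexpected keys are both reported; nested objects are not checked.
    """
    expected = FIELDS[schema]
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((n, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            problems.append((n, "not a JSON object"))
            continue
        keys = set(obj)
        for k in sorted(expected - keys):
            problems.append((n, f"missing key: {k}"))
        for k in sorted(keys - expected):
            problems.append((n, f"unexpected key: {k}"))
    return problems


# Usage:
# with open("outcomes.jsonl") as f:
#     for n, msg in precheck(f, "outcome.v1"):
#         print(f"line {n}: {msg}")
```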
Tip: recsys-service can emit eval-compatible exposures directly. See “Service exposure logs vs eval schema” in Eval events.
Serving exposure events (service-native): what was actually served
This event shape is useful for auditability, debugging, and building derived datasets. It is not the same as the recsys-eval exposure schema (which is stricter and optimized for evaluation).
Canonical schema: Exposure schema (JSON)
Minimal example:
{
"schema_version": "exposure.v1",
"occurred_at": "2026-02-05T10:00:00Z",
"tenant_id": "demo",
"request_id": "00000000-0000-0000-0000-000000000000",
"surface": "home",
"segment": "default",
"served": [{ "item_id": "item_1", "rank": 1, "score": 0.12 }]
}
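A producer-side sketch of building this record. The helper name serving_exposure, the (item_id, score) input shape, and generating a UUID4 when no request_id is supplied are assumptions for illustration; the field names, the 1-based rank, and the UTC timestamp with a trailing Z follow the example above.

```python
import json
import uuid
from datetime import datetime, timezone


def rfc3339_utc(dt):
    """Format an aware datetime as e.g. 2026-02-05T10:00:00Z (second precision)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def serving_exposure(tenant_id, surface, segment, ranked, request_id=None, now=None):
    """Build a service-native exposure record shaped like the example above.

    `ranked` is a list of (item_id, score) tuples in served order; rank is
    assigned 1-based from that order. A random UUID4 request_id is a
    hypothetical default used only when the caller does not pass one.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if request_id is None:
        request_id = str(uuid.uuid4())
    return {
        "schema_version": "exposure.v1",
        "occurred_at": rfc3339_utc(now),
        "tenant_id": tenant_id,
        "request_id": request_id,
        "surface": surface,
        "segment": segment,
        "served": [
            {"item_id": item_id, "rank": i, "score": score}
            for i, (item_id, score) in enumerate(ranked, start=1)
        ],
    }


# Usage: one JSON object per log line.
# print(json.dumps(serving_exposure("demo", "home", "default", [("item_1", 0.12)])))
```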
Interaction events (pipelines): what happened in the product
This is the minimal “something happened” record used by pipelines.
Canonical schema: Interactions schema (JSON)
Minimal example:
{
"schema_version": "interaction.v1",
"occurred_at": "2026-02-05T10:00:02Z",
"tenant_id": "demo",
"event_type": "click",
"item_id": "item_2"
}
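The minimal interaction.v1 record above carries no request_id or user_id, so it cannot be joined to exposures. If you need reliable evaluation joins, also produce outcome.v1 for recsys-eval, which requires both fields.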
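When the client can recover the request_id of the serving call that produced the recommendations, plus a pseudonymous user_id, the same interaction can also be emitted as an outcome.v1. The sketch below assumes the field mapping implied by the examples on this page (occurred_at becomes ts; item_id and event_type carry over) and returns None rather than emit an outcome that cannot be joined. The helper name to_outcome is hypothetical.

```python
from typing import Optional


def to_outcome(interaction: dict, request_id: Optional[str] = None,
               user_id: Optional[str] = None) -> Optional[dict]:
    """Map an interaction.v1 record to the outcome.v1 shape, or return None.

    The occurred_at -> ts mapping is inferred from this page's examples
    (an assumption). Without both request_id and user_id the outcome could
    not be joined to an exposure, so nothing is produced.
    """
    if not request_id or not user_id:
        return None
    return {
        "request_id": request_id,
        "user_id": user_id,
        "item_id": interaction["item_id"],
        "event_type": interaction["event_type"],
        "ts": interaction["occurred_at"],
    }
```

Applied to the interaction example above with request_id "req-1" and user_id "u_1", this yields exactly the outcome.v1 line from the JSONL examples earlier on this page.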
Artifact manifest (pipelines → service): what version is live
In artifact/manifest mode, pipelines publish artifacts and update a manifest pointer. The service reads the current manifest and fetches referenced blobs.
Canonical schema: Manifest schema (JSON)
Minimal example:
{
"schema_version": "manifest.v1",
"tenant_id": "demo",
"created_at": "2026-02-05T10:05:00Z",
"version": "2026-02-05T10:05:00Z",
"artifacts": {}
}
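On the service side, reading the manifest can be reduced to three steps: parse it, check that it is a manifest.v1 for the expected tenant, and reload only when version changes. This is a minimal sketch under those assumptions; ManifestError, parse_manifest, needs_reload, and the reload rule are hypothetical, and fetching the referenced blobs is omitted because the shape of artifacts is not shown on this page.

```python
import json


class ManifestError(ValueError):
    """Raised when a document does not look like a usable manifest.v1."""


def parse_manifest(raw, expected_tenant):
    """Parse and sanity-check a manifest.v1 document (str or bytes)."""
    m = json.loads(raw)
    if not isinstance(m, dict):
        raise ManifestError("manifest is not a JSON object")
    if m.get("schema_version") != "manifest.v1":
        raise ManifestError(f"unsupported schema_version: {m.get('schema_version')!r}")
    if m.get("tenant_id") != expected_tenant:
        raise ManifestError(
            f"manifest is for tenant {m.get('tenant_id')!r}, expected {expected_tenant!r}"
        )
    for key in ("created_at", "version", "artifacts"):
        if key not in m:
            raise ManifestError(f"missing key: {key}")
    return m


def needs_reload(loaded_version, manifest):
    """True when the manifest points at a version other than the loaded one.

    Comparing for inequality (rather than "newer than") means pointing the
    manifest back at an older version, i.e. a rollback, also takes effect.
    """
    return manifest["version"] != loaded_version
```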
Versioning rules (practical)
- Never change the meaning of an existing version.
  - Add a new version instead (for example: interaction.v2), and keep transforms explicit.
- Treat schemas as strict by default.
  - recsys-eval validate uses JSON Schema strictly: it rejects records with missing required fields or unexpected keys.
- Keep IDs stable and privacy-safe.
  - Use pseudonymous user IDs; do not log raw PII.
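One common way to get stable, pseudonymous user IDs is a keyed hash (HMAC-SHA256) of the internal ID. The sketch below shows one such approach, not a prescribed scheme; the function name, the u_ prefix, the 32-hex-character truncation, and where the key is stored are all assumptions.

```python
import hashlib
import hmac


def pseudonymize_user_id(raw_id: str, key: bytes) -> str:
    """Deterministic keyed pseudonym for a user identifier (hypothetical helper).

    The same (raw_id, key) always maps to the same pseudonym, so joins stay
    stable. Without the secret key, a guessed raw ID cannot be hashed to
    check for a match, unlike a plain unkeyed hash.
    """
    if not key:
        raise ValueError("key must be a non-empty secret")
    digest = hmac.new(key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "u_" + digest[:32]  # 128 bits of the digest; prefix is illustrative
```

Design note: the key must stay fixed for IDs to stay stable. Rotating it changes every pseudonym and breaks joins across the rotation.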
Read next
- Exposure logging and attribution: Exposure logging and attribution
- How shipping/rollback ties to contracts: Suite architecture
- Data modes (DB-only vs artifact/manifest): Data modes: DB-only vs artifact/manifest