Data contracts: what inputs look like¶
This page describes the data contracts for recsys-eval inputs: what each input record looks like, how the records join, and how the inputs fit into the RecSys suite.
Who this is for¶
Integrators and anyone who needs to produce valid input logs.
What you will get¶
- The minimum required fields for each input type
- How the joins work
- Small example records
recsys-eval uses JSON Schemas for validation:
- schemas/exposure.v1.json
- schemas/outcome.v1.json
- schemas/assignment.v1.json
- api/schemas/report.v1.json
- api/schemas/decision.v1.json
Use the validate command before doing anything else:
recsys-eval validate --schema exposure.v1 --input exposures.jsonl
recsys-eval validate --schema outcome.v1 --input outcomes.jsonl
recsys-eval validate --schema assignment.v1 --input assignments.jsonl
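Before running the validator, it can help to confirm that a log is well-formed JSONL at all. The following is a minimal sketch, not part of recsys-eval: the function name `precheck_jsonl` is hypothetical, and it checks only that each line parses as a JSON object and carries `request_id`, the one field this page marks as required. It does not replace `recsys-eval validate`, which checks the full schema.

```python
import json


def precheck_jsonl(lines, required=("request_id",)):
    """Return (line_number, problem) tuples for a JSONL input.

    Hypothetical helper: checks syntax and the presence of the
    given fields only, not the full JSON Schema.
    """
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((n, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((n, "record is not a JSON object"))
            continue
        for field in required:
            if field not in record:
                problems.append((n, f"missing field: {field}"))
    return problems


# Illustrative input: one valid outcome, one without request_id, one broken line.
sample = [
    '{"request_id": "req_123", "event": "click", "item_id": "B"}',
    '{"event": "click", "item_id": "B"}',
    '{not json',
]
problems = precheck_jsonl(sample)
for line_number, problem in problems:
    print(f"line {line_number}: {problem}")
```

To check a real file, pass an open file handle instead of the list, for example `precheck_jsonl(open("outcomes.jsonl", encoding="utf-8"))`.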
Record formats¶
Exposure (what was shown)¶
Purpose:
- describes what items were recommended and in what order
- provides context for segment slicing
- acts as the "left side" of joins
Join key:
- request_id (required)
Minimal example (illustrative, not exhaustive):
{
"request_id": "req_123",
"tenant": "demo",
"surface": "home",
"user_id": "u_42",
"timestamp": "2026-01-27T12:00:00Z",
"items": [
{"item_id": "A", "rank": 1},
{"item_id": "B", "rank": 2}
]
}
Notes:
- For off-policy evaluation (OPE), exposures may also include propensities. See docs/OPE.md.
Outcome (what the user did)¶
Purpose:
- records the behavior you care about: click, conversion, etc.
Join key:
- request_id (required)
Minimal example:
{
"request_id": "req_123",
"event": "click",
"item_id": "B",
"timestamp": "2026-01-27T12:00:05Z"
}
Assignment (experiment bucket)¶
Purpose:
- records which variant a request or user was assigned to (for example, control vs. candidate)
Join key:
- request_id (required in this dataset contract)
Minimal example:
{
"request_id": "req_123",
"experiment_id": "exp_home_rank_v3",
"variant": "control"
}
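The three record types share `request_id` as their join key, with exposures on the left side. The sketch below illustrates that join using the example records from this page. It is an assumption-laden illustration, not how recsys-eval implements joins: the function name `join_on_request_id` and the shape of the joined row are hypothetical.

```python
from collections import defaultdict


def join_on_request_id(exposures, outcomes, assignments):
    """Left-join outcomes and assignments onto exposures by request_id.

    Hypothetical helper for illustration only.
    """
    outcomes_by_request = defaultdict(list)
    for outcome in outcomes:
        outcomes_by_request[outcome["request_id"]].append(outcome)

    variant_by_request = {a["request_id"]: a["variant"] for a in assignments}

    joined = []
    for exposure in exposures:
        rid = exposure["request_id"]
        joined.append({
            "request_id": rid,
            # None when no assignment record exists for this request.
            "variant": variant_by_request.get(rid),
            "items": exposure["items"],
            # Empty list when the exposure has no recorded outcomes.
            "outcomes": outcomes_by_request.get(rid, []),
        })
    return joined


exposures = [{
    "request_id": "req_123", "tenant": "demo", "surface": "home",
    "user_id": "u_42", "timestamp": "2026-01-27T12:00:00Z",
    "items": [{"item_id": "A", "rank": 1}, {"item_id": "B", "rank": 2}],
}]
outcomes = [{
    "request_id": "req_123", "event": "click", "item_id": "B",
    "timestamp": "2026-01-27T12:00:05Z",
}]
assignments = [{
    "request_id": "req_123", "experiment_id": "exp_home_rank_v3",
    "variant": "control",
}]

joined = join_on_request_id(exposures, outcomes, assignments)
```

Because the exposure is the left side, an exposure with no outcome still produces a row (with an empty `outcomes` list), while an outcome with no exposure produces nothing.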
Interleaving datasets¶
Interleaving mode uses a different dataset config:
- ranker_a results
- ranker_b results
- outcomes (often clicks)
See configs/examples/dataset.interleaving.jsonl.yaml for the wiring.
Join expectations and quality signals¶
Good joins are boring. Bad joins destroy trust.
In reports, look for:
- match rate: the share of exposures that have at least one matching outcome
- duplicate request_id rates
- timestamp anomalies
- missing tenant/surface fields (kills segmentation)
If joins look wrong, stop and fix instrumentation. Do not "tune metrics".
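The signals listed above can be approximated on raw logs before any report exists. This is a rough sketch under stated assumptions, not the computation recsys-eval performs: `join_quality` and the keys it returns are hypothetical names, match rate is taken over distinct exposure `request_id` values, and the only timestamp anomaly checked is an outcome recorded before its exposure.

```python
from collections import Counter
from datetime import datetime


def parse_ts(value):
    # Replace the trailing "Z" so older Python versions can parse the timestamp.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def join_quality(exposures, outcomes):
    """Compute rough join-quality signals (hypothetical helper)."""
    counts = Counter(e["request_id"] for e in exposures)
    exposure_ts = {}
    for e in exposures:
        # Keep the first timestamp seen for each request_id.
        exposure_ts.setdefault(e["request_id"], parse_ts(e["timestamp"]))

    matched = {o["request_id"] for o in outcomes if o["request_id"] in counts}
    orphans = sum(1 for o in outcomes if o["request_id"] not in counts)
    early = sum(
        1 for o in outcomes
        if o["request_id"] in exposure_ts
        and parse_ts(o["timestamp"]) < exposure_ts[o["request_id"]]
    )
    missing_segment = sum(
        1 for e in exposures if not e.get("tenant") or not e.get("surface")
    )
    distinct = len(counts)
    return {
        "match_rate": len(matched) / distinct if distinct else 0.0,
        "duplicate_request_ids": sum(1 for c in counts.values() if c > 1),
        "orphan_outcomes": orphans,
        "outcomes_before_exposure": early,
        "missing_segment_fields": missing_segment,
    }


# Illustrative logs with one of each problem.
exposures = [
    {"request_id": "req_1", "tenant": "demo", "surface": "home",
     "timestamp": "2026-01-27T12:00:00Z"},
    {"request_id": "req_2", "tenant": "demo",  # surface is missing
     "timestamp": "2026-01-27T12:01:00Z"},
    {"request_id": "req_2", "tenant": "demo", "surface": "home",  # duplicate
     "timestamp": "2026-01-27T12:01:00Z"},
    {"request_id": "req_3", "tenant": "demo", "surface": "home",
     "timestamp": "2026-01-27T12:02:00Z"},
]
outcomes = [
    {"request_id": "req_1", "event": "click", "timestamp": "2026-01-27T12:00:05Z"},
    {"request_id": "req_2", "event": "click", "timestamp": "2026-01-27T12:00:30Z"},
    {"request_id": "req_9", "event": "click", "timestamp": "2026-01-27T12:03:00Z"},
]

report = join_quality(exposures, outcomes)
```

A low match rate is not automatically a bug, since many exposures legitimately receive no click. Orphan outcomes, duplicates, and outcomes that precede their exposure are the stronger hints that instrumentation needs fixing.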
Read next¶
- Integration logging plan: Integration: how to produce the inputs
- Offline gate in CI: Workflow: Offline gate in CI
- Suite-level contract index: Data contracts
- Troubleshooting joins: Troubleshooting: symptom -> cause -> fix