# Minimum instrumentation spec (for credible evaluation)

This page is the canonical reference for the minimum instrumentation required to make evaluation results credible.
## Who this is for

- Developers implementing exposure/outcome logging for RecSys
- Data engineers preparing datasets for `recsys-eval`
- Teams that want "ship / hold / rollback" decisions to be auditable
## What you will get

- The minimum events and fields required for trustworthy KPI + guardrail metrics
- The join key (`request_id`) and the invariants you must enforce
- Common pitfalls that cause low join-rate or misleading results
## Core invariants (do not compromise)

- `request_id` is unique per rendered list.
- The same `request_id` propagates exposure → outcome → assignment (if experiments).
- `user_id` is stable and pseudonymous (do not log raw PII).
- `surface` and `tenant_id` are present (recommended in `context`) and names are stable.
- Timestamps (`ts`) are RFC 3339 and use a consistent clock source.
## Events you must produce

The strict schemas and examples are defined in the data contracts; the minimum requirements per event type follow.
### exposure.v1 (required)

Required fields:

- `request_id`
- `user_id`
- `ts`
- `items[]` with `{ item_id, rank }` (`rank` is 1-based)

Strongly recommended context keys (string values):

- `tenant_id`
- `surface`
- `segment` (if you segment recommendations)

Optional (but useful guardrails):

- `latency_ms` (p95/p99 guardrail and rollout safety)
- `error` (detect "served empty because of error")
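To make the field list concrete, here is a minimal sketch of one `exposure.v1` record in Python. The ids, tenant, and surface names are invented for illustration; the authoritative schema is the data contract, not this snippet.

```python
import json
from datetime import datetime, timezone

# A minimal exposure.v1 event (all values illustrative).
exposure = {
    "request_id": "req-123",                       # unique per rendered list
    "user_id": "u-456",                            # stable, pseudonymous
    "ts": datetime.now(timezone.utc).isoformat(),  # RFC 3339, consistent clock
    "items": [
        {"item_id": "sku-1", "rank": 1},           # rank is 1-based
        {"item_id": "sku-2", "rank": 2},
    ],
    "context": {"tenant_id": "acme", "surface": "home_feed"},
    "latency_ms": 42,                              # optional guardrail field
}

# One JSON object per line is the usual shape for exposures.jsonl.
line = json.dumps(exposure)
```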
### outcome.v1 (required)

Required fields:

- `request_id`
- `user_id`
- `item_id`
- `event_type` (`click` or `conversion`)
- `ts`

Optional:

- `value` (required if you want revenue-based metrics)

Mapping guidance:

- Map your product events into `click` and `conversion` consistently.
- Example: add-to-cart could be treated as `conversion` for a top-of-funnel pilot.
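The mapping guidance can be sketched as a small lookup table. The product-event names (`item_clicked`, `add_to_cart`, `purchase_completed`) are hypothetical; the output fields follow the `outcome.v1` spec above.

```python
# Hypothetical product-event names mapped onto the two outcome.v1 event types.
# Keep this mapping consistent across surfaces, and document it.
EVENT_TYPE_MAP = {
    "item_clicked": "click",
    "add_to_cart": "conversion",        # top-of-funnel pilot choice
    "purchase_completed": "conversion",
}

def to_outcome(product_event):
    """Translate a raw product event into an outcome.v1 record, or drop it."""
    event_type = EVENT_TYPE_MAP.get(product_event["name"])
    if event_type is None:
        return None  # not relevant to evaluation
    return {
        "request_id": product_event["request_id"],  # must be propagated end to end
        "user_id": product_event["user_id"],
        "item_id": product_event["item_id"],
        "event_type": event_type,
        "ts": product_event["ts"],
    }

outcome = to_outcome({
    "name": "add_to_cart", "request_id": "req-123",
    "user_id": "u-456", "item_id": "sku-2", "ts": "2024-01-01T00:00:00Z",
})
```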
### assignment.v1 (required for experiments)

Required fields:

- `experiment_id`
- `variant`
- `request_id`
- `user_id`
- `ts`
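A minimal `assignment.v1` record, again with invented values; note that it carries the same `request_id` as the exposure it governs, which is what makes variant-level joins possible.

```python
# Minimal assignment.v1 event (illustrative values).
assignment = {
    "experiment_id": "exp-rerank-pilot",
    "variant": "treatment",
    "request_id": "req-123",  # same id as the exposure this variant produced
    "user_id": "u-456",
    "ts": "2024-01-01T00:00:00Z",
}

REQUIRED = {"experiment_id", "variant", "request_id", "user_id", "ts"}
missing = REQUIRED - assignment.keys()  # empty set when the record is complete
```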
## KPI specs (minimum)

### CTR (click-through rate)

- Definition: `click` outcomes / exposures
- Join key: `request_id`
- Required fields:
    - exposure: `request_id`, `user_id`, `items[].item_id`, `items[].rank`
    - outcome: `request_id`, `user_id`, `item_id`, `event_type=click`
- Common pitfalls:
    - outcomes missing `request_id` (join-rate collapses)
    - clicks logged without `item_id` (can't attribute to rank)
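The CTR definition above reduces to a join on `request_id`; a minimal sketch on in-memory data (ids invented):

```python
# CTR = click outcomes / exposures, joined on request_id.
exposures = [
    {"request_id": "req-1", "items": [{"item_id": "a", "rank": 1}]},
    {"request_id": "req-2", "items": [{"item_id": "b", "rank": 1}]},
]
outcomes = [
    {"request_id": "req-1", "item_id": "a", "event_type": "click"},
    {"request_id": "req-9", "item_id": "a", "event_type": "click"},  # orphan click
]

exposed = {e["request_id"] for e in exposures}
clicks = [
    o for o in outcomes
    if o["event_type"] == "click" and o.get("request_id") in exposed
]
ctr = len(clicks) / len(exposures)  # orphan clicks never count: 1 / 2
```

The orphan click (`req-9`) silently disappears from the numerator, which is why join integrity must be checked before trusting the metric.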
### Conversion rate

- Definition: `conversion` outcomes / exposures
- Join key: `request_id`
- Required fields:
    - exposure: `request_id`, `user_id`, `items[].item_id`
    - outcome: `request_id`, `user_id`, `item_id`, `event_type=conversion`
- Common pitfalls:
    - logging conversions without the originating recommendation `request_id`
    - reusing a single `request_id` for multiple renders (attribution smears)
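The "reused `request_id`" pitfall is cheap to detect before computing any metric; a sketch of such a check:

```python
from collections import Counter

def duplicate_request_ids(exposures):
    """request_ids that were logged for more than one rendered list."""
    counts = Counter(e["request_id"] for e in exposures)
    return {rid for rid, n in counts.items() if n > 1}

dupes = duplicate_request_ids(
    [{"request_id": "req-1"}, {"request_id": "req-2"}, {"request_id": "req-1"}]
)
```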
### Revenue per exposure

- Definition: `sum(value)` for `conversion` outcomes / exposures
- Join key: `request_id`
- Required fields:
    - outcome: `event_type=conversion`, `value`
- Common pitfalls:
    - missing/zero `value` (metric becomes meaningless)
    - currency/unit mismatches (document the unit)
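A sketch of the computation, including a check that surfaces the missing-`value` pitfall instead of silently zeroing it (data invented, single documented currency assumed):

```python
exposures = [{"request_id": "req-1"}, {"request_id": "req-2"}]
outcomes = [
    {"request_id": "req-1", "event_type": "conversion", "value": 19.99},
    {"request_id": "req-2", "event_type": "conversion"},  # missing value!
]

conversions = [o for o in outcomes if o["event_type"] == "conversion"]
missing_value = [o for o in conversions if "value" not in o]  # flag, don't zero silently
revenue_per_exposure = sum(o.get("value", 0.0) for o in conversions) / len(exposures)
```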
### Offline ranking proxies (HitRate@K, NDCG@K, MAP@K, …)

- Definition: compare the ranked list (`exposure.v1.items[]`) to outcomes as relevance signals
- Join key: `request_id`
- Required fields:
    - exposure: `items[].rank` (1-based), `items[].item_id`
    - outcome: `item_id`, `event_type` (what counts as "relevant")
- Common pitfalls:
    - treating all outcome events as equally relevant (be explicit)
    - running offline metrics on a dataset with low join-rate
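For a single exposure, these proxies compare the ranked `item_id`s against the set of `item_id`s with a qualifying outcome. A binary-relevance sketch of two of them (textbook formulas, not this project's implementation):

```python
import math

def hit_rate_at_k(ranked_item_ids, relevant, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(i in relevant for i in ranked_item_ids[:k]) else 0.0

def ndcg_at_k(ranked_item_ids, relevant, k):
    """Binary-relevance NDCG@k over a single ranked list."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked_item_ids[:k], start=1)
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(r + 1) for r in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0

# One exposure's list, sorted by its 1-based rank, plus joined outcomes
# as the (explicitly chosen) relevance signal.
ranked = ["a", "b", "c"]
relevant = {"b"}
```

Averaging over exposures only makes sense once the join-rate guardrail below passes.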
## Guardrails (minimum)

### Join integrity (must pass before trusting any KPI)

- What you need: `request_id` present in exposures + outcomes (+ assignments in experiments)
- Join key: `request_id`
- Common pitfalls:
    - generating `request_id` twice (one for the API call, another for logging)
    - not propagating `request_id` to client-side outcome events
See: Event join logic (exposures ↔ outcomes ↔ assignments)
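One way to quantify join integrity is the fraction of outcomes whose `request_id` resolves to a logged exposure; a minimal sketch:

```python
def join_rate(exposures, outcomes):
    """Fraction of outcome events whose request_id joins back to an exposure."""
    if not outcomes:
        return 1.0
    exposed = {e["request_id"] for e in exposures}
    joined = sum(1 for o in outcomes if o.get("request_id") in exposed)
    return joined / len(outcomes)

rate = join_rate(
    [{"request_id": "req-1"}],
    [{"request_id": "req-1"}, {"request_id": "req-2"}, {}],  # 1 of 3 joins
)
```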
### Empty-recs rate

- What you need: `exposure.v1.items[]` (may be empty)
- Join key: none (computed from exposures)
- Common pitfalls:
    - logging exposures for requests you never rendered
    - treating "empty because error" as a normal exposure (use `error` when possible)
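A sketch that computes the rate and separates error-tagged empties, so "empty because error" can be reported on its own:

```python
def empty_recs_rate(exposures):
    """Share of exposures with an empty items list, plus how many carry an error."""
    empty = [e for e in exposures if not e.get("items")]
    errored = sum(1 for e in empty if e.get("error"))
    return len(empty) / len(exposures), errored

rate, errored = empty_recs_rate([
    {"items": [{"item_id": "a", "rank": 1}]},
    {"items": []},                      # empty, no error recorded
    {"items": [], "error": "timeout"},  # empty because of an error
])
```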
### Latency and error rate

- What you need: `exposure.v1.latency_ms` and `exposure.v1.error` (optional fields supported by the schema)
- Join key: none (computed from exposures)
- Common pitfalls:
    - measuring latency in the wrong place (client vs. server) and mixing the two
    - missing the long tail (track p95/p99, not only averages)
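A nearest-rank percentile sketch (assuming `latency_ms` was collected from a single, server-side clock) that shows why the mean hides the tail:

```python
import math

def nearest_rank_percentile(values, p):
    """Nearest-rank percentile; simple and adequate for guardrail checks."""
    ordered = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

latencies_ms = [10, 10, 11, 11, 12, 12, 13, 14, 500, 900]  # illustrative sample
mean = sum(latencies_ms) / len(latencies_ms)  # 149.3: dominated by two slow requests
p50 = nearest_rank_percentile(latencies_ms, 50)  # 12: typical request is fast
p95 = nearest_rank_percentile(latencies_ms, 95)  # 900: the tail the mean hides
```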
## Verify (Definition of Done)

- `recsys-eval validate` passes for `exposure.v1` and `outcome.v1` (and `assignment.v1` if you run experiments):

```
recsys-eval validate --schema exposure.v1 --input exposures.jsonl
recsys-eval validate --schema outcome.v1 --input outcomes.jsonl
recsys-eval validate --schema assignment.v1 --input assignments.jsonl  # if you run experiments
```

- Join-rate is measured per surface and is not near-zero.
- You can compute at least one KPI and one guardrail end-to-end.
## Read next
- Data contracts hub: Data contracts
- Decision playbook (ship/hold/rollback): Decision playbook: ship / hold / rollback
- Integration spec (headers, request_id, invariants): Integration spec (one surface)
- Integration checklist: How-to: Integration checklist (one surface)