recsys-eval configuration
This page is the canonical reference for recsys-eval configuration.
Who this is for
- RecSys engineers running offline regression gates in CI
- Developers validating instrumentation by producing a first report from logs
What you will get
- The evaluation config schema (what `--config` expects)
- The dataset wiring schema (what `--dataset` expects)
- Copy/paste examples you can adapt
Reference
`recsys-eval run` takes two YAML files:
- `--dataset`: where exposures/outcomes/assignments come from (sources and joins happen in the tool)
- `--config`: what evaluation to run (mode, metrics, gates, guardrails)
Important:
- YAML parsing is strict: unknown fields fail fast.
- Output format is a CLI flag (`--output-format`), not a config field.
Dataset config (`--dataset`)
Top-level keys:
| Key | Required in modes | Meaning |
|---|---|---|
| `exposures` | `offline`, `experiment`, `ope` | Exposure source (what was shown). |
| `outcomes` | `offline`, `experiment`, `ope` | Outcome source (what the user did). |
| `assignments` | `experiment`, `aa-check` | Experiment assignment source (variant per request/user). |
| `interleaving` | `interleaving` | Special wiring for interleaving (ranker A/B lists + outcomes). |
Source config:
| Key | Required for | Meaning |
|---|---|---|
| `type` | all sources | `jsonl`, `postgres`, or `duckdb`. |
| `path` | `jsonl` | Path to a JSONL file. |
| `dsn` | `postgres`, `duckdb` | DB DSN. |
| `query` | `postgres`, `duckdb` | Query that returns JSON rows. |
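A database-backed source might be wired as in the sketch below. The keys (`type`, `dsn`, `query`) come from the table above. The DSN, the table names, and the use of `row_to_json` to return one JSON object per row are illustrative assumptions, since the exact row shape the tool expects is not specified on this page.

```yaml
# Hypothetical Postgres-backed dataset config.
# DSN and table names are placeholders, not values the tool requires.
exposures:
  type: postgres
  dsn: "postgres://user:pass@localhost:5432/recsys"  # assumed DSN
  query: "SELECT row_to_json(e) FROM eval_exposures AS e"  # assumed table name
outcomes:
  type: postgres
  dsn: "postgres://user:pass@localhost:5432/recsys"  # assumed DSN
  query: "SELECT row_to_json(o) FROM eval_outcomes AS o"  # assumed table name
```

Note that `path` is omitted here because it applies only to `jsonl` sources, and strict parsing means any key outside this schema would fail the run.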
Evaluation config (`--config`)
Top-level keys:
| Key | Default | Meaning |
|---|---|---|
| `mode` | required | `offline`, `experiment`, `ope`, `interleaving`, `aa-check`. |
| `offline` | empty | Offline regression metrics and gates (used in `offline` mode). |
| `experiment` | empty | Experiment analysis and guardrails (used in `experiment` and `aa-check` modes). |
| `ope` | defaults set | Off-policy evaluation settings (used in `ope` mode). |
| `interleaving` | defaults set | Interleaving algorithm settings (used in `interleaving` mode). |
| `scale` | `memory` | `memory`, `stream`, or `duckdb` mode for large datasets. |
| `artifacts` | empty | Optional artifact/manifest resolution metadata (for report context). |
Defaults applied by the tool:
- `scale.mode`: defaults to `memory`
- `ope.reward_event`: defaults to `click`
- `ope.unit`: defaults to `request`
- `ope.reward_aggregation`: defaults to `sum`
- `ope.min_propensity`: defaults to `1e-6`
- `interleaving.algorithm`: defaults to `team_draft`
- `interleaving.seed`: defaults to `42`
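Because these defaults are applied by the tool, an `ope` config that omits them should behave like one that spells them out. The sketch below writes the `ope` and `scale` defaults explicitly, assuming the dotted names above map directly onto nested YAML keys. It is an illustration rather than a verified config, and `ope` mode may require other fields not listed on this page.

```yaml
# Sketch: the documented defaults written out explicitly (assumed nesting).
mode: ope
ope:
  reward_event: click
  unit: request
  reward_aggregation: sum
  min_propensity: 1e-6
scale:
  mode: memory
```

Spelling defaults out like this can make a CI config easier to review, at the cost of having to update it if the tool's defaults change.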
Offline mode requirements:
- `offline.metrics` is required (at least one metric spec)
Examples
Minimal dataset config (JSONL)
```yaml
exposures:
  type: jsonl
  path: /tmp/exposures.eval.jsonl
outcomes:
  type: jsonl
  path: /tmp/outcomes.eval.jsonl
assignments:
  type: jsonl
  path: /tmp/assignments.eval.jsonl
```
Minimal offline config (gate in CI)
```yaml
mode: offline
offline:
  metrics:
    - name: precision
      k: 10
    - name: recall
      k: 10
  slice_keys: ["tenant", "surface"]
  gates:
    - metric: precision@10
      max_drop: 0.001
```
Minimal experiment config (guardrails + primary metrics)
```yaml
mode: experiment
experiment:
  experiment_id: "exp_123"
  control_variant: "A"
  primary_metrics: ["ctr", "conversion_rate"]
  slice_keys: ["tenant", "surface"]
  guardrails:
    max_latency_p95_ms: 300
    max_error_rate: 0.01
    max_empty_rate: 0.02
```
Read next
- CLI usage and exit codes: CLI: recsys-eval
- How-to run eval and ship decisions: How-to: run evaluation and make ship decisions
- Default evaluation pack: Default evaluation pack (recommended)