recsys-eval configuration

This page is the canonical reference for recsys-eval configuration.

Who this is for

  • RecSys engineers running offline regression gates in CI
  • Developers validating instrumentation by producing a first report from logs

What you will get

  • The evaluation config schema (what --config expects)
  • The dataset wiring schema (what --dataset expects)
  • Copy/paste examples you can adapt

Reference

recsys-eval run takes two YAML files:

  • --dataset: where exposures, outcomes, and assignments come from (you declare the sources; the tool reads and joins them)
  • --config: what evaluation to run (mode, metrics, gates, guardrails)

Important:

  • YAML parsing is strict: unknown fields fail fast.
  • Output format is a CLI flag (--output-format), not a config field.
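As a sketch of what these two rules imply together, the fragment below keeps to documented keys only. The commented-out line uses a hypothetical key name, output_format, purely to illustrate a field that does not belong in the config; under strict parsing, uncommenting it would be expected to make the run fail.

```yaml
mode: offline
offline:
  metrics:
    - name: precision
      k: 10
# output_format: ...   # hypothetical key, not part of the schema:
#                      # pass --output-format on the command line instead
```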

Dataset config (--dataset)

Top-level keys:

Key          | Required in modes      | Meaning
exposures    | offline/experiment/ope | Exposure source (what was shown).
outcomes     | offline/experiment/ope | Outcome source (what the user did).
assignments  | experiment/aa-check    | Experiment assignment source (variant per request/user).
interleaving | interleaving           | Special wiring for interleaving (ranker A/B lists + outcomes).

Source config:

Key   | Required for    | Meaning
type  | always          | Source type: jsonl, postgres, or duckdb.
path  | jsonl           | Path to a JSONL file.
dsn   | postgres/duckdb | Database DSN.
query | postgres/duckdb | Query that returns JSON rows.
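For a database-backed source, the keys in the table above might combine as in the sketch below. The DSN, table names, and aliases are hypothetical, and the exact row shape the tool expects from query is not specified on this page; row_to_json is a standard PostgreSQL function, used here as one plausible way to return JSON rows.

```yaml
# Sketch only: DSN, table names, and row shape are assumptions.
exposures:
  type: postgres
  dsn: "postgres://eval_user:change-me@localhost:5432/recsys"
  query: |
    SELECT row_to_json(e) FROM exposure_log AS e
outcomes:
  type: postgres
  dsn: "postgres://eval_user:change-me@localhost:5432/recsys"
  query: |
    SELECT row_to_json(o) FROM outcome_log AS o
```

Because parsing is strict, a source entry like this should carry only type, dsn, and query; path belongs to jsonl sources.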

Evaluation config (--config)

Top-level keys:

Key          | Default      | Meaning
mode         | required     | One of offline, experiment, ope, interleaving, aa-check.
offline      | empty        | Offline regression metrics and gates (used in offline mode).
experiment   | empty        | Experiment analysis and guardrails (used in experiment and aa-check modes).
ope          | defaults set | Off-policy evaluation settings (used in ope mode).
interleaving | defaults set | Interleaving algorithm settings (used in interleaving mode).
scale        | memory       | Execution mode for large datasets, set via scale.mode: memory, stream, or duckdb.
artifacts    | empty        | Optional artifact/manifest resolution metadata (for report context).

Defaults applied by the tool:

  • scale.mode: defaults to memory
  • ope.reward_event: defaults to click
  • ope.unit: defaults to request
  • ope.reward_aggregation: defaults to sum
  • ope.min_propensity: defaults to 1e-6
  • interleaving.algorithm: defaults to team_draft
  • interleaving.seed: defaults to 42
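The dotted names above presumably map to nested YAML keys. Under that assumption, an ope-mode config that spells out the documented defaults explicitly could look like the sketch below; whether ope mode needs further fields is not covered here. The propensity floor is written as 1.0e-6 rather than 1e-6 because some YAML 1.1 parsers read the latter as a string.

```yaml
# Sketch only: restates the documented defaults; omitting these keys
# should be equivalent if the defaults are applied as listed.
mode: ope
ope:
  reward_event: click
  unit: request
  reward_aggregation: sum
  min_propensity: 1.0e-6
scale:
  mode: memory
```

Writing defaults out this way can make a CI config easier to review, at the cost of having to update it if the tool's defaults change.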

Offline mode requirements:

  • offline.metrics is required (at least one metric spec)

Examples

Minimal dataset config (JSONL)

exposures:
  type: jsonl
  path: /tmp/exposures.eval.jsonl
outcomes:
  type: jsonl
  path: /tmp/outcomes.eval.jsonl
# assignments is only required for experiment and aa-check modes
assignments:
  type: jsonl
  path: /tmp/assignments.eval.jsonl

Minimal offline config (gate in CI)

mode: offline
offline:
  metrics:
    - name: precision
      k: 10
    - name: recall
      k: 10
  slice_keys: ["tenant", "surface"]
  gates:
    - metric: precision@10
      max_drop: 0.001

Minimal experiment config (guardrails + primary metrics)

mode: experiment
experiment:
  experiment_id: "exp_123"
  control_variant: "A"
  primary_metrics: ["ctr", "conversion_rate"]
  slice_keys: ["tenant", "surface"]
  guardrails:
    max_latency_p95_ms: 300
    max_error_rate: 0.01
    max_empty_rate: 0.02