Default evaluation pack (recommended)¶
This page explains the recommended default evaluation pack and how it fits into the RecSys suite.
Who this is for¶
- Teams running a RecSys pilot that need a “good enough” default metric set.
- Engineers and analysts who want to standardize ship/hold/rollback decisions across surfaces.
What you will get¶
- A default metric pack you can adopt in week 1 (offline) and week 3 (online).
- A short list of guardrails that prevent “shipping on broken data”.
- Links to the canonical playbook and workflows.
Week 1: prove the loop is real (offline + integrity)¶
In week 1, your goal is not to win. It’s to make measurement trustworthy.
Must-pass integrity checks¶
- Schema validation passes (`recsys-eval validate`).
- Join integrity is sane (broken joins make all metrics fiction); a quick check is sketched below.
- Empty-recs rate is understood (and has a safe fallback UX).
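A minimal Python sketch of the join-integrity and empty-recs checks, assuming JSONL logs keyed by a `request_id` field with an `items` list per exposure; these field names are illustrative assumptions, not the suite's schema:

```python
import json

def load_jsonl(path):
    """Load a JSONL file into a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def integrity_report(exposures_path, outcomes_path):
    """Compute join rate and empty-recs rate for one batch of logs.

    Assumes (for illustration) that each record carries a `request_id`
    join key and each exposure carries the `items` list it served.
    """
    exposures = load_jsonl(exposures_path)
    outcomes = load_jsonl(outcomes_path)

    exposure_ids = {e["request_id"] for e in exposures}
    outcome_ids = {o["request_id"] for o in outcomes}

    # Join rate: fraction of outcomes that map back to a logged exposure.
    join_rate = len(outcome_ids & exposure_ids) / len(outcome_ids) if outcome_ids else 0.0

    # Empty-recs rate: fraction of exposures that served zero items.
    empty = sum(1 for e in exposures if not e.get("items"))
    empty_recs_rate = empty / len(exposures) if exposures else 0.0

    return {"join_rate": join_rate, "empty_recs_rate": empty_recs_rate}
```

If `join_rate` comes back far below 1.0, stop: fix logging before reading any metric.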
Example (validate inputs):

```bash
./bin/recsys-eval validate --schema exposure.v1 --input exposures.jsonl
./bin/recsys-eval validate --schema outcome.v1 --input outcomes.jsonl
```
See:
- Decision playbook: Decision playbook: ship / hold / rollback
- Suite workflow: How-to: run evaluation and make ship decisions
Default offline metrics (regression gate)¶
Pick 1–2 relevance proxies and 1–2 distribution metrics:
- Relevance proxies: `hitrate@k`, `precision@k`, `ndcg@k`, `map@k`
- Distribution metrics: `coverage@k`, `novelty@k`, `diversity@k`
Start with k=5 or k=10 and keep it stable across runs.
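For intuition, here is a minimal reference sketch of two of the relevance proxies under binary relevance (`hitrate@k` and `ndcg@k`); the suite ships its own implementations, so treat this as a definition, not the canonical code:

```python
import math

def hitrate_at_k(ranked, relevant, k=10):
    """1.0 if any relevant item appears in the top-k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Average per-request scores to get the run-level metric.
print(hitrate_at_k(["a", "b", "c", "d", "e"], {"c", "e"}, k=5))  # 1.0
print(ndcg_at_k(["a", "b", "c", "d", "e"], {"c", "e"}, k=5))     # ~0.54
```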
Read more:
- Metrics reference: Metrics: what we measure and why
- Offline gate workflow: Workflow: Offline gate in CI
Week 3: measure impact (online experiments)¶
Once logging and joins are trustworthy, prefer online experiments for KPI lift.
Default experiment metrics¶
- 1 primary KPI (business-owned): CTR / conversion rate / revenue per exposure (pick one)
- 2–4 guardrails (must not regress):
- empty-recs rate
- error rate
- latency (p95/p99)
- join integrity (if join-rate drops, HOLD and fix logging)
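To make the gate concrete, here is a minimal sketch of the ship/hold logic, assuming you already have per-arm readouts. The `gate_decision` helper and every threshold value here are illustrative placeholders; the real thresholds live in the decision playbook.

```python
def gate_decision(primary_lift, primary_significant, guardrails, max_limits, min_limits):
    """Return a ship/hold decision from experiment readouts.

    primary_lift:        relative lift on the single primary KPI
    primary_significant: whether that lift cleared your significance bar
    guardrails:          observed values per guardrail metric
    max_limits:          ceilings (error rate, latency, empty-recs rate)
    min_limits:          floors (join rate must not drop)
    """
    breached = [n for n, v in guardrails.items() if v > max_limits.get(n, float("inf"))]
    breached += [n for n, v in guardrails.items() if v < min_limits.get(n, float("-inf"))]
    if breached:
        return "hold", breached   # any guardrail regression blocks the ship
    if primary_significant and primary_lift > 0:
        return "ship", []
    return "hold", []             # no clear win: keep the incumbent

decision, breached = gate_decision(
    primary_lift=0.021,
    primary_significant=True,
    guardrails={"empty_recs_rate": 0.004, "error_rate": 0.001,
                "p95_latency_ms": 180, "join_rate": 0.97},
    max_limits={"empty_recs_rate": 0.01, "error_rate": 0.005, "p95_latency_ms": 250},
    min_limits={"join_rate": 0.95},
)
print(decision)  # "ship"
```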
Read more:
- Online A/B workflow: Workflow: Online A/B analysis in production
- Interpreting results: Interpreting results: how to go from report to decision
Slice keys (keep it boring)¶
Default slices to start with:
- `tenant_id`
- `surface`
Add one more slice only if you will act on it (device, locale, segment).
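A minimal sketch of what per-slice readouts look like, assuming a joined per-request table in pandas (column names are illustrative assumptions):

```python
import pandas as pd

# One row per joined exposure/outcome (column names assumed for illustration).
df = pd.DataFrame({
    "tenant_id": ["t1", "t1", "t2", "t2"],
    "surface":   ["home", "search", "home", "home"],
    "clicked":   [1, 0, 1, 1],
})

# One readout per (tenant_id, surface) slice, with counts so you can
# ignore slices too small to act on.
per_slice = df.groupby(["tenant_id", "surface"])["clicked"].agg(["mean", "count"])
print(per_slice)
```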
Common pitfalls¶
- Shipping a “win” when join-rate is low.
- Over-slicing (finding fake wins by chance).
- Treating offline metrics as business KPIs.
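On over-slicing specifically: with 20 independent slices each read at α = 0.05, the chance of at least one spurious “win” is 1 − 0.95^20 ≈ 64%. Every extra slice spends false-positive budget, which is why the default above is just two slice keys.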
Read next¶
- Decision thresholds and what-to-do branches: Decision playbook: ship / hold / rollback
- Suite how-to (runnable commands): How-to: run evaluation and make ship decisions
- Minimum instrumentation spec: Minimum instrumentation spec (for credible evaluation)