Interpreting metrics and reports

This page gives a practical mental model for turning an evaluation report into a ship/hold/rollback decision.

Canonical reading order

This page is an orientation layer. The detailed metric definitions live in the recsys-eval docs.

What a report is (and is not)

A RecSys evaluation report is:

  • a decision artifact (shareable)
  • a reproducible record (inputs + versions)
  • a compact summary of multiple metrics and guardrails

It is not:

  • a guarantee of online lift
  • a substitute for instrumentation hygiene (joinability)

A 5-minute interpretation flow

  1. Verify the evaluation is valid

     • Is the population/window what you expected?
     • Are exposures and outcomes joined by a stable request_id?

     Start here:

     • Evaluation validity
     • Join logic
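The join check above can be sketched in a few lines. This is a minimal illustration, not the recsys-eval implementation; the record shape (dicts with a `request_id` key) is a hypothetical stand-in for your exposure and outcome logs.

```python
# Sketch of the join-validity check: exposures and outcomes must share a
# stable request_id. A low join rate invalidates the rest of the report.

def join_rate(exposures, outcomes):
    """Fraction of exposure request_ids with at least one outcome row."""
    exposure_ids = {e["request_id"] for e in exposures}
    outcome_ids = {o["request_id"] for o in outcomes}
    if not exposure_ids:
        return 0.0
    return len(exposure_ids & outcome_ids) / len(exposure_ids)

exposures = [{"request_id": "r1"}, {"request_id": "r2"}, {"request_id": "r3"}]
outcomes = [{"request_id": "r1"}, {"request_id": "r3"}]
print(join_rate(exposures, outcomes))  # 2 of 3 exposures joined
```

In practice you would run this per day and per surface; a sudden drop in join rate usually means an instrumentation change, not a real behavior change.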

  2. Check guardrails first

     • Did any hard guardrail regress beyond tolerance?
     • If yes, decide "hold" even if the primary metric improves.

  3. Read the primary metric in context

     • Compare relative deltas, not just absolute ones.
     • Look for segment-specific regressions (new users, cold-start surfaces, long-tail items).
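Reading the primary metric as a relative delta, overall and per segment, can be sketched as below. The metric values and segment names are made-up illustrations.

```python
# Sketch of step 3: relative deltas per segment. An overall lift can
# hide a regression in a sensitive segment such as new users.

def relative_delta(control, treatment):
    return (treatment - control) / control

ndcg = {  # hypothetical primary metric: (control, treatment) per segment
    "overall":   (0.200, 0.206),
    "new_users": (0.120, 0.112),
    "long_tail": (0.080, 0.081),
}

for segment, (c, t) in ndcg.items():
    print(f"{segment}: {relative_delta(c, t):+.1%}")
```

Here the overall lift is positive while new users regress, which is exactly the pattern a segment breakdown exists to catch.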

  4. Identify tradeoffs and risks

     • Are you trading diversity for short-term clicks?
     • Are you increasing concentration on a few items?
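One simple way to quantify the concentration question is the share of impressions absorbed by the top-k items. This is a minimal sketch with made-up counts, not a recsys-eval metric definition.

```python
# Sketch of a concentration check: if a few items absorb most
# impressions after the change, short-term clicks may be trading
# against catalog diversity.

from collections import Counter

def top_k_share(item_impressions, k=3):
    """Share of all impressions going to the k most-shown items."""
    counts = Counter(item_impressions)
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(k))
    return top / total

impressions = ["a"] * 50 + ["b"] * 30 + ["c"] * 10 + ["d"] * 5 + ["e"] * 5
print(top_k_share(impressions))  # top-3 items take 90% of impressions
```

Comparing this value between control and treatment is more informative than reading it in isolation.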

  5. Write the decision and follow-ups

     • Ship / hold / rollback
     • 1–5 bullets explaining why
     • The next experiment or mitigation
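The whole flow can be condensed into a small decision rule. The `min_lift` threshold and the rollback condition below are illustrative assumptions; the point is the ordering, with guardrails gating first.

```python
# Sketch tying the flow together: guardrails veto first, then the
# primary metric's relative delta drives ship/hold/rollback.

def decide(primary_rel_delta, guardrail_violated, min_lift=0.01):
    if guardrail_violated:
        return "hold"      # a hard guardrail regression vetoes a ship
    if primary_rel_delta >= min_lift:
        return "ship"
    if primary_rel_delta < 0:
        return "rollback"  # worse than control, no guardrail excuse
    return "hold"          # neutral result: keep iterating

print(decide(0.03, guardrail_violated=False))  # ship
print(decide(0.03, guardrail_violated=True))   # hold
```

The code only outputs the verdict; the written record (the "why" bullets and the next experiment) is still what makes the report a decision artifact.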

Where detailed metric definitions live