Workflow: Online A/B analysis in production
This page describes the workflow for online A/B analysis in production and how it fits into the RecSys suite.
Who this is for
- Product + analytics teams running experiments on key surfaces
- Engineers who need a repeatable “measure → decide → ship/rollback” workflow
Goal
Measure impact from live traffic and decide ship / hold / rollback using experiment analysis.
Prerequisites (must be true)
- You can log:
  - exposures (what was shown)
  - outcomes (what the user did)
  - assignments (experiment id + variant)
- Your join keys are stable (typically request_id).
Start here if anything is unclear:
- Integration logging plan: Integration: how to produce the inputs
- Data contracts: Data contracts: what inputs look like
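A quick way to check the join-key prerequisite before running an experiment is to measure how many outcome rows can be matched back to an exposure. The sketch below is illustrative only: the function name match_rate and the row shapes are assumptions, not part of recsys-eval, and the tool may define join rate differently. The authoritative field names are in the data contracts page.

```python
from typing import Dict, Iterable


def match_rate(left: Iterable[Dict], right: Iterable[Dict], key: str = "request_id") -> float:
    """Return the fraction of `left` rows whose join key also appears in `right`."""
    # Collect the non-empty keys seen on the right-hand side.
    right_keys = {row[key] for row in right if row.get(key)}
    matched = 0
    total = 0
    for row in left:
        total += 1
        # Rows with a missing or empty key count as unmatched.
        if row.get(key) in right_keys:
            matched += 1
    return matched / total if total else 0.0


# Hypothetical log rows; real logs carry more fields than the join key.
exposures = [{"request_id": "r1"}, {"request_id": "r2"}, {"request_id": "r3"}]
outcomes = [
    {"request_id": "r1"},
    {"request_id": "r2"},
    {"request_id": "r9"},  # no matching exposure: an orphaned outcome
    {"request_id": "r1"},
]

outcome_join_rate = match_rate(outcomes, exposures)
print(f"outcomes matched to an exposure: {outcome_join_rate:.2f}")
```

A low value here usually points at unstable or missing request_id values rather than at a real product effect, so it is worth fixing before interpreting any KPI.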
Workflow steps
- Pick a primary KPI and 2–4 guardrails (latency, empty-recs rate, error rate, etc.).
- Run recsys-eval in experiment mode for a well-defined window.
- Interpret results:
  - join-rate sanity
  - SRM (sample ratio mismatch) warnings
  - guardrails holding
- Decide ship/hold/rollback and save the report as an audit artifact.
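To make the SRM step above concrete, here is a minimal sketch of a sample ratio mismatch check for a two-variant experiment, using a chi-square goodness-of-fit test with one degree of freedom. It is not how recsys-eval necessarily computes its warning: the function name srm_p_value, the 50/50 default split, and the 0.001 alert threshold are all assumptions chosen for illustration.

```python
import math


def srm_p_value(control_n: int, treatment_n: int, expected_control_share: float = 0.5) -> float:
    """P-value of a chi-square test (1 degree of freedom) of observed vs. planned split."""
    if not 0.0 < expected_control_share < 1.0:
        raise ValueError("expected_control_share must be strictly between 0 and 1")
    total = control_n + treatment_n
    if total == 0:
        raise ValueError("no assignments to test")
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_n - expected_control) ** 2 / expected_control
        + (treatment_n - expected_treatment) ** 2 / expected_treatment
    )
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical assignment counts for a planned 50/50 split.
p_value = srm_p_value(control_n=5000, treatment_n=5500)
SRM_ALPHA = 0.001  # assumed alert threshold; tune to your own policy
srm_suspected = p_value < SRM_ALPHA
print(f"SRM p-value: {p_value:.2e}, suspected: {srm_suspected}")
```

When a mismatch is suspected, treat KPI and guardrail deltas as unreliable until the assignment or logging problem is understood, since the variants may no longer be comparable populations.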
Example command (experiment analysis)
recsys-eval run \
--mode experiment \
--dataset configs/examples/dataset.jsonl.yaml \
--config configs/eval/experiment.default.yaml \
--output /tmp/experiment_report.md \
--output-format markdown
Read next
- Interpretation cheat sheet: Interpretation cheat sheet (recsys-eval)
- Interpreting results (deep dive): Interpreting results: how to go from report to decision
- Metrics: Metrics: what we measure and why
- Troubleshooting (joins, SRM, anomalies): Troubleshooting: symptom -> cause -> fix