rollback

This page explains the decision playbook (ship / hold / rollback) and how it fits into the RecSys suite.

Who this is for

Use this in order. Don’t skip step 0: bad joins make all metrics untrustworthy.

If any of these fail: HOLD and fix logging before interpreting results.

Schemas validate (recsys-eval validate passes).
Join integrity is sane:
Example threshold: join-rate ≥ 95% for the slices you care about.
If join-rate is lower, the most common causes are: missing/unstable request_id, wrong surface/tenant keys, or dropped events.
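The join-integrity check above can be sketched as follows. This is a minimal illustration, not the `recsys-eval` implementation. It assumes join-rate means the fraction of outcome events whose `request_id` matches a logged exposure, and that events are dicts with `request_id` and `surface` fields. The function names and the event shape are hypothetical, so adjust them to your own schemas.

```python
from collections import defaultdict

def join_rate_by_slice(exposures, outcomes, slice_key="surface"):
    """Per-slice fraction of outcome events that join to an exposure.

    Assumption: the join key is request_id; the event dict shape is illustrative.
    """
    # Ignore exposures with a missing or empty request_id: they can never join.
    exposure_ids = {e["request_id"] for e in exposures if e.get("request_id")}
    total = defaultdict(int)
    joined = defaultdict(int)
    for o in outcomes:
        s = o.get(slice_key, "unknown")
        total[s] += 1
        if o.get("request_id") in exposure_ids:
            joined[s] += 1
    return {s: joined[s] / total[s] for s in total}

def failing_slices(rates, threshold=0.95):
    """Slices whose join-rate is below the example 95% threshold."""
    return sorted(s for s, r in rates.items() if r < threshold)
```

If `failing_slices` returns anything for a slice you care about, treat it as a HOLD and investigate the causes listed above before reading any metric.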

If any guardrail is breached: ROLL BACK (or hold with an immediate mitigation plan).

Example starting points (tune to your product/SLOs):

If guardrails hold and data is valid:

SHIP when primary KPI improves beyond your minimum detectable effect and results are stable across key slices.
Example threshold: +1–3% relative on your primary KPI sustained for N days.
HOLD when results are inconclusive (underpowered, too noisy, conflicting slices).
Example threshold: the KPI change is within ±1% relative (or its confidence interval includes 0).
ROLL BACK when primary KPI regresses meaningfully (even if some slices improved).
Example threshold: a drop of 1–2% relative or worse on your primary KPI.
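The ordering of the playbook (data validity first, then guardrails, then the primary KPI) can be expressed as a small function. This is a hedged sketch under stated assumptions: the name `decide`, its parameters, and the default thresholds of ±1% relative are illustrative, taken from the example thresholds above rather than from any RecSys suite API. It also assumes you already have a relative lift with a confidence interval, and it treats any interval that includes 0 as inconclusive.

```python
def decide(data_valid, guardrails_ok, lift, ci_low, ci_high,
           slices_stable, ship_min=0.01, rollback_max=-0.01):
    """Return "SHIP", "HOLD", or "ROLL BACK" for one experiment readout.

    lift, ci_low, ci_high are relative changes on the primary KPI
    (0.02 means +2%). All thresholds are example values to tune.
    """
    # Step 0: broken schemas or joins make every metric untrustworthy.
    if not data_valid:
        return "HOLD"
    # Guardrails come before any KPI reading.
    if not guardrails_ok:
        return "ROLL BACK"
    # Inconclusive: the interval includes 0 (underpowered or too noisy).
    if ci_low <= 0.0 <= ci_high:
        return "HOLD"
    if lift >= ship_min and slices_stable:
        return "SHIP"
    if lift <= rollback_max:
        return "ROLL BACK"
    # Small but significant change, or conflicting slices.
    return "HOLD"
```

For example, `decide(True, True, 0.02, 0.005, 0.035, slices_stable=True)` returns `"SHIP"`, while the same lift with unstable slices returns `"HOLD"`. Checking the interval before the point estimate is a design choice: a large negative point estimate with a wide interval holds rather than rolls back, which you may want to tighten for risk-sensitive surfaces.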

HOLD. Fix instrumentation before shipping; otherwise you risk shipping on broken data.

Checklist:

Default action: ROLL BACK (or hold only if you can mitigate quickly with a safe change).

Next actions:

Default action: ROLL BACK.

If you suspect underpowering or a slice-specific mismatch, HOLD only long enough to:

HOLD the rollout. Fix the regression, or update the baseline if (and only if) the change is intentional and reviewed.

Artifacts/manifest rollback (artifact mode): roll back the manifest pointer and invalidate caches.
Config/rules rollback (DB-only or control-plane changes): roll back config/rules versions and invalidate caches.