Runbook: Roll back config/rules¶
This runbook covers rolling back tenant config and rules in recsys-service.
If you need to roll back the artifact manifest (artifact mode), see: Roll back the manifest.
Symptoms¶
- A recent config/rules change caused empty recs, regressions, or elevated errors
- You need to revert a tenant to the last-known-good behavior quickly
What “version” means here¶
config_version and rules_version are ETags derived from the JSON payload (SHA-256 of the document). Rolling back to an earlier document means setting the current pointer back to that exact JSON payload.
Preferred rollback path (audited, API-only)¶
This path is safe and leaves an audit trail.
1) Capture the current state¶
GET /v1/admin/tenants/{tenant_id}/config→ recordconfig_version(andETagheader)GET /v1/admin/tenants/{tenant_id}/rules→ recordrules_version(andETagheader)
Also capture a failing POST /v1/recommend response meta for the incident record.
2) Find the last-known-good payload (Audit Log)¶
Use:
GET /v1/admin/tenants/{tenant_id}/audit?limit=50
Look for:
action=config.update(config changes)action=rules.update(rules changes)
For the entry you want to revert, copy the JSON from:
before_state(the previous payload)
3) Re-apply the previous payload¶
PUT the previous payload back:
PUT /v1/admin/tenants/{tenant_id}/configwith body =before_statePUT /v1/admin/tenants/{tenant_id}/ruleswith body =before_state
Use If-Match (recommended) to avoid overwriting a concurrent update:
If-Match: <current config_version>/If-Match: <current rules_version>
If you get 409 RECSYS_VERSION_MISMATCH, re-run step (1) and try again.
4) Invalidate caches¶
POST /v1/admin/tenants/{tenant_id}/cache/invalidatewith:
{ "targets": ["config", "rules", "popularity"] }
Notes:
popularityis relevant in artifact/manifest mode (no-op in DB-only mode).
5) Verify¶
1) Call POST /v1/recommend and confirm meta.config_version and meta.rules_version match the expected rolled-back versions. 2) Watch key metrics (error rate, latency, empty-recs rate) for recovery.
Break-glass rollback (direct DB pointer update)¶
Only use this if the admin API is unavailable and you have DBA-level access.
Warning
Direct DB changes can cause outages. Prefer the audited API rollback path above.
High-level steps:
1) Identify the tenant UUID:
select id from tenants where external_id = 'demo';
2) Pick the target version IDs (for example, the most recent two versions):
select id, etag, created_at
from tenant_config_versions
where tenant_id = :tenant_uuid
order by created_at desc
limit 5;
3) Update the current pointer in a transaction:
begin;
update tenant_configs_current
set config_version_id = :target_config_version_id,
updated_by_sub = 'break-glass',
updated_at = now()
where tenant_id = :tenant_uuid;
commit;
Repeat the same pattern for tenant_rules_current.
Read next¶
- Production readiness checklist: Production readiness checklist (RecSys suite)
- Empty recs incident: Runbook: Empty recs
- Operations index: Operations