Runbook: Stale Artifact Manifest
Use this when artifact mode is enabled and recommendations do not reflect recently published pipeline outputs.
Fast triage
Set the affected tenant and surface:
TENANT_ID=${TENANT_ID:-demo}
SURFACE=${SURFACE:-home}
BASE_URL=${BASE_URL:-http://localhost:8000}
Confirm artifact mode is intended for the environment:
rg "RECSYS_ARTIFACT_" api/.env api/.env.example
The service reads the current manifest from RECSYS_ARTIFACT_MANIFEST_TEMPLATE. The local/default shape is:
s3://recsys-artifacts/registry/current/{tenant}/{surface}/manifest.json
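To see which object the service would look for, the template can be expanded by hand. The following is a minimal sketch, assuming the service substitutes `{tenant}` and `{surface}` literally; `render_manifest_uri` is a hypothetical helper, not part of the service.

```shell
# Hypothetical helper: expand {tenant} and {surface} in a manifest template.
# $1 = template, $2 = tenant, $3 = surface.
# Assumes tenant and surface are simple identifiers (no '|' or '&').
render_manifest_uri() {
  printf '%s\n' "$1" |
    sed -e "s|{tenant}|$2|g" -e "s|{surface}|$3|g"
}

# The default is kept in its own variable because a '}' inside
# ${VAR:-...} would end the parameter expansion early.
DEFAULT_TEMPLATE='s3://recsys-artifacts/registry/current/{tenant}/{surface}/manifest.json'
TEMPLATE=${RECSYS_ARTIFACT_MANIFEST_TEMPLATE:-$DEFAULT_TEMPLATE}

render_manifest_uri "$TEMPLATE" "${TENANT_ID:-demo}" "${SURFACE:-home}"
```

With the default template and the demo tenant, this prints the `demo/home` manifest location, which is the path to check in the object store.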
Decision flow
- If artifact mode is off, this is a DB-only freshness issue. Check source data and skip this runbook.
- If the current manifest is missing or unreadable, fix object-store path, credentials, DNS, or TLS.
- If the manifest is old, check whether recsys-pipelines published successfully.
- If the manifest is current but serving is stale, wait for manifest TTL expiry or invalidate service caches.
- If invalidation temporarily fixes the issue, revisit RECSYS_ARTIFACT_MANIFEST_TTL and the publish workflow.
Local checks
For local filesystem proof-kit output, inspect the current manifest directly:
test -s tmp/commercial-proof-kit/pipelines/registry/current/demo/home/manifest.json
python3 -m json.tool tmp/commercial-proof-kit/pipelines/registry/current/demo/home/manifest.json
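For a quick "is the manifest old?" answer on the local filesystem, file modification time can serve as a rough proxy. This is a sketch only: `manifest_is_stale` is a hypothetical helper, the 60-minute threshold is an arbitrary assumption, and mtime is not a substitute for the timestamp recorded inside the manifest.

```shell
# Hypothetical helper. $1 = manifest path, $2 = maximum age in minutes.
# Returns 0 if the file is older than the threshold, 1 if it is newer,
# and 2 if it is missing or empty (a different failure mode).
manifest_is_stale() {
  [ -s "$1" ] || return 2
  [ -n "$(find "$1" -mmin +"$2")" ]
}

MANIFEST=tmp/commercial-proof-kit/pipelines/registry/current/demo/home/manifest.json
MAX_AGE_MIN=60   # assumed threshold; align it with your publish cadence

rc=0
manifest_is_stale "$MANIFEST" "$MAX_AGE_MIN" || rc=$?
case $rc in
  0) echo "manifest is older than $MAX_AGE_MIN minutes: check the publish step" ;;
  1) echo "manifest was modified within the last $MAX_AGE_MIN minutes" ;;
  *) echo "manifest is missing or empty: $MANIFEST" ;;
esac
```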
For local MinIO or S3-compatible deployments, fetch the path that matches your configured manifest template. Do not paste credentials into tickets or public issues.
Cache invalidation
When the manifest is known-good and the service is allowed to read it, invalidate popularity/artifact-related caches:
curl -fsS -X POST "$BASE_URL/v1/admin/tenants/$TENANT_ID/cache/invalidate" \
-H "Content-Type: application/json" \
-H "X-Org-Id: $TENANT_ID" \
-H "X-Dev-User-Id: local-dev" \
-H "X-Dev-Org-Id: $TENANT_ID" \
-d "{\"targets\":[\"popularity\"],\"surface\":\"$SURFACE\"}"
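The escaped quotes in the inline JSON body are easy to break when editing the command. One option is to build the body in a small function and inspect it before sending. This is a sketch: `invalidate_payload` is a hypothetical helper that reproduces the payload shown above, and it assumes the surface name is a simple identifier that needs no JSON escaping.

```shell
# Hypothetical helper: build the invalidation request body for one surface.
# $1 = surface name (assumed to contain no quotes or backslashes).
invalidate_payload() {
  printf '{"targets":["popularity"],"surface":"%s"}' "$1"
}

# Dry run: print the body that would be sent, without calling the service.
echo "request body: $(invalidate_payload "${SURFACE:-home}")"

# To send it, pass the generated body to the same endpoint and headers:
#   curl -fsS -X POST "$BASE_URL/v1/admin/tenants/$TENANT_ID/cache/invalidate" \
#     -H "Content-Type: application/json" \
#     -H "X-Org-Id: $TENANT_ID" \
#     -H "X-Dev-User-Id: local-dev" \
#     -H "X-Dev-Org-Id: $TENANT_ID" \
#     -d "$(invalidate_payload "$SURFACE")"
```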
Verification
- Fetch the manifest and confirm updated_at or current artifact URIs changed.
- Call POST /v1/recommend for the affected tenant and surface.
- Confirm response quality, warning rates, and empty recommendation rate recover.
- Record whether the recovery required cache invalidation, TTL expiry, or a new pipeline publish.
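To back the first verification step with evidence, a content fingerprint taken before and after the fix shows whether the manifest actually changed. This is a sketch: `manifest_fingerprint` is a hypothetical helper built on the POSIX `cksum` utility. It shows that the content differs, not which field changed.

```shell
# Hypothetical helper: print a short "<crc>-<size>" fingerprint of a file.
# $1 = path to a locally available copy of the manifest.
manifest_fingerprint() {
  cksum < "$1" | awk '{print $1 "-" $2}'
}

MANIFEST=tmp/commercial-proof-kit/pipelines/registry/current/demo/home/manifest.json

if [ -s "$MANIFEST" ]; then
  # Record this value before the publish or invalidation, then rerun
  # afterwards and compare the two values in the incident notes.
  echo "manifest fingerprint: $(manifest_fingerprint "$MANIFEST")"
else
  echo "manifest is missing or empty: $MANIFEST"
fi
```

An unchanged fingerprint after a supposed publish suggests that the pipeline did not update the current manifest, which points back to the publish step rather than to caching.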