Runbook: Stale manifest (artifact mode)¶
This runbook shows how to diagnose and fix a stale manifest in artifact mode in a reliable, repeatable way.
Symptoms¶
- Recommendations look stale even after pipelines published new artifacts
- The “current manifest” updated_at is older than expected for a tenant/surface
- POST /v1/admin/tenants/{tenant_id}/cache/invalidate with popularity fixes it temporarily
Decision tree (fast path)¶
flowchart TD
A[Serving stale artifacts] --> B{Artifact mode enabled?}
B -->|No| C[DB-only mode: check DB signal freshness]
B -->|Yes| D{Can you fetch the current manifest from object store?}
D -->|No| E["Fix object store access (DNS/egress/creds/TLS)"]
D -->|Yes| F{Manifest updated recently?}
F -->|No| G[Pipeline publish failed or scheduler not running]
F -->|Yes| H{Service still stale after manifest TTL?}
H -->|No| I[Wait for TTL expiry or invalidate caches]
H -->|Yes| J[Invalidate caches; check service logs for fetch errors]
Quick triage (copy/paste)¶
Set:
TENANT_ID=demo
SURFACE=home
BASE_URL=${BASE_URL:-http://localhost:8000}
- Confirm the service is in artifact mode:
  - RECSYS_ARTIFACT_MODE_ENABLED=true is required.
  - If you’re not sure, check the service config for RECSYS_ARTIFACT_* variables and object store settings.
- Fetch the current manifest from object storage.
The manifest location is defined by RECSYS_ARTIFACT_MANIFEST_TEMPLATE (default: s3://recsys-artifacts/registry/current/{tenant}/{surface}/manifest.json).
Example with AWS CLI:
aws s3 cp "s3://recsys-artifacts/registry/current/${TENANT_ID}/${SURFACE}/manifest.json" -
Local dev (MinIO via docker compose):
docker compose run --rm --entrypoint sh minio-init -c \
"mc alias set local http://minio:9000 minioadmin minioadmin >/dev/null && \
mc cat local/recsys-artifacts/registry/current/${TENANT_ID}/${SURFACE}/manifest.json | head"
- If the manifest is new but service output is still old, invalidate caches:
curl -fsS -X POST "$BASE_URL/v1/admin/tenants/${TENANT_ID}/cache/invalidate" \
-H "Content-Type: application/json" \
-H "X-Org-Id: $TENANT_ID" \
-d "{\"targets\":[\"popularity\"],\"surface\":\"${SURFACE}\"}"
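To judge whether the manifest fetched in the triage steps above is actually stale, it can help to turn updated_at into an age in seconds. The following is a minimal sketch, not part of the service tooling: the helper names `extract_updated_at` and `manifest_age_seconds` are hypothetical, it assumes updated_at is a top-level RFC 3339 string (for example 2024-01-01T00:00:00Z), and it relies on GNU date for parsing.

```shell
# Sketch only: helper names are hypothetical, not part of the recsys tooling.
# Assumes updated_at is an RFC 3339 timestamp string and GNU date is installed.

# Print the first "updated_at" string value found in the JSON on stdin.
# sed keeps the sketch dependency-free; prefer a real JSON parser such as jq
# if it is available in your environment.
extract_updated_at() {
  sed -n 's/.*"updated_at"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Print the age in seconds of timestamp $1, relative to epoch seconds $2
# (defaults to the current time).
manifest_age_seconds() {
  local updated_at="$1"
  local now="${2:-$(date -u +%s)}"
  local updated_epoch
  updated_epoch=$(date -u -d "$updated_at" +%s) || return 1
  echo $(( now - updated_epoch ))
}

# Usage (not executed here):
#   ts=$(aws s3 cp "s3://recsys-artifacts/registry/current/${TENANT_ID}/${SURFACE}/manifest.json" - | extract_updated_at)
#   manifest_age_seconds "$ts"
```

Compare the printed age with your pipeline's publish interval: an age well beyond that interval points at the publish side rather than at service caches.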
Likely causes and safe remediations¶
- Manifest pointer not updated
  - Check pipeline scheduler health and recent pipeline runs.
  - See pipelines runbook: Runbook: Stale artifacts
- TTLs are too long for your workflow
  - Tune RECSYS_ARTIFACT_MANIFEST_TTL and RECSYS_ARTIFACT_CACHE_TTL.
- Object store connectivity problems
  - Validate endpoint/creds (RECSYS_ARTIFACT_S3_*) from the service network.
Verification¶
- Fetch the current manifest again and confirm updated_at advanced.
- Call /v1/recommend twice (after TTL expiry or cache invalidation) and confirm outputs reflect the new artifacts.
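The first verification step can be made mechanical by recording updated_at before remediation and comparing it with the value seen afterwards. This is a hedged sketch under the same assumptions as any timestamp parsing here: `updated_at_advanced` is a hypothetical helper, the two arguments are assumed to be RFC 3339 strings, and GNU date is assumed to be available.

```shell
# Sketch only: updated_at_advanced is a hypothetical helper name.
# Assumes both arguments are RFC 3339 timestamps that GNU date can parse.

# Exit 0 if timestamp $2 is strictly later than timestamp $1,
# exit 1 if it is not, and exit 2 if either timestamp cannot be parsed.
updated_at_advanced() {
  local before_epoch after_epoch
  before_epoch=$(date -u -d "$1" +%s) || return 2
  after_epoch=$(date -u -d "$2" +%s) || return 2
  [ "$after_epoch" -gt "$before_epoch" ]
}

# Usage (not executed here): BEFORE and AFTER hold the updated_at values
# read from the manifest before and after remediation.
#   if updated_at_advanced "$BEFORE" "$AFTER"; then
#     echo "manifest advanced"
#   else
#     echo "manifest did not advance"
#   fi
```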
Read next¶
- Artifacts and manifest lifecycle: Artifacts and manifest lifecycle (pipelines → service)
- Data modes: Data modes: DB-only vs artifact/manifest
- Pipelines rollback guide: How-to: Roll back to a previous artifact version