Runbook: Service Not Ready
Use this when /readyz fails or an orchestrator keeps the API out of rotation.
Fast triage
Set the service URL:
BASE_URL=${BASE_URL:-http://localhost:8000}
Check liveness separately from readiness:
curl -fsS "$BASE_URL/healthz"
curl -fsS "$BASE_URL/readyz"
curl -fsS "$BASE_URL/health" || true
Interpretation:
| Result | Meaning |
|---|---|
| /healthz fails | The process or basic liveness path is unhealthy. Start with container logs. |
| /healthz passes but /readyz fails | The process is up, but a readiness dependency is unhealthy. |
| /health fails | Overall health is unhealthy. Treat details as internal operational data. |
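The first two rows of the table can be sketched as a small script that probes both endpoints and prints the next step. The function names probe and classify and the 5-second timeout are illustrative assumptions, not part of the service or its tooling.

```shell
#!/bin/sh
# Sketch: turn the /healthz and /readyz results into the next triage step.
BASE_URL=${BASE_URL:-http://localhost:8000}

# probe PATH: print "pass" when curl -f succeeds (HTTP status below 400), else "fail".
# The 5-second timeout is an assumed value.
probe() {
  if curl -fsS --max-time 5 -o /dev/null "$BASE_URL$1" 2>/dev/null; then
    echo pass
  else
    echo fail
  fi
}

# classify HEALTHZ READYZ: map the two results to a row of the table.
classify() {
  case "$1/$2" in
    fail/*)    echo "liveness failing: start with container logs" ;;
    pass/fail) echo "process up: a readiness dependency is unhealthy" ;;
    pass/pass) echo "live and ready" ;;
    *)         echo "unexpected probe result: $1/$2" ;;
  esac
}

classify "$(probe /healthz)" "$(probe /readyz)"
```

Keeping classify free of network calls makes the mapping easy to check by hand with fixed inputs such as classify pass fail.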
Current readiness dependencies
The service registers:
- basic readiness in all modes.
- database readiness when a database pool is configured.
- Auth provider health checks when auth middleware registers them.
Local checks
Inspect logs and database migration state:
docker compose logs --tail=100 api
cd api && make migrate-status
Check common production-sensitive settings:
rg "AUTH_REQUIRED|JWT_|API_KEY_|DATABASE_URL|RECSYS_ARTIFACT_|PPROF" api/.env api/.env.example
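The search above shows which settings are present. A complementary sketch flags names that have no assignment at all. The helper missing_keys is hypothetical, and treating DATABASE_URL and AUTH_REQUIRED as expected entries in api/.env is an assumption that should be adjusted to the environment.

```shell
#!/bin/sh
# Sketch: report settings that have no assignment in an env file.

# missing_keys FILE KEY...: print each KEY with no "KEY=" line in FILE.
# Commented-out assignments (lines starting with #) count as missing.
missing_keys() {
  file=$1; shift
  for key in "$@"; do
    grep -qE "^(export +)?${key}=" "$file" || echo "$key"
  done
}

# Assumed example: check two names taken from the search pattern above.
if [ -f api/.env ]; then
  missing_keys api/.env DATABASE_URL AUTH_REQUIRED
fi
```

No output means every listed name has an assignment. The check does not validate the values themselves.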
Common causes
| Cause | Signal | Remediation |
|---|---|---|
| Database unreachable | /readyz fails after startup and logs mention database ping/connectivity. | Fix DATABASE_URL, network policy, DNS, credentials, or database availability. |
| Migrations missing | Service starts but database-backed routes or readiness fail after schema changes. | Run the migration workflow for the environment. |
| JWT/JWKS dependency unhealthy | Auth is enabled and logs mention JWKS or allowed-host failures. | Fix JWT_JWKS_URL, AUTH_JWKS_ALLOWED_HOSTS, egress, or TLS. |
| Unsafe production config | Service exits during config validation. | Fix missing salts/secrets, S3 SSL settings, or pprof binding as reported by logs. |
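To narrow down which row applies, one option is to count log lines that mention each signal. The sketch below assumes the logs were saved to a file first, and the keyword patterns are guesses derived from the table, not the service's actual log format. The names count_matches and tally_signals are illustrative.

```shell
#!/bin/sh
# Sketch: count log lines matching each cause's signal keywords.
# The patterns are assumptions; adjust them to the messages the service emits.

# count_matches PATTERN FILE: number of lines matching PATTERN, case-insensitive.
count_matches() {
  grep -ciE "$1" "$2"
}

# tally_signals FILE: print one "cause: count" line per cause.
tally_signals() {
  echo "database: $(count_matches 'database' "$1")"
  echo "migrations: $(count_matches 'migrat' "$1")"
  echo "jwks: $(count_matches 'jwks|allowed.host' "$1")"
  echo "config: $(count_matches 'config' "$1")"
}

# Tally a saved log file when one is passed as the first argument.
if [ "$#" -gt 0 ]; then
  tally_signals "$1"
fi
```

A log file can be captured with docker compose logs --tail=500 api > api.log and passed as the first argument. The counts only point to the rows worth reading first; they do not confirm a cause.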
Verification
- /healthz returns success.
- /readyz returns success.
- A known-good recommendation request succeeds.
- Logs do not contain repeated readiness or dependency errors after recovery.
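The two endpoint checks can be polled after a fix instead of being re-run by hand. This is a minimal sketch: the helpers retry and ready are hypothetical and the 5-second timeout is an assumed value. It does not cover the recommendation request or the log check.

```shell
#!/bin/sh
# Sketch: poll the liveness and readiness endpoints until both pass.
BASE_URL=${BASE_URL:-http://localhost:8000}

# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS are used up.
# Sleeps DELAY seconds between attempts, but not after the last one.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# ready: succeed only when both /healthz and /readyz return success.
ready() {
  curl -fsS --max-time 5 -o /dev/null "$BASE_URL/healthz" &&
    curl -fsS --max-time 5 -o /dev/null "$BASE_URL/readyz"
}
```

For example, retry 10 3 ready returns 0 as soon as both endpoints pass and returns 1 after ten failed attempts.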