Troubleshooting for integrators¶
This guide shows how to troubleshoot common integration problems in a reliable, repeatable way.
Who this is for¶
- Developers integrating recsys-service into an application.
- On-call engineers debugging “empty recs” or unexpected behavior.
What you will get¶
- A symptom → cause → fix checklist
- Links to the canonical runbooks and reference specs
Before you debug: collect these facts¶
- Tenant ID (X-Org-Id) and surface name (surface)
- Request ID (X-Request-Id) used for the call
- Whether you are in DB-only or artifact/manifest mode
- Whether exposure logging is enabled and where logs are written
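One way to make this checklist repeatable is to capture the facts in a small record before opening a ticket. The sketch below is illustrative only: the header names come from the list above, while the `DebugFacts` class, the `surface` field living in the request body, and the mode labels `"db-only"` and `"artifact"` are assumptions, not part of the service's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebugFacts:
    """Hypothetical record of the facts to collect before debugging."""
    tenant_id: str
    surface: str
    request_id: str
    data_mode: str                       # assumed labels: "db-only" or "artifact"
    exposure_log_path: Optional[str] = None  # None means logging is off or unknown


def collect_debug_facts(headers, body, data_mode, exposure_log_path=None):
    """Build a DebugFacts record from the request headers and body you sent."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return DebugFacts(
        tenant_id=lowered.get("x-org-id", ""),
        surface=body.get("surface", ""),  # assumption: surface is a body field
        request_id=lowered.get("x-request-id", ""),
        data_mode=data_mode,
        exposure_log_path=exposure_log_path,
    )


def missing_facts(facts):
    """Return the names of facts that still need to be collected."""
    missing = [
        name for name in ("tenant_id", "surface", "request_id")
        if not getattr(facts, name)
    ]
    if facts.data_mode not in ("db-only", "artifact"):
        missing.append("data_mode")
    return missing
```

If `missing_facts` returns a non-empty list, gather those values first; most of the runbooks linked below are hard to follow without them.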
Reference:
- Tenant + auth model: Auth and tenancy reference
- Data modes: Data modes: DB-only vs artifact/manifest
Symptom checklist¶
Symptom: items is empty¶
Most common causes:
- No candidates exist for this tenant + surface
- Rules/config exclude everything
- Artifact/manifest points to missing or stale data (artifact mode)
Fix path:
- Verify tenant scope and surface spelling (must match your data namespaces)
  See: Surface namespaces
- Check the “Empty recs” runbook
  See: Runbook: Empty recs
- If artifact mode: check stale manifest runbook
  See: Runbook: Stale manifest (artifact mode)
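Surface spelling mistakes are easy to miss by eye. The following is a minimal sketch of a spelling check, assuming you can obtain the list of surface names that actually have data for the tenant (how you fetch that list is not covered here). The `check_surface` helper is hypothetical and uses only Python's standard `difflib` module.

```python
import difflib


def check_surface(requested, known_surfaces):
    """Compare a requested surface name against the names that have data."""
    if requested in known_surfaces:
        return "exact match"

    # Map a case-folded, trimmed form back to the canonical spelling.
    folded = {s.strip().lower(): s for s in known_surfaces}
    key = requested.strip().lower()
    if key in folded:
        return f"case/whitespace mismatch: did you mean {folded[key]!r}?"

    # Fall back to fuzzy matching to catch small typos.
    close = difflib.get_close_matches(key, list(folded), n=1, cutoff=0.8)
    if close:
        return f"possible typo: did you mean {folded[close[0]]!r}?"
    return "no matching surface"
```

Anything other than an exact match is worth fixing before you look at rules or manifests, since a mismatched surface would explain an empty `items` list on its own.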
Symptom: 401 / 403 / tenant seems “wrong”¶
Most common causes:
- Missing tenant header/claim
- Dev headers used in an environment that requires auth
- Tenant ID mismatch between config/rules and request
Fix path:
- Reference: Auth and tenancy reference
- Admin endpoints (config/rules): Admin API + local bootstrap (recsys-service)
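The three causes above can be checked mechanically before you read the reference pages. This sketch assumes that X-Org-Id carries the tenant (as listed under "collect these facts") and that environments requiring auth expect an `Authorization` header; the second point is an assumption, so confirm it against the Auth and tenancy reference. `diagnose_auth` is a hypothetical helper.

```python
def diagnose_auth(headers, expected_tenant, auth_required):
    """Return a list of likely causes for a 401/403 or a wrong-tenant response.

    headers:         the headers you sent with the failing request
    expected_tenant: the tenant ID your config/rules were created under
    auth_required:   True if the target environment enforces auth
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    tenant = lowered.get("x-org-id")

    findings = []
    if not tenant:
        findings.append("missing tenant header")
    elif tenant != expected_tenant:
        findings.append(
            f"tenant mismatch: request has {tenant!r}, "
            f"config expects {expected_tenant!r}"
        )
    # Assumption: authenticated environments expect an Authorization header.
    if auth_required and "authorization" not in lowered:
        findings.append(
            "no Authorization header in an environment that requires auth"
        )
    return findings
```

An empty result does not prove the request is valid; it only rules out the most common causes listed in this section.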
Symptom: results change unexpectedly between calls¶
Most common causes:
- Inputs are not actually identical (request ID, experiment metadata, exclude list)
- Non-deterministic candidate source ordering (ties without stable ordering)
- You switched data without realizing (artifact refresh / DB update)
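To test the first cause, compare the two request payloads field by field instead of by eye. The sketch below is a generic recursive diff for JSON-like dictionaries; the field names in the usage example (`exclude`, `experiment`, `request_id`) are illustrative and may not match your request schema.

```python
def diff_inputs(a, b, path=""):
    """Return the dotted paths at which two request payloads differ."""
    diffs = []
    if isinstance(a, dict) and isinstance(b, dict):
        # Visit keys in sorted order so the report itself is deterministic.
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else str(key)
            if key not in a or key not in b:
                diffs.append(child)  # present in only one payload
            else:
                diffs.extend(diff_inputs(a[key], b[key], child))
    elif a != b:
        diffs.append(path or "<root>")
    return diffs


call_1 = {"surface": "home", "exclude": ["a"], "experiment": {"variant": "A"}}
call_2 = {
    "surface": "home",
    "exclude": ["a", "b"],
    "experiment": {"variant": "A"},
    "request_id": "r2",
}
print(diff_inputs(call_1, call_2))  # ['exclude', 'request_id']
```

If the diff is empty and results still differ, move on to the ordering and data-refresh causes.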
Fix path:
- Determinism definition: How it works: architecture and data flow
- Verify determinism tutorial: Verify determinism
- Ranking determinism pitfalls: Ranking & constraints reference
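The tie-ordering cause is easiest to see with a small example. This is a sketch of a stable tie-break in general, not the service's actual ranking logic: when two candidates share a score, sorting on score alone preserves whatever order the candidate source returned, so adding a unique secondary key makes the output independent of input order. The `score` and `item_id` field names are assumptions.

```python
def rank(candidates):
    """Sort by score (descending), breaking ties by item_id (ascending)."""
    return sorted(candidates, key=lambda c: (-c["score"], c["item_id"]))


# The same candidates arriving in two different orders from the source.
order_1 = [
    {"item_id": "b", "score": 0.9},
    {"item_id": "a", "score": 0.9},
    {"item_id": "c", "score": 0.5},
]
order_2 = list(reversed(order_1))

# With the tie-break, both input orders produce the same ranking.
assert rank(order_1) == rank(order_2)
print([c["item_id"] for c in rank(order_1)])  # ['a', 'b', 'c']
```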
Symptom: exposure logs are missing or not joinable¶
Most common causes:
- Exposure logging is disabled
- Log path is wrong or not persisted
- Missing/unstable request IDs or user/session IDs across platforms
Fix path:
- Minimum instrumentation spec: Minimum instrumentation spec (for credible evaluation)
- Join logic: Event join logic (exposures ↔ outcomes ↔ assignments)
- Verify joinability tutorial: Verify joinability (request IDs → outcomes)
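Before working through the tutorial, a quick joinability check on a sample of logs can tell you which cause you are facing. The sketch below assumes exposure and outcome records have already been parsed into dictionaries and that both carry a `request_id` field; the field name and the `join_report` helper are hypothetical, and the canonical join logic is defined in the Event join logic reference.

```python
def join_report(exposures, outcomes):
    """Summarize how well outcome events join back to exposure events."""
    # Exposures with an empty or absent request ID can never be joined.
    exposure_ids = {e["request_id"] for e in exposures if e.get("request_id")}
    missing_id = sum(1 for e in exposures if not e.get("request_id"))

    # An outcome is orphaned when no exposure shares its request ID.
    orphaned = [o for o in outcomes if o.get("request_id") not in exposure_ids]
    joined = len(outcomes) - len(orphaned)

    return {
        "exposures_without_request_id": missing_id,
        "outcomes_joined": joined,
        "outcomes_orphaned": len(orphaned),
        "join_rate": joined / len(outcomes) if outcomes else 0.0,
    }
```

A high count of exposures without a request ID points at instrumentation, while many orphaned outcomes with well-formed IDs suggests that exposure logs are disabled, written to the wrong path, or not persisted.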
Symptom: service never becomes healthy¶
Fix path:
- Runbook: Runbook: Service not ready
- If migrations fail: Runbook: Database migration issues
Read next¶
- Integration checklist: How-to: Integration checklist (one surface)
- API reference: API Reference
- Production readiness checklist: Production readiness checklist (RecSys suite)