How-to: Debug a failed pipeline run¶
This guide shows how to how-to: Debug a failed pipeline run in a reliable, repeatable way.
1) Identify the step¶
Look at logs for one of:
- ingest
- validate
- popularity
- cooc
- publish
2) Common root causes¶
- Input files missing or wrong path
- Bad JSON in raw event files
- Validation fails (out-of-window timestamps, too many events)
- Resource limit exceeded (sessions/items)
- Disk permission errors
3) Useful commands¶
Re-run one day:
./bin/recsys-pipelines run --config configs/env/local.json --tenant demo \
--surface home --start 2026-01-01 --end 2026-01-01
Check manifest:
cat .out/registry/current/demo/home/manifest.json
Inspect canonical files:
find .out/canonical -type f | sort
4) If publish failed¶
Publishing is ordered so that the manifest pointer updates last. This means serving should still point to the previous version.
See operations/runbooks/pipeline-failed.md.
Read next¶
- Operate pipelines daily: How-to: Operate pipelines daily
- Pipeline failed runbook: Runbook: Pipeline failed
- Validation failed runbook: Runbook: Validation failed
- Limit exceeded runbook: Runbook: Limit exceeded
- Output layout (verify “current”): Output layout (local filesystem)