How-to: Backfill pipelines safely
This guide shows how to backfill historical pipeline windows in a reliable, repeatable way.
Who this is for
- Data engineers running historical reprocessing
- SRE / on-call handling late data, broken windows, or schema changes
Goal
Recompute historical windows without breaking “current” artifacts, while staying within guardrails.
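One common way to keep “current” artifacts safe during a backfill is to write each run into its own versioned directory and repoint a small manifest file only after verification passes. The sketch below is illustrative, not this project's actual publish mechanism; the `runs/<run_id>/` layout and `current.json` pointer name are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path

def publish_run(output_root: Path, run_id: str) -> None:
    """Atomically repoint the 'current' manifest at a verified run.

    Assumes a hypothetical layout of output_root/runs/<run_id>/ plus an
    output_root/current.json pointer file.
    """
    run_dir = output_root / "runs" / run_id
    if not run_dir.is_dir():
        raise FileNotFoundError(f"run directory missing: {run_dir}")

    # Write the new pointer to a temp file, then rename it into place.
    # os.replace is atomic on POSIX, so readers never observe a
    # half-written pointer: 'current' keeps serving the old run until
    # the swap completes.
    pointer = {"run_id": run_id, "path": str(run_dir)}
    fd, tmp = tempfile.mkstemp(dir=output_root, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(pointer, f)
    os.replace(tmp, output_root / "current.json")
```

Because the backfill output lands beside, not on top of, the published run, a failed or invalid backfill leaves “current” untouched.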
Quick paths
- Run a backfill: How-to: Run a backfill safely
- Windows and backfills (concepts): Windows and backfills
- Validation and guardrails: Validation and guardrails
- Output layout (verify results): Output layout (local filesystem)
Checklist (safe default)
- Define the backfill window and why you need it
    - Start small (1–3 days) to validate assumptions.
- Run the backfill
    - Follow the canonical command patterns: How-to: Run a backfill safely
- Verify before publishing “current”
    - Inspect output locations and manifest pointers: Output layout (local filesystem)
- Watch guardrails and resource limits
    - Validation failures are designed to stop bad publishes: Validation and guardrails
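The “start small” step of the checklist can be sketched as a window planner: carve a short probe chunk off the front of the requested window, run and verify it first, then proceed with the remainder. The function and its defaults are hypothetical; dates are treated as `[start, end)` daily partitions.

```python
from datetime import date, timedelta

def plan_backfill(start: date, end: date, probe_days: int = 3):
    """Split a backfill window into a small probe chunk plus the remainder.

    Running the probe first validates assumptions cheaply before
    committing resources to the full window. Names are illustrative.
    """
    if start >= end:
        raise ValueError("empty backfill window")
    probe_end = min(start + timedelta(days=probe_days), end)
    chunks = [(start, probe_end)]
    if probe_end < end:
        chunks.append((probe_end, end))
    return chunks
```

For a ten-day window this yields a 3-day probe followed by the remaining 6 days; windows shorter than the probe size come back as a single chunk.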
Read next
- Roll back safely: How-to: Roll back artifacts safely
- Validation failed runbook: Runbook: Validation failed
- Limit exceeded runbook: Runbook: Limit exceeded