How-to: Backfill pipelines safely

This guide shows how to backfill pipelines safely, in a reliable and repeatable way.

Who this is for

  • Data engineers running historical reprocessing
  • SRE / on-call handling late data, broken windows, or schema changes

Goal

Recompute historical windows without breaking “current” artifacts, while staying within guardrails.

Quick paths

Checklist (safe default)

  1. Define the backfill window and why you need it

       • Start small (1–3 days) to validate assumptions.

  2. Run the backfill

       • Follow the canonical command patterns: How-to: Run a backfill safely

  3. Verify before publishing “current”

       • Inspect output locations and manifest pointers: Output layout (local filesystem)

  4. Watch guardrails and resource limits

       • Validation failures are designed to stop bad publishes: Validation and guardrails
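
The checklist above can be sketched in Python as a small driver. This is a minimal illustration under stated assumptions, not the pipeline's actual interface: the names split_window, run_backfill, run_chunk, validate and publish are hypothetical, and the 7-day chunk size after the pilot is an arbitrary choice. The canonical commands are in How-to: Run a backfill safely.

```python
from datetime import date, timedelta
from typing import Any, Callable, List, Tuple

Window = Tuple[date, date]  # inclusive (start, end) dates


def split_window(start: date, end: date,
                 pilot_days: int = 3, chunk_days: int = 7) -> List[Window]:
    """Split [start, end] into a small pilot chunk followed by larger chunks.

    The pilot chunk comes first so assumptions are validated on a few days
    of data before the rest of the window is reprocessed.
    """
    if end < start:
        raise ValueError("end must not be before start")
    if pilot_days < 1 or chunk_days < 1:
        raise ValueError("chunk sizes must be at least one day")

    windows: List[Window] = []
    cursor = start
    size = pilot_days  # first chunk is the pilot
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=size - 1), end)
        windows.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
        size = chunk_days  # every later chunk uses the regular size
    return windows


def run_backfill(windows: List[Window],
                 run_chunk: Callable[[Window], Any],
                 validate: Callable[[Any], bool],
                 publish: Callable[[Any], None]) -> int:
    """Run each chunk and publish it only if validation passes.

    Stops at the first validation failure so a bad chunk never reaches
    "current". Returns the number of chunks published.
    """
    published = 0
    for window in windows:
        staged = run_chunk(window)   # hypothetical: writes to a staging area
        if not validate(staged):     # hypothetical guardrail check
            break
        publish(staged)              # hypothetical: updates "current"
        published += 1
    return published
```

A caller would pass its own run_chunk, validate and publish callables. Keeping them as parameters means the stop-on-failure behaviour can be tested without touching real outputs.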