Start here

This page introduces recsys-pipelines and explains how it fits into the RecSys suite.

Who this is for

Developers / platform engineers evaluating the offline layer
Data engineers operating daily runs and backfills
SRE / on-call responding to freshness and pipeline failures
Recommendation engineers who need to understand “what artifacts exist and when they update”

What you will get

A clear mental model of what recsys-pipelines does and where it fits in the RecSys suite
The fastest paths to: run locally, operate daily, backfill, and roll back
Pointers to the canonical output layout, config, and on-call runbooks

Run locally: 10–20 min: ingest → validate → compute → publish using local config.
Operate daily: What to run, what to watch, and which runbook to open first.
Backfill safely: Window selection, guardrails, and verification.
Roll back safely: Manifest rollback, safety checks, and verification.
Output layout: Where artifacts/manifests live and what “current” means.
Config reference: The knobs that change behavior (sources, windows, guardrails, sinks).
SLOs & freshness: Operational invariants and “is this stale?” reasoning.
Runbooks: Common failures and safe remediation patterns.
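To make the "Roll back safely" and "Output layout" entries above more concrete, here is a minimal Python sketch of a "current" manifest pointer with an atomic publish and a guarded rollback. This page does not show the actual manifest format or rollback command, so everything here is an assumption for illustration: the function names (`write_manifest`, `read_manifest`, `rollback`), the `current.json` file name, and the `{"artifacts": {...}}` shape are all hypothetical.

```python
import json
import os
import tempfile


def write_manifest(path, manifest):
    """Atomically replace the manifest at `path` (temp file + os.replace)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(manifest, f, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old manifest or the new one, never a partial file.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def read_manifest(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def rollback(current_path, previous_manifest, known_versions):
    """Point 'current' back at a previous manifest after a safety check."""
    # Hypothetical safety check: every referenced artifact version must still exist.
    missing = [
        v for v in previous_manifest["artifacts"].values() if v not in known_versions
    ]
    if missing:
        raise ValueError(f"cannot roll back; missing artifact versions: {missing}")
    write_manifest(current_path, previous_manifest)
    # Verification step: re-read what was written.
    return read_manifest(current_path)


# Demo with made-up version labels.
with tempfile.TemporaryDirectory() as d:
    current = os.path.join(d, "current.json")
    old = {"artifacts": {"popularity": "v1"}}
    new = {"artifacts": {"popularity": "v2"}}
    write_manifest(current, new)
    restored = rollback(current, old, known_versions={"v1", "v2"})
    assert restored == old
```

The design point of the sketch is that "current" is only a pointer: rolling back rewrites the pointer and leaves the versioned artifacts untouched, which is why the rollback can be verified by re-reading the manifest.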

Think of recsys-pipelines as a factory:

The output artifacts are meant to be consumed by an online recommender service.

recsys-pipelines builds deterministic, version-addressed artifacts from raw exposure events.
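As a rough illustration of what "deterministic, version-addressed" can mean in practice, the Python sketch below builds a toy popularity artifact from exposure events and derives its version from a hash of the canonical bytes. It is not the recsys-pipelines implementation: the event shape (`item_id`), the artifact layout, and the helper names `build_popularity_artifact` and `artifact_version` are assumptions made for this example.

```python
import hashlib
import json
from collections import Counter


def build_popularity_artifact(events):
    """Count exposures per item; the output does not depend on input order."""
    counts = Counter(e["item_id"] for e in events)
    # Sort by (-count, item_id) so ties break the same way on every run.
    items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "type": "popularity",
        "items": [{"item_id": i, "count": c} for i, c in items],
    }


def artifact_version(artifact):
    """Derive a version string from the canonical JSON bytes of the artifact."""
    canonical = json.dumps(
        artifact, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Hypothetical events; a real exposure event would carry more fields.
events = [{"item_id": "a"}, {"item_id": "b"}, {"item_id": "a"}]

v1 = artifact_version(build_popularity_artifact(events))
v2 = artifact_version(build_popularity_artifact(list(reversed(events))))
assert v1 == v2  # same inputs in a different order give the same version
```

Because the version is a function of the content alone, rerunning the same window produces the same address, and a changed address signals that the artifact content actually changed.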

Current v1 artifact types:

Key production properties:

duplicate events.

This repo is designed to be useful for:

Typical stack (simplified):

If you are here because something broke, jump to: