recsys-pipelines
Filesystem-first pipelines that build versioned recommendation artifacts from raw exposure events.
This repository is the offline factory of a recommender stack:
- It ingests raw exposure events (JSONL, Postgres, or S3 batch).
- It canonicalizes them into a deterministic, replayable dataset.
- It computes artifacts (v1: popularity, co-occurrence, implicit, content_sim, session_seq).
- It validates outputs and enforces hard resource limits.
- It publishes artifacts to a versioned object store and updates a single "current" manifest pointer.
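To make "deterministic, replayable" concrete, here is a minimal Go sketch of one way a canonicalization step could work: impose a total order on the events, drop exact duplicates, and serialize one JSON object per line. The `Exposure` type, its fields, and the `Canonicalize` function are assumptions for illustration only; the repository's actual schema and logic may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// Exposure is a hypothetical event shape, not the repository's schema.
type Exposure struct {
	TS     string `json:"ts"` // assumed RFC 3339 in UTC, so string order matches time order
	UserID string `json:"user_id"`
	ItemID string `json:"item_id"`
}

// Canonicalize sorts events by (ts, user_id, item_id), drops exact
// duplicates, and renders JSONL. Because the order is total over all
// fields, the same set of events yields identical bytes in any input order.
func Canonicalize(events []Exposure) ([]byte, error) {
	sorted := append([]Exposure(nil), events...) // leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.TS != b.TS {
			return a.TS < b.TS
		}
		if a.UserID != b.UserID {
			return a.UserID < b.UserID
		}
		return a.ItemID < b.ItemID
	})

	var buf bytes.Buffer
	for i, e := range sorted {
		if i > 0 && e == sorted[i-1] {
			continue // exact duplicate of the previous event
		}
		line, err := json.Marshal(e)
		if err != nil {
			return nil, err
		}
		buf.Write(line)
		buf.WriteByte('\n')
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := Canonicalize([]Exposure{
		{TS: "2026-01-01T00:00:02Z", UserID: "u2", ItemID: "i1"},
		{TS: "2026-01-01T00:00:01Z", UserID: "u1", ItemID: "i9"},
		{TS: "2026-01-01T00:00:02Z", UserID: "u2", ItemID: "i1"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```

Replaying the same raw events through a function like this produces byte-identical output, which is what makes downstream artifact versions comparable across runs.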
If you are new, start at docs/start-here.md.
Quickstart
Requirements:
- Go toolchain (see go.mod)
Run the pipeline locally against the tiny sample dataset:
make test
make build
./bin/recsys-pipelines run \
--config configs/env/local.json \
--tenant demo \
--surface home \
--start 2026-01-01 \
--end 2026-01-01
Outputs (default .out/):
- Canonical events: .out/canonical/<tenant>/<surface>/exposures/YYYY-MM-DD.jsonl
- Staged artifacts: .out/artifacts/<tenant>/<surface>/<segment>/<type>/<window>/...
- Published blobs: .out/objectstore/<tenant>/<surface>/<type>/<version>.json
- Current manifest: .out/registry/current/<tenant>/<surface>/manifest.json
Run the smoke test (includes an idempotency check):
make smoke
Documentation
Docs are organized into tutorials, how-to guides, explanations, and reference. See docs/index.md for the entry point.
- Start here: docs/start-here.md
- Tutorials: docs/tutorials/
- How-to: docs/how-to/
- Explanations: docs/explanation/
- Reference: docs/reference/
- Operations: docs/operations/
Binaries
- recsys-pipelines: one-shot runner (local/dev, or simple cron)
- job_ingest: ingest + canonicalize (job-per-container style)
- job_popularity: compute + stage popularity artifact
- job_cooc: compute + stage co-occurrence artifact
- job_implicit: compute + stage implicit (collaborative) artifact
- job_content_sim: compute + stage content similarity artifact
- job_session_seq: compute + stage session sequence artifact
- job_validate: validate canonical event quality for a window range
- job_publish: publish staged artifacts + swap the current manifest
- job_db_signals: write popularity + co-vis signals into Postgres
- job_catalog: ingest item tags into Postgres
See: docs/tutorials/job-mode.md.
Contributing
See docs/contributing/dev-workflow.md.
Releases
Tag releases with the module prefix, e.g. recsys-pipelines/v0.2.0.
Read next
- Start here
- Quickstart (10 minutes)