How it works: architecture and data flow
This page explains how the RecSys suite works: its architecture, its data flow, and how the components fit together.
Who this is for
- Stakeholders and engineers who want a clear mental model of what runs where, and how auditability is produced.
What you will get
- which components exist in the suite
- how a request turns into a recommendation
- how audit artifacts (logs → reports → decisions) are produced
- how ship/rollback works at a high level
Key terms
- Tenant: a configuration + data isolation boundary.
- Surface: where recommendations are shown (home, PDP, cart, ...).
- Artifact: an immutable, versioned blob produced offline and consumed online.
- Manifest: maps artifact types to artifact URIs for a (tenant, surface) pair.
- Exposure log: what was shown (audit trail + evaluation input).
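To make the Artifact and Manifest terms concrete, here is a minimal sketch of a manifest as a read-only mapping from artifact type to artifact URI for one (tenant, surface) pair. The class name, field names, artifact type names, and URIs are all hypothetical; the real manifest schema is defined by the suite and may differ.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Manifest:
    """Hypothetical shape: one manifest per (tenant, surface) pair."""

    tenant: str
    surface: str
    # artifact type -> artifact URI (the type names and URIs used below are invented)
    artifacts: Mapping[str, str]

    def uri_for(self, artifact_type: str) -> str:
        """Return the URI of the artifact currently mapped to this type."""
        return self.artifacts[artifact_type]


manifest = Manifest(
    tenant="demo",
    surface="home",
    # MappingProxyType gives a read-only view, mirroring the idea that
    # a published manifest is not edited in place.
    artifacts=MappingProxyType(
        {
            "popularity": "s3://example-bucket/demo/home/popularity/v42.json",
            "cooccurrence": "s3://example-bucket/demo/home/cooccurrence/v17.json",
        }
    ),
)

print(manifest.uri_for("popularity"))
```

Because each URI points at an immutable, versioned blob, changing what is served means publishing a new manifest rather than mutating an existing one.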
One-screen mental model
RecSys separates concerns across four modules:
- Serving (online, deterministic): recsys-service
- Ranking logic (deterministic): recsys-algo
- Computation (offline, versioned outputs): recsys-pipelines
- Evaluation (analysis + decisions): recsys-eval
See also:
- Suite architecture: Suite architecture
- Suite context diagram: Suite Context
Online request flow (serve)
- Your app calls POST /v1/recommend (see API Reference).
- recsys-service builds a candidate set and ranks deterministically (see Candidate generation vs ranking).
- Optional trace/explain data can be enabled depending on your ranking setup (see Concepts).
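The first step above can be sketched as a plain HTTP call. Only the POST /v1/recommend path, the surface field, and the X-Request-Id header come from this page; the base URL and the other payload fields (user_id, k) are assumptions for illustration, so check the API Reference for the actual request schema.

```python
import json
import urllib.request


def build_recommend_request(
    base_url: str, payload: dict, request_id: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/recommend request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/recommend",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Supplying your own request ID lets you join logs on it later.
            "X-Request-Id": request_id,
        },
    )


req = build_recommend_request(
    base_url="http://localhost:8000",  # assumed local address
    payload={"surface": "home", "user_id": "u-123", "k": 10},  # assumed fields
    request_id="req-0001",
)

# To actually send it against a running service:
# with urllib.request.urlopen(req, timeout=5) as resp:
#     result = json.load(resp)
```

Generating the request ID on the caller side is a design choice: it keeps one identifier stable across your app logs, the exposure log, and the outcome log.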
Scoring details: Scoring model specification (recsys-algo)
Determinism and auditability contract
RecSys is designed so you can answer two questions reliably:
- What did we serve (and what changed)?
- Can we reproduce and evaluate decisions from logs later?
What “deterministic” means here (operational definition)
For a given tenant, POST /v1/recommend is deterministic when the following inputs are the same:
- request payload (including surface, identifiers, and any per-request overrides)
- recsys-service/recsys-algo version
- serving inputs (DB rows in DB-only mode, or the same artifacts + manifest in artifact mode)
- tenant config and rules
You can verify you are comparing like-for-like by recording:
- meta.request_id (or the X-Request-Id you provided)
- meta.config_version and meta.rules_version (ETags)
See: API Reference
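One way to check that two calls are like-for-like is to hash the inputs listed above into a single fingerprint and compare fingerprints before comparing results. This is a sketch, not something the service provides: the function name, the serving_inputs_id parameter (a stand-in for whatever identifies your DB snapshot or manifest), and all example values are hypothetical.

```python
import hashlib
import json


def input_fingerprint(
    payload: dict,
    service_version: str,
    serving_inputs_id: str,
    config_version: str,
    rules_version: str,
) -> str:
    """Hash the inputs that the determinism definition says must match."""
    canonical = json.dumps(
        {
            "payload": payload,
            "service_version": service_version,
            "serving_inputs_id": serving_inputs_id,
            "config_version": config_version,
            "rules_version": rules_version,
        },
        sort_keys=True,  # key order in the payload must not affect the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same inputs, different key order in the payload: fingerprints match.
a = input_fingerprint(
    {"surface": "home", "user_id": "u-123"}, "1.4.0", "manifest-001", "cfg-7", "rules-3"
)
b = input_fingerprint(
    {"user_id": "u-123", "surface": "home"}, "1.4.0", "manifest-001", "cfg-7", "rules-3"
)
# Only the rules version differs: fingerprints differ.
c = input_fingerprint(
    {"surface": "home", "user_id": "u-123"}, "1.4.0", "manifest-001", "cfg-7", "rules-4"
)

print(a == b, a == c)
```

If two responses differ while their fingerprints match, that points to an input the fingerprint does not capture, which is worth investigating before blaming the ranking logic.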
What can (and should) make results change
- Updated signals/artifacts (new day/window, refreshed pipeline outputs, changed catalog).
- Config/rules updates (versions change).
- Ship/rollback by switching the current manifest pointer (artifact mode).
- Experiment assignment (if enabled).
What this does not guarantee
- KPI lift (depends on your data and experimentation discipline).
- Production readiness (use Production readiness checklist (RecSys suite)).
Known limitations and non-goals live here: Known limitations and non-goals (current)
Logging flow (audit trail)
RecSys produces an audit trail by linking:
- Exposure logs (what the user saw)
- Outcome logs (what the user did)
The join key is request_id (plus stable identifiers). See: Exposure logging and attribution
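The linkage can be sketched as a join of outcome records onto exposure records by request_id. The record shapes below (item_ids, event, item_id) are invented for illustration; the actual log schemas are described in Exposure logging and attribution.

```python
from collections import defaultdict


def join_on_request_id(exposures: list, outcomes: list) -> list:
    """Attach to each exposure the outcomes that share its request_id."""
    outcomes_by_request = defaultdict(list)
    for outcome in outcomes:
        outcomes_by_request[outcome["request_id"]].append(outcome)

    joined = []
    for exposure in exposures:
        matched = outcomes_by_request.get(exposure["request_id"], [])
        joined.append({**exposure, "outcomes": matched})
    return joined


# Hypothetical records: two exposures, one click that joins, one that does not.
exposures = [
    {"request_id": "req-0001", "surface": "home", "item_ids": ["a", "b", "c"]},
    {"request_id": "req-0002", "surface": "cart", "item_ids": ["d", "e"]},
]
outcomes = [
    {"request_id": "req-0001", "event": "click", "item_id": "b"},
    {"request_id": "req-0099", "event": "click", "item_id": "z"},
]

for row in join_on_request_id(exposures, outcomes):
    print(row["request_id"], [o["item_id"] for o in row["outcomes"]])
```

The outcome for req-0099 has no matching exposure and is dropped by the join, which is the kind of gap a join-rate check is meant to surface.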
Evaluation flow (decide ship/hold/rollback)
- Validate logs and compute join-rate.
- Produce an offline/online report (see How-to: run evaluation and make ship decisions).
- Use the report to decide: ship / hold / rollback.
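The steps above can be sketched in a few lines. This assumes join-rate means the share of outcome events whose request_id matches a logged exposure; recsys-eval may define it differently. The 0.9 threshold, the kpi_delta input, and the decision rule are illustrative placeholders, not the suite's actual policy.

```python
def join_rate(exposures: list, outcomes: list) -> float:
    """Share of outcome events that match a logged exposure (assumed definition)."""
    if not outcomes:
        return 0.0
    exposed_ids = {e["request_id"] for e in exposures}
    matched = sum(1 for o in outcomes if o["request_id"] in exposed_ids)
    return matched / len(outcomes)


def decide(join_rate_value: float, kpi_delta: float, min_join_rate: float = 0.9) -> str:
    """Toy decision rule: do not trust the report if too few outcomes join."""
    if join_rate_value < min_join_rate:
        return "hold"  # fix logging before drawing conclusions
    if kpi_delta > 0:
        return "ship"
    if kpi_delta < 0:
        return "rollback"
    return "hold"


exposures = [{"request_id": "r1"}, {"request_id": "r2"}]
outcomes = [
    {"request_id": "r1"},
    {"request_id": "r2"},
    {"request_id": "r9"},  # no matching exposure
    {"request_id": "r1"},
]

rate = join_rate(exposures, outcomes)
print(rate, decide(rate, kpi_delta=0.02))
```

Validating the join first matters because a low join-rate makes any downstream metric unreliable, regardless of which direction it moves.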
Deep dives live under:
- recsys-eval: recsys-eval docs
Data modes (where features come from)
RecSys supports two primary serving modes:
- DB-only mode: simplest way to start; fewer moving parts.
- Artifact/manifest mode: versioned artifacts + a “current manifest pointer”.
See: Data modes: DB-only vs artifact/manifest
Ship/rollback mechanics (why it’s safe)
- Config/rules changes are explicit and auditable (admin API).
- Artifact mode allows versioned rollback by switching the current manifest pointer.
- The suite is designed so rollbacks are operationally predictable.
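The pointer-switch idea can be illustrated with a small in-memory model. This is a conceptual sketch only: the class, its methods, and the manifest paths are hypothetical, and the real pointer lives wherever the suite stores it (see the rollback docs linked below).

```python
class ManifestPointer:
    """Hypothetical in-memory stand-in for the current manifest pointer."""

    def __init__(self, initial_uri: str) -> None:
        self.current = initial_uri
        self.previous = []   # earlier targets, most recent last
        self.audit_log = []  # (action, from_uri, to_uri) tuples

    def ship(self, new_uri: str) -> None:
        """Point serving at a new manifest; the old one stays untouched."""
        self.audit_log.append(("ship", self.current, new_uri))
        self.previous.append(self.current)
        self.current = new_uri

    def rollback(self) -> str:
        """Switch back to the manifest that was current before the last ship."""
        if not self.previous:
            raise RuntimeError("no earlier manifest to roll back to")
        target = self.previous.pop()
        self.audit_log.append(("rollback", self.current, target))
        self.current = target
        return target


pointer = ManifestPointer("manifests/demo/home/v1.json")
pointer.ship("manifests/demo/home/v2.json")
pointer.rollback()

print(pointer.current)
print(pointer.audit_log)
```

Nothing is recomputed or deleted during a rollback: the earlier manifest and its artifacts still exist, so reverting is a single pointer change that also leaves an audit entry.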
See:
- Operational reliability & rollback: Operational reliability and rollback
- Production-like tutorial: production-like run (pipelines → object store → ship/rollback)
Read next
- Quickstart (minimal): Tutorial: Quickstart (minimal)
- Quickstart (full validation): Tutorial: Quickstart (full validation)
- Pilot plan: Pilot plan (2–6 weeks)
- Capability matrix (scope and non-scope): Capability matrix (scope and non-scope)
- Known limitations: Known limitations and non-goals (current)