How it works: architecture and data flow

This page explains the architecture and data flow of the RecSys suite: which components exist, what runs where, and how the audit trail is produced.

Who this is for

Stakeholders and engineers who want a clear mental model of what runs where, and how auditability is produced.

What you will get

which components exist in the suite
how a request turns into a recommendation
how audit artifacts (logs → reports → decisions) are produced
how ship/rollback works at a high level

Key terms

Tenant: a configuration + data isolation boundary.
Surface: where recommendations are shown (home, PDP, cart, ...).
Artifact: an immutable, versioned blob produced offline and consumed online.
Manifest: maps artifact types to artifact URIs for a (tenant, surface) pair.
Exposure log: what was shown (audit trail + evaluation input).

One-screen mental model

RecSys separates concerns across four modules:

Serving (online, deterministic): recsys-service
Ranking logic (deterministic): recsys-algo
Computation (offline, versioned outputs): recsys-pipelines
Evaluation (analysis + decisions): recsys-eval

Online request flow (serve)

Your app calls POST /v1/recommend (see API Reference).
recsys-service builds a candidate set and ranks deterministically (see Candidate generation vs ranking).
Optional trace/explain data can be enabled depending on your ranking setup (see Concepts).

Scoring details: Scoring model specification (recsys-algo)
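
As a rough illustration of the first step, the sketch below builds (but does not send) a POST /v1/recommend request. The path and the X-Request-Id header come from this page; the payload field names (surface, user_id, k), the base URL, and the helper name build_recommend_request are assumptions made for illustration, so check the API Reference for the real schema.

```python
import json
import urllib.request


def build_recommend_request(base_url, surface, user_id, k=10, request_id=None):
    """Build (but do not send) a POST /v1/recommend request.

    The payload field names below are hypothetical placeholders,
    not the documented RecSys schema.
    """
    payload = {"surface": surface, "user_id": user_id, "k": k}
    headers = {"Content-Type": "application/json"}
    if request_id is not None:
        # Supplying your own request id lets you find this call again
        # in exposure logs and in the response metadata.
        headers["X-Request-Id"] = request_id
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/recommend",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Example: inspect the request without making a network call.
example = build_recommend_request(
    "http://localhost:8000", surface="home", user_id="u-1", request_id="demo-123"
)
print(example.get_method(), example.full_url)
```

Sending the request (for example with urllib.request.urlopen) is left out on purpose so the sketch runs without a live service.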

Determinism and auditability contract

RecSys is designed so you can answer two questions reliably:

What did we serve (and what changed)?
Can we reproduce and evaluate decisions from logs later?

What “deterministic” means here (operational definition)

For a given tenant, POST /v1/recommend is deterministic when the following inputs are the same:

request payload (including surface, identifiers, and any per-request overrides)
recsys-service / recsys-algo version
serving inputs (DB rows in DB-only mode, or the same artifacts + manifest in artifact mode)
tenant config and rules

You can verify you are comparing like-for-like by recording:

meta.request_id (or the X-Request-Id you provided)
meta.config_version and meta.rules_version (ETags)

See: API Reference
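
The like-for-like check above can be sketched as a small comparison helper. It assumes a response is a dict with a meta object carrying config_version and rules_version (named on this page) and an items list (a hypothetical name for the ranked results). The version fields cover only config and rules; the service version and the serving inputs still have to be pinned by other means.

```python
VERSION_KEYS = ("config_version", "rules_version")


def like_for_like(resp_a, resp_b):
    """True when both responses carry the same, non-missing version fields."""
    meta_a = resp_a.get("meta", {})
    meta_b = resp_b.get("meta", {})
    return all(
        meta_a.get(key) is not None and meta_a.get(key) == meta_b.get(key)
        for key in VERSION_KEYS
    )


def compare_responses(resp_a, resp_b):
    """Classify a pair of responses for a determinism check.

    "items" is an assumed name for the ranked list in the response body.
    """
    if not like_for_like(resp_a, resp_b):
        # Config or rules changed between calls: a diff is expected.
        return "not-comparable"
    return "match" if resp_a.get("items") == resp_b.get("items") else "mismatch"
```

A "mismatch" between two comparable responses is the case worth investigating: either another input changed (artifacts, service version, experiment assignment) or the determinism contract was broken.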

What can (and should) make results change

Updated signals/artifacts (new day/window, refreshed pipeline outputs, changed catalog).
Config/rules updates (versions change).
Ship/rollback by switching the current manifest pointer (artifact mode).
Experiment assignment (if enabled).

What this does not guarantee

KPI lift (depends on your data and experimentation discipline).
Production readiness (use the Production readiness checklist (RecSys suite)).

Known limitations and non-goals live here: Known limitations and non-goals (current)

Logging flow (audit trail)

RecSys produces an audit trail by linking:

Exposure logs (what the user saw)
Outcome logs (what the user did)

The join key is request_id (plus stable identifiers). See: Exposure logging and attribution
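
A minimal sketch of that join, assuming both logs are available as lists of dicts that each carry a request_id. Every other field name is a placeholder, and the sketch joins on request_id alone; a real attribution job would also match the stable identifiers mentioned above.

```python
from collections import defaultdict


def join_logs(exposures, outcomes):
    """Attach outcome events to the exposure that shares their request_id.

    Returns (joined, orphans): joined holds one entry per exposure,
    orphans holds outcomes that match no known exposure.
    """
    by_request = {e["request_id"]: e for e in exposures}
    matched = defaultdict(list)
    orphans = []
    for outcome in outcomes:
        rid = outcome["request_id"]
        if rid in by_request:
            matched[rid].append(outcome)
        else:
            orphans.append(outcome)
    joined = [
        {"exposure": exposure, "outcomes": matched.get(rid, [])}
        for rid, exposure in by_request.items()
    ]
    return joined, orphans
```

Keeping the orphans instead of dropping them is deliberate: outcomes that match no exposure usually point to a logging gap, which matters for the audit trail.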

Evaluation flow (decide ship/hold/rollback)

Validate logs and compute join-rate.
Produce an offline/online report (see How-to: run evaluation and make ship decisions).
Use the report to decide: ship / hold / rollback.

Deep dives live under:

recsys-eval: recsys-eval docs
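
To make the three steps concrete, here is a hedged sketch of a join-rate calculation followed by a toy decision rule. It assumes join-rate means the share of outcome events whose request_id matches a logged exposure. The thresholds, the primary_lift input, and the decision policy are purely illustrative and are not what recsys-eval implements.

```python
def join_rate(exposure_request_ids, outcome_request_ids):
    """Share of outcome events whose request_id has a logged exposure."""
    outcomes = list(outcome_request_ids)
    if not outcomes:
        return 0.0
    known = set(exposure_request_ids)
    return sum(1 for rid in outcomes if rid in known) / len(outcomes)


def decide(join_rate_value, primary_lift, min_join_rate=0.95, rollback_below=-0.02):
    """Toy policy mapping a report summary to ship / hold / rollback.

    All thresholds are hypothetical defaults chosen for the example.
    """
    if join_rate_value < min_join_rate:
        # Logs are too incomplete to trust the report either way.
        return "hold"
    if primary_lift <= rollback_below:
        return "rollback"
    if primary_lift > 0:
        return "ship"
    return "hold"


rate = join_rate(["r1", "r2", "r3"], ["r1", "r2", "rX", "r3"])
print(rate)                # 0.75
print(decide(rate, 0.05))  # hold
```

The ordering of the checks reflects one design choice: log quality gates everything else, so a low join-rate yields "hold" even when the measured lift looks good.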

Data modes (where features come from)

RecSys supports two primary serving modes:

DB-only mode: simplest way to start; fewer moving parts.
Artifact/manifest mode: versioned artifacts + a “current manifest pointer”.

See: Data modes: DB-only vs artifact/manifest

Ship/rollback mechanics (why it’s safe)

Config/rules changes are explicit and auditable (admin API).
Artifact mode allows versioned rollback by switching the current manifest pointer.
Rollbacks are designed to be operationally predictable: in artifact mode, switching the pointer back restores a previously published set of immutable artifacts.

See:

Operational reliability & rollback: Operational reliability and rollback
Production-like tutorial: production-like run (pipelines → object store → ship/rollback)
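
The pointer-switch idea behind artifact-mode ship/rollback can be sketched with an in-memory model. Manifest, ManifestStore, the method names, and the example URIs are all hypothetical; the real suite keeps manifests in an object store and has its own tooling. The sketch only shows why rollback is cheap: manifests are never modified, and only the "current" pointer moves.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Manifest:
    version: str
    artifacts: Mapping[str, str]  # artifact type -> artifact URI


class ManifestStore:
    """In-memory stand-in for published manifests and the current pointer."""

    def __init__(self):
        self._manifests = {}  # (tenant, surface, version) -> Manifest
        self._history = {}    # (tenant, surface) -> shipped versions, newest last

    def publish(self, tenant, surface, manifest):
        key = (tenant, surface, manifest.version)
        if key in self._manifests:
            raise ValueError("manifest versions are immutable once published")
        self._manifests[key] = manifest

    def ship(self, tenant, surface, version):
        if (tenant, surface, version) not in self._manifests:
            raise KeyError(f"unknown manifest version: {version}")
        self._history.setdefault((tenant, surface), []).append(version)

    def rollback(self, tenant, surface):
        history = self._history.get((tenant, surface), [])
        if len(history) < 2:
            raise RuntimeError("no previous manifest to roll back to")
        history.pop()       # drop the current pointer target
        return history[-1]  # the previous version is current again

    def current(self, tenant, surface):
        version = self._history[(tenant, surface)][-1]
        return self._manifests[(tenant, surface, version)]


store = ManifestStore()
store.publish("demo", "home", Manifest("v1", {"popularity": "s3://example-bucket/pop/v1"}))
store.publish("demo", "home", Manifest("v2", {"popularity": "s3://example-bucket/pop/v2"}))
store.ship("demo", "home", "v1")
store.ship("demo", "home", "v2")
store.rollback("demo", "home")
print(store.current("demo", "home").version)  # v1
```

Because published manifests are never modified, rolling back in this model is a pointer move to a state that has already been served, not a rebuild.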