
How it works: architecture and data flow

This page explains the RecSys architecture and end-to-end data flow: what runs where, how a request becomes a recommendation, and how the suite produces auditable results.

Who this is for

  • Stakeholders and engineers who want a clear mental model of what runs where, and how auditability is produced.

What you will get

  • Which components exist in the suite
  • How a request turns into a recommendation
  • How audit artifacts (logs → reports → decisions) are produced
  • How ship/rollback works at a high level

Key terms

  • Tenant: a configuration + data isolation boundary.
  • Surface: where recommendations are shown (home, PDP, cart, ...).
  • Artifact: an immutable, versioned blob produced offline and consumed online.
  • Manifest: maps artifact types to artifact URIs for a (tenant, surface) pair.
  • Exposure log: what was shown (audit trail + evaluation input).
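To make the manifest concept concrete, here is a minimal sketch of what a manifest might look like for one (tenant, surface) pair. The field names and URIs are illustrative assumptions, not the suite's actual schema:

```python
# Hypothetical manifest shape: maps artifact types to immutable, versioned
# artifact URIs for one (tenant, surface) pair. Field names are illustrative.
manifest = {
    "tenant": "acme",          # assumed tenant id
    "surface": "home",         # where recommendations are shown
    "version": "v42",          # assumed manifest version label
    "artifacts": {
        "item_embeddings": "s3://recsys-artifacts/acme/home/item_embeddings/v42",
        "popularity_stats": "s3://recsys-artifacts/acme/home/popularity_stats/v42",
    },
}

def artifact_uri(manifest, artifact_type):
    """Resolve an artifact type to its versioned URI via the manifest."""
    return manifest["artifacts"][artifact_type]
```

Because each artifact is immutable and the manifest only holds pointers, swapping the "current" manifest changes what serving reads without mutating any data.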

One-screen mental model

RecSys separates concerns across four modules:

  • Serving (online, deterministic): recsys-service
  • Ranking logic (deterministic): recsys-algo
  • Computation (offline, versioned outputs): recsys-pipelines
  • Evaluation (analysis + decisions): recsys-eval


Online request flow (serve)

  1. Your app calls POST /v1/recommend (see API Reference).
  2. recsys-service builds a candidate set and ranks deterministically (see Candidate generation vs ranking).
  3. Optional trace/explain data can be enabled depending on your ranking setup (see Concepts).

Scoring details: Scoring model specification (recsys-algo)
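The deterministic ranking step above can be sketched as follows. This is not the actual recsys-algo scoring model; it is a toy illustration of the key property: identical candidates and scores always produce identical output, with ties broken by item id so ordering never depends on input order:

```python
def recommend(candidates, scores, k=3):
    """Toy deterministic ranker: sort by score descending, break ties by item id.

    candidates: list of item ids (order does not matter)
    scores: dict mapping item id -> score
    """
    ranked = sorted(candidates, key=lambda item: (-scores[item], item))
    return ranked[:k]
```

Usage: `recommend(["b", "a", "c", "d"], {"a": 0.9, "b": 0.9, "c": 0.5, "d": 0.1})` returns the same top-k regardless of how the candidate list was ordered.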

Determinism and auditability contract

RecSys is designed so you can answer two questions reliably:

  • What did we serve (and what changed)?
  • Can we reproduce and evaluate decisions from logs later?

What “deterministic” means here (operational definition)

For a given tenant, POST /v1/recommend is deterministic when the following inputs are the same:

  • request payload (including surface, identifiers, and any per-request overrides)
  • recsys-service / recsys-algo version
  • serving inputs (DB rows in DB-only mode, or the same artifacts + manifest in artifact mode)
  • tenant config and rules

You can verify you are comparing like-for-like by recording:

  • meta.request_id (or the X-Request-Id you provided)
  • meta.config_version and meta.rules_version (ETags)

See: API Reference
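A sketch of how a client might record those fields and check two responses are comparable. The meta field names follow the list above; treat the exact record shape as an assumption:

```python
def comparison_key(meta):
    """Fields to record per response for like-for-like comparison:
    request id plus the config/rules version ETags."""
    return (meta["request_id"], meta["config_version"], meta["rules_version"])

def like_for_like(meta_a, meta_b):
    """Two responses are comparable only if their config and rules
    versions (everything except the request id) match."""
    return comparison_key(meta_a)[1:] == comparison_key(meta_b)[1:]
```

If `like_for_like` is false, a difference in output may simply reflect a config or rules update rather than a serving bug.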

What can (and should) make results change

  • Updated signals/artifacts (new day/window, refreshed pipeline outputs, changed catalog).
  • Config/rules updates (versions change).
  • Ship/rollback by switching the current manifest pointer (artifact mode).
  • Experiment assignment (if enabled).

What this does not guarantee

Known limitations and non-goals live here: Known limitations and non-goals (current)

Logging flow (audit trail)

RecSys produces an audit trail by linking:

  • Exposure logs (what the user saw)
  • Outcome logs (what the user did)

The join key is request_id (plus stable identifiers). See: Exposure logging and attribution
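A minimal sketch of that join, assuming simple dict-shaped log records (the field names beyond `request_id` are illustrative):

```python
def join_logs(exposures, outcomes):
    """Join exposure and outcome logs on request_id.

    exposures: records of what was shown, each with a "request_id"
    outcomes: records of what the user did, each with a "request_id" and "event"
    """
    by_request = {e["request_id"]: e for e in exposures}
    joined = []
    for outcome in outcomes:
        exposure = by_request.get(outcome["request_id"])
        if exposure is not None:
            joined.append({**exposure, "outcome": outcome["event"]})
    return joined
```

Outcomes with no matching exposure are dropped from the join; tracking how often that happens is exactly what the join-rate check in the evaluation flow measures.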

Evaluation flow (decide ship/hold/rollback)

  1. Validate logs and compute join-rate.
  2. Produce an offline/online report (see How-to: run evaluation and make ship decisions).
  3. Use the report to decide: ship / hold / rollback.
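The steps above can be sketched as a toy decision rule. The thresholds and the notion of "lift" here are placeholders for whatever your evaluation report produces, not suite defaults:

```python
def join_rate(n_exposures, n_joined):
    """Fraction of exposures that successfully joined to an outcome."""
    return n_joined / n_exposures if n_exposures else 0.0

def decision(rate, lift, min_join_rate=0.9, min_lift=0.0):
    """Toy ship/hold/rollback rule (illustrative thresholds only)."""
    if rate < min_join_rate:
        return "hold"      # logs are not trustworthy enough to decide either way
    if lift > min_lift:
        return "ship"
    if lift < min_lift:
        return "rollback"
    return "hold"
```

The point of validating join-rate first is that a low join-rate invalidates the comparison itself: with broken attribution, neither a positive nor a negative lift can be trusted.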


Data modes (where features come from)

RecSys supports two primary serving modes:

  • DB-only mode: simplest way to start; fewer moving parts.
  • Artifact/manifest mode: versioned artifacts + a “current manifest pointer”.

See: Data modes: DB-only vs artifact/manifest

Ship/rollback mechanics (why it’s safe)

  • Config/rules changes are explicit and auditable (admin API).
  • Artifact mode allows versioned rollback by switching the current manifest pointer.
  • The suite is designed so rollbacks are operationally predictable.
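Why pointer-switching makes rollback predictable can be shown with a small sketch. This is not the suite's admin API; it is an illustration of the pattern: manifests are immutable once published, and ship/rollback are both just repointing "current":

```python
class ManifestStore:
    """Sketch of ship/rollback via a 'current' pointer over immutable manifests."""

    def __init__(self):
        self.versions = {}   # version label -> manifest (immutable once published)
        self.current = None  # version label currently served

    def publish(self, version, manifest):
        self.versions[version] = manifest  # write once, never mutate

    def ship(self, version):
        self.current = version  # single pointer swap; no data is rewritten

    def rollback(self, version):
        self.ship(version)      # rollback is the same operation in reverse
```

Because the old manifest and its artifacts still exist unchanged, rolling back restores exactly the serving inputs that were live before, which is what makes the operation predictable.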
