How-to: add a new signal end-to-end¶
This guide shows how to add a new signal end-to-end in a reliable, repeatable way.
Who this is for¶
- Engineers extending RecSys with a new signal that affects ranking
- Teams moving from “we can run it” to “we can improve it safely”
What you will get¶
- A concrete, end-to-end checklist: port → pipeline artifact → manifest → serving → eval
- The exact places in the codebase where new signals are wired today
- Verification points and safe rollback options
Scope
This guide focuses on artifact/manifest mode because it is the safest way to ship and roll back signal changes. You can also implement signals in DB-only mode, but the “publish pointer last” safety guarantee is weaker there.
Prereqs¶
- You can run a production-like setup locally: production-like run (pipelines → object store → ship/rollback)
- You understand the current ranking pipeline order: Ranking & constraints reference
- You have the minimum instrumentation in place for evaluation: Minimum instrumentation spec (for credible evaluation)
Step 0: classify your signal (so you change the right layer)¶
Most signals fit one of these shapes:
- Candidate retrieval signal: changes which items are considered.
  - Example: “category trending” adds candidates to the pool.
- Scoring signal: changes how candidates are ranked.
  - Example: “user-to-item affinity” contributes a score term.
- Post-ranking constraint: changes ordering or membership after scoring.
  - Example: brand/category caps, diversity (MMR), blocklists.
In recsys-algo, scoring currently has:
- a baseline term (pop_raw from the primary candidate source)
- optional co-visitation
- a similarity bucket (embedding/collaborative/content/session)
- an optional personalization multiplier
See the formal spec: Scoring model specification (recsys-algo)
Step 1: define (or reuse) the store port in recsys-algo¶
If your signal is already represented by an existing port, reuse it. Otherwise:
- Add a new interface to recsys-algo/model/model.go.
- Extend the algorithm layer to call it:
  - candidate retrieval: recsys-algo/algorithm/candidate_sources.go
  - scoring: recsys-algo/algorithm/scoring.go (and possibly selectSimilarity)
  - request-time gating/weights: recsys-algo/algorithm/signals.go
- Add unit tests in recsys-algo/algorithm/*_test.go to lock determinism (stable ordering on ties).
Tip: when a capability is not available, return model.ErrFeatureUnavailable and keep serving using other signals.
See: Store ports
Step 2: add an artifact type + compute pipeline (recsys-pipelines)¶
If the signal is computed offline, add it as a new artifact type.
Follow the module guide: How-to: Add a new artifact type
In practice, you will touch (at least) these areas:
- Domain type and model:
  - recsys-pipelines/internal/domain/artifacts/types.go (new artifacts.Type)
  - recsys-pipelines/internal/domain/artifacts/models.go (v1 struct + constructor)
- Compute:
  - recsys-pipelines/internal/app/usecase/compute_<signal>.go
- Validation:
  - recsys-pipelines/internal/adapters/validator/builtin/validator.go
- Publishing and manifest update:
  - recsys-pipelines/internal/app/usecase/publish_artifacts.go
  - recsys-pipelines/internal/app/workflow/pipeline.go
Non-negotiables:
- Deterministic output for the same canonical inputs
- Bounded runtime and memory use
- Manifest pointer updated last (two-phase publish)
Background: Artifacts and versioning
Step 3: publish and verify the manifest contains your artifact¶
Run pipelines and confirm the manifest key is present.
- Run a local pipelines job (example):
recsys-pipelines run --config pipelines.config.json
- Verify the output layout and manifest pointer:
find /tmp/recsys-pipelines -maxdepth 5 -type f | head
The current manifest is the contract between pipelines and the service. See:
- Output layout: Output layout (local filesystem)
- Manifest schema: Manifest schema (JSON)
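If you want to script the "manifest key is present" check, a small helper along these lines may be enough. It deliberately avoids assuming the manifest schema: it walks the decoded JSON and reports whether any object contains the key. The helper names, the sample manifest content, and the category_trending key are hypothetical; consult the manifest schema reference for the real layout.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// containsKey reports whether any JSON object nested inside v has the key.
func containsKey(v any, key string) bool {
	switch node := v.(type) {
	case map[string]any:
		if _, ok := node[key]; ok {
			return true
		}
		for _, child := range node {
			if containsKey(child, key) {
				return true
			}
		}
	case []any:
		for _, child := range node {
			if containsKey(child, key) {
				return true
			}
		}
	}
	return false
}

// manifestHasKey parses raw manifest JSON and searches it for key.
func manifestHasKey(raw []byte, key string) (bool, error) {
	var doc any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return false, fmt.Errorf("parse manifest: %w", err)
	}
	return containsKey(doc, key), nil
}

func main() {
	// Hypothetical manifest content, used only to exercise the helper.
	raw := []byte(`{"tenant":"demo","current":{"popularity":"a.json","category_trending":"b.json"}}`)
	ok, err := manifestHasKey(raw, "category_trending")
	fmt.Println(ok, err) // prints: true <nil>
}
```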
Step 4: make recsys-service consume the artifact¶
In artifact mode, the service reads manifest.current and loads known artifact types.
For a new artifact-backed signal, you will typically update:
- api/internal/artifacts/types.go (add the manifest key constant and the artifact struct + validator)
- api/internal/store/artifact.go (load the artifact URI and implement the relevant recsys-algo/model port)
Definition of Done for this step:
- Service starts even if your new artifact key is missing (treat as “signal unavailable”).
- When the artifact is present and valid, the port returns results deterministically.
Step 5: expose a safe serving toggle (config)¶
Do not “silently” enable a new signal everywhere.
Pick one (or both):
- Weight-gate: introduce a new blend term or reuse an existing one and default it to 0.
- Mode-gate: add a new algorithm mode (advanced; increases surface area).
Ensure the toggle is:
- documented in reference/config/recsys-service.md
- visible in traces/explain output (so you can prove what was active)
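A weight-gate that defaults to off and reports what was active could look roughly like this. The Weights and Trace types, their field names, and the linear blend are assumptions for illustration only; the real blend terms and explain format are defined by recsys-algo and the service config reference.

```go
package main

import "fmt"

// Weights holds blend weights. Field names are hypothetical. The zero value
// of Trending keeps the new signal off unless it is explicitly configured.
type Weights struct {
	Popularity float64
	Trending   float64
}

// Trace records which terms contributed, for traces/explain output.
type Trace struct {
	Active []string
}

// blend combines the terms linearly (an assumption for this sketch) and
// skips any term whose weight is 0.
func blend(w Weights, pop, trending float64) (float64, Trace) {
	var tr Trace
	score := 0.0
	if w.Popularity != 0 {
		score += w.Popularity * pop
		tr.Active = append(tr.Active, "popularity")
	}
	if w.Trending != 0 {
		score += w.Trending * trending
		tr.Active = append(tr.Active, "trending")
	}
	return score, tr
}

func main() {
	off, trOff := blend(Weights{Popularity: 1}, 0.5, 0.25)
	on, trOn := blend(Weights{Popularity: 1, Trending: 0.5}, 0.5, 0.25)
	fmt.Println(off, trOff.Active) // prints: 0.5 [popularity]
	fmt.Println(on, trOn.Active)   // prints: 0.625 [popularity trending]
}
```

With the weight at 0 the score and the trace are identical to the pre-change behavior, which also makes the config rollback a one-value change.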
Step 6: verify serving behavior¶
- Start the service in artifact mode and point it at a manifest containing the new artifact.
- Request recommendations with explain enabled (example):
curl -sS http://localhost:8000/v1/recommend \
  -H 'Content-Type: application/json' \
  -H 'X-RecSys-Tenant: demo' \
  -d '{"user_id":"u_1","surface":"home","k":10,"explain":"full"}' | jq .
Verify:
- the response is non-empty
- the explain block includes your signal’s contribution (or a clear “unavailable” state)
- tie-breaking is stable (repeat the same request and compare ordering)
Step 7: evaluate and ship safely¶
- Generate an evaluation dataset (or use your pilot logs).
- Run an offline comparison and guardrail checks:
recsys-eval run --dataset dataset.yml --config eval.yml
- Use the report to decide ship/hold/rollback:
  - Suite workflow: How-to: run evaluation and make ship decisions
  - Decision playbook: Decision playbook: ship / hold / rollback
Rollback options (use at least one)¶
- Manifest rollback (artifact mode): swap the manifest pointer back to a previous version.
- Config rollback: set the new signal weight back to 0 or disable the mode.
- Rules rollback: revert pinned/boost/block rules (if your change depended on rules).
See:
- Roll back the manifest safely: How-to: Roll back to a previous artifact version
- Roll back artifacts safely: How-to: Roll back artifacts safely
Common failure modes (and what to check first)¶
- Serving says SIGNAL_UNAVAILABLE: manifest key missing, wrong tenant/surface, artifact not published, or the store port is not wired.
- Serving returns “empty recs”: the new signal might have filtered candidates too aggressively or caps blocked all remaining items.
  - Runbook: Runbook: Empty recs
- Pipelines publish fails: validator rejects artifact or version recompute mismatch; manifest should remain on the previous version (safe failure).
- Non-deterministic ordering: unstable iteration order over maps or missing tie-break sorting in a store backend.
  - Determinism notes: Ranking & constraints reference
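The non-deterministic ordering failure mode usually comes from ranging over a Go map, whose iteration order is randomized, without a full tie-break. The sketch below shows one common fix: sort by score and break ties on item ID. The Scored type, the rank function, and the choice of ascending ID as the tie-break are assumptions; the actual rule used by recsys-algo is described in the ranking reference.

```go
package main

import (
	"fmt"
	"sort"
)

// Scored pairs an item ID with its score.
type Scored struct {
	ID    string
	Score float64
}

// rank turns a score map into a slice ordered by score (descending),
// with ties broken by ID (ascending) so the result is fully determined.
func rank(scores map[string]float64) []Scored {
	out := make([]Scored, 0, len(scores))
	for id, s := range scores { // map iteration order is random in Go
		out = append(out, Scored{ID: id, Score: s})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID // tie-break makes the order total
	})
	return out
}

func main() {
	fmt.Println(rank(map[string]float64{"i_2": 0.7, "i_1": 0.7, "i_3": 0.9}))
	// prints: [{i_3 0.9} {i_1 0.7} {i_2 0.7}]
}
```

A unit test that calls the ranking function repeatedly on tied inputs and compares the results is a cheap way to catch a missing tie-break before it reaches serving.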
Read next¶
- Store ports reference: Store ports
- Pipelines artifact extension: How-to: Add a new artifact type
- Artifacts and versioning: Artifacts and versioning