Artifacts and Pipelines
This page explains how recsys-pipelines publishes versioned recommendation artifacts and how the service consumes the current manifest in artifact mode.
Mental model
Artifact mode separates offline computation from online serving:
- recsys-pipelines reads exposure events for a tenant, surface, segment, and day window.
- It validates and canonicalizes the input.
- It computes enabled artifacts. The backward-compatible default is popularity and co-occurrence; richer controlled deployments can also enable implicit, content similarity, and session sequence artifacts.
- It writes immutable artifact blobs to object storage.
- It updates the current manifest for the tenant and surface.
- recsys-service reads the manifest and referenced artifacts after cache expiry or cache invalidation.
In this mode, the thing you ship and roll back is the current manifest, not a service binary.
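The publish order above (immutable blobs first, manifest pointer last) can be sketched as follows. This is a minimal illustration against a local directory, not the actual recsys-pipelines implementation: the `publish` function, the content-hash versioning scheme, the `.bin` blob naming, and the manifest field names are all assumptions made for the example.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def publish(root: Path, tenant: str, surface: str, artifacts: dict) -> dict:
    """Write immutable blobs, then repoint the current manifest (hypothetical layout)."""
    entries = {}
    for kind, payload in artifacts.items():
        # Assumption: the version is derived from the payload content.
        version = hashlib.sha256(payload).hexdigest()[:16]
        blob = root / "objectstore" / kind / f"{version}.bin"
        blob.parent.mkdir(parents=True, exist_ok=True)
        if not blob.exists():
            # Blobs are immutable: an existing blob is never overwritten.
            blob.write_bytes(payload)
        entries[kind] = {"version": version, "path": str(blob.relative_to(root))}

    manifest = {"tenant": tenant, "surface": surface, "artifacts": entries}
    target = root / "registry" / "current" / tenant / surface / "manifest.json"
    target.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temp file in the same directory, then rename it over the
    # pointer so readers never observe a half-written manifest.
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
    os.replace(tmp, target)
    return manifest
```

Updating the manifest only after every blob is written means a failed run leaves the previous manifest, and therefore the previous recommendations, in place.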
State locations
| State | Owner | Notes |
|---|---|---|
| Tenant config and rules | recsys-service database tables | Versioned control-plane documents with ETags. |
| Canonical pipeline data | recsys-pipelines configured store | Used to build deterministic daily artifacts. |
| Artifact blobs | Pipeline object store | Immutable payloads keyed by artifact type and version. |
| Current manifest | Pipeline registry | Mutable pointer under current/<tenant>/<surface>/manifest.json in local registry mode. |
| Exposure/outcome logs | Operator logging pipeline | Inputs for evaluation and incident reconstruction. |
Local pipeline command
The repository proof-kit path runs the pipeline against a checked-in ecommerce fixture:
make proof-kit-test
Expected result: the pipeline writes a manifest under tmp/commercial-proof-kit/pipelines/registry/current/demo/home/manifest.json and publishes artifact blobs under tmp/commercial-proof-kit/pipelines/objectstore/. The ecommerce proof kit enables and verifies popularity, cooc, implicit, content_sim, and session_seq.
Artifact selection
Pipeline configs can choose which artifact kinds to compute:
{
  "artifact_kinds": ["popularity", "cooc", "implicit", "content_sim", "session_seq"],
  "catalog": {
    "path": "../examples/data/ecommerce-mini/catalog.csv",
    "format": "csv"
  }
}
If artifact_kinds is omitted, the pipeline preserves the original lean behavior and publishes only popularity and cooc. content_sim requires a catalog file in CSV or JSONL format. The catalog must include item_id; tags can be provided through a tags/tag_list column in CSV or a tags array in JSONL.
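The selection rules above can be illustrated with a small sketch. The function names are hypothetical, the check that `content_sim` needs a `catalog.path` entry is one way to read the requirement, and the `|` separator for tags inside a CSV cell is an assumption, since the text does not say how tags are delimited.

```python
import csv
import io
import json

# Default when artifact_kinds is omitted, as described above.
DEFAULT_KINDS = ["popularity", "cooc"]


def resolve_artifact_kinds(config: dict) -> list:
    """Return the artifact kinds to compute for a pipeline config."""
    kinds = list(config.get("artifact_kinds", DEFAULT_KINDS))
    if "content_sim" in kinds and "path" not in config.get("catalog", {}):
        raise ValueError("content_sim requires a catalog file")
    return kinds


def read_catalog(text: str, fmt: str) -> dict:
    """Map item_id -> list of tags from CSV or JSONL catalog text."""
    items = {}
    if fmt == "csv":
        for row in csv.DictReader(io.StringIO(text)):
            if not row.get("item_id"):
                raise ValueError("catalog rows must include item_id")
            raw = row.get("tags") or row.get("tag_list") or ""
            # Assumption: tags in a CSV cell are separated by "|".
            items[row["item_id"]] = [t for t in raw.split("|") if t]
    elif fmt == "jsonl":
        for line in text.splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            if "item_id" not in record:
                raise ValueError("catalog rows must include item_id")
            items[record["item_id"]] = list(record.get("tags", []))
    else:
        raise ValueError(f"unsupported catalog format: {fmt}")
    return items
```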
Freshness and rollback
Monitor freshness per tenant and surface:
- manifest age
- artifact publish time
- empty recommendation rate
- signal warning rate
- pipeline job success/failure
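As one example of the first signal, manifest age in local registry mode could be approximated from the manifest file's modification time. This is only a sketch: the helper names and the staleness threshold are assumptions, and a deployment backed by an object store would read the object's last-modified metadata instead.

```python
import time
from pathlib import Path
from typing import Optional


def manifest_age_seconds(manifest_path: Path, now: Optional[float] = None) -> float:
    """Seconds since the manifest file was last written (mtime as a proxy)."""
    now = time.time() if now is None else now
    return now - manifest_path.stat().st_mtime


def is_stale(manifest_path: Path, max_age_seconds: float,
             now: Optional[float] = None) -> bool:
    """True when the manifest is older than the allowed age."""
    return manifest_age_seconds(manifest_path, now) > max_age_seconds
```

For a daily pipeline, a threshold somewhat above 24 hours would leave room for normal job duration before alerting.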
Rollback in artifact mode means restoring the previous known-good manifest content at the current manifest path. Because artifact blobs are immutable, rollback should not require recomputing artifacts if the previous blobs still exist.
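A rollback along these lines might look like the sketch below for a local registry. The `rollback` helper is hypothetical, and because the manifest schema is not shown here, the blob paths that the known-good manifest references are passed in by the caller rather than parsed from the file.

```python
import os
import shutil
import tempfile
from pathlib import Path


def rollback(current: Path, known_good: Path, blob_paths: list) -> None:
    """Restore a saved known-good manifest at the current manifest path."""
    # Refuse to roll back if any blob the old manifest needs is gone.
    missing = [str(p) for p in blob_paths if not Path(p).exists()]
    if missing:
        raise FileNotFoundError(f"blobs missing, cannot roll back: {missing}")

    # Copy into a temp file next to the target, then rename over it so
    # readers see either the old pointer or the restored one, never a mix.
    fd, tmp = tempfile.mkstemp(dir=current.parent, suffix=".tmp")
    os.close(fd)
    shutil.copyfile(known_good, tmp)
    os.replace(tmp, current)
```

The service picks up the restored manifest after its cache expires or is invalidated, so the rollback takes effect without a redeploy.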
Backfills
Backfills reprocess historical windows. Treat them as controlled operations:
- choose the date window explicitly
- check configured backfill limits
- publish only after validation passes
- compare output size and key metrics to the previous run
- keep the previous manifest available until verification passes
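Two of the checks above, the explicit date window with a limit and the output-size comparison, can be sketched as plain functions. The names, the `max_days` limit, and the relative tolerance are illustrative assumptions, not configuration keys from recsys-pipelines.

```python
from datetime import date, timedelta


def backfill_days(start: date, end: date, max_days: int) -> list:
    """Expand an inclusive date window, rejecting windows over the limit."""
    if end < start:
        raise ValueError("end date is before start date")
    count = (end - start).days + 1
    if count > max_days:
        raise ValueError(f"window of {count} days exceeds limit of {max_days}")
    return [start + timedelta(days=i) for i in range(count)]


def size_within_tolerance(previous: int, new: int, tolerance: float) -> bool:
    """True when the new output size is within a relative tolerance of the old."""
    if previous == 0:
        return new == 0
    return abs(new - previous) / previous <= tolerance
```

For example, `size_within_tolerance(1000, 400, 0.5)` is false, which would flag a run whose output shrank by 60 percent for review before publishing.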
Service configuration
The service consumes artifacts when artifact mode is enabled:
RECSYS_ARTIFACT_MODE_ENABLED=true
RECSYS_ARTIFACT_MANIFEST_TEMPLATE=s3://recsys-artifacts/registry/current/{tenant}/{surface}/manifest.json
RECSYS_ARTIFACT_MANIFEST_TTL=1m
RECSYS_ARTIFACT_CACHE_TTL=1m
Before routing real traffic, configure production-safe object-store credentials, TLS settings, and cache TTLs.
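To make the template and TTL settings concrete, the sketch below expands the `{tenant}` and `{surface}` placeholders and caches fetched manifests until they expire or are invalidated. It illustrates the general pattern only: `manifest_uri`, `parse_ttl`, and `ManifestCache` are hypothetical names, the duration parser assumes values of the form `<integer><s|m|h>`, and the real service may cache differently.

```python
import re
import time

_UNIT_SECONDS = {"s": 1.0, "m": 60.0, "h": 3600.0}


def manifest_uri(template: str, tenant: str, surface: str) -> str:
    """Fill the {tenant} and {surface} placeholders in a manifest template."""
    return template.replace("{tenant}", tenant).replace("{surface}", surface)


def parse_ttl(value: str) -> float:
    """Parse a TTL such as "1m" into seconds (assumed single-unit format)."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported TTL: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


class ManifestCache:
    """Cache manifests per URI and refetch after expiry or invalidation."""

    def __init__(self, ttl_seconds, fetch, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._fetch = fetch      # caller-supplied loader: uri -> manifest
        self._clock = clock
        self._entries = {}       # uri -> (fetched_at, manifest)

    def get(self, uri):
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is None or now - entry[0] >= self._ttl:
            entry = (now, self._fetch(uri))
            self._entries[uri] = entry
        return entry[1]

    def invalidate(self, uri):
        self._entries.pop(uri, None)
```

With a TTL of `1m`, a newly published or rolled-back manifest would be picked up within about a minute; shorter TTLs trade faster rollout for more object-store reads.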