Exposure logging and attribution¶
This page explains Exposure logging and attribution and how it fits into the RecSys suite.
Who this is for¶
- Integrators wiring RecSys into a product (webshop, content feed, etc.)
- Recommendation engineers and analysts running
recsys-eval - Operators who need to debug “why are metrics wrong?” incidents
What you will get¶
- The minimum you must log to measure recommendations
- How to attribute outcomes to exposures safely (joins that work)
- How to configure
recsys-serviceto emit eval-compatible exposure logs - Common logging bugs and the symptoms they cause
The one rule: log exposures¶
If you only log clicks/purchases (outcomes) but not “what you showed” (exposures), you cannot evaluate recommendation quality. Clicks without exposures are not attributable.
Exposure logging is also the foundation for:
- offline regression (quality gates before shipping)
- online experiments (measuring lift)
- incident debugging (“did we serve the wrong config/rules/algo?”)
End-to-end flow (request → exposure → outcome)¶
sequenceDiagram
participant Client
participant Service as recsys-service
participant DB as Postgres
participant Store as Object store
participant Log as Exposure log
participant Events as Outcome events
Client->>Service: POST /v1/recommend
Service->>DB: Load config + rules
alt Artifact mode enabled
Service->>Store: Fetch manifest + artifacts
end
Service-->>Client: Response (items[], request_id)
Service->>Log: Write exposure.v1 (request_id, items, context)
Client->>Events: Emit outcome.v1 (request_id, item_id, ts) The join key is request_id. Your product must carry it from the recommend call to the outcome event.
What to log (minimum viable)¶
At minimum you need:
- Exposure (ranked list you showed)
- Outcome (click/conversion you observed)
- Assignment (optional, for A/B testing)
All joins are driven by request_id.
Field-level contract (schemas + examples):
- Reference: Exposure, outcome, and assignment schemas
- Reference: Event join logic (exposures ↔ outcomes ↔ assignments)
Getting request_id right (attribution correctness)¶
To attribute outcomes to the right exposure, you need a single request_id that flows:
client request → recsys-service response → outcome event
You have two common options:
- Client-supplied request IDs: set
X-Request-Idwhen calling/v1/recommend, then reuse that ID in outcome events. - Server-generated request IDs: read
meta.request_idfrom the response and attach it to outcome events.
Pick one, implement it consistently, and test joins early with recsys-eval validate.
recsys-service exposure logging (built-in)¶
The service can write exposure logs as JSONL to a file or a directory:
EXPOSURE_LOG_ENABLED=trueEXPOSURE_LOG_PATH=/app/tmp/exposures.jsonl(file) or/app/tmp/(directory)EXPOSURE_LOG_RETENTION_DAYS=30(directory mode rotates and prunes old files)
There are two output formats:
service_v1: service-native exposure event (good for audit/debugging)eval_v1:recsys-evalcompatibleexposure.v1records (recommended for evaluation workflows)
Recommended for evaluation:
EXPOSURE_LOG_ENABLED=true
EXPOSURE_LOG_FORMAT=eval_v1
EXPOSURE_LOG_PATH=/app/tmp/exposures.eval.jsonl
Config reference:
Privacy: stable pseudonymous IDs¶
Warning
Do not log raw PII (emails, names, phone numbers). Use stable pseudonymous identifiers and keep the hash salt secret.
The service logs hashed identifiers (HMAC-SHA256) rather than raw user IDs. Set a secret salt so hashes are stable and non-guessable:
EXPOSURE_HASH_SALT=change-me-to-a-secret
If no user/session identifier is available for eval output, the service falls back to using request_id as user_id to keep schemas valid (but this weakens user-level evaluation).
Common logging bugs (and symptoms)¶
- Only outcomes, no exposures
- Symptom: you can’t compute offline metrics; experiments are ambiguous.
- Outcome events missing
request_id - Symptom: join rate collapses; reports look “too good” or “too bad” randomly.
- Different
user_idvalues in exposures vs outcomes - Symptom: low join rate or joins that only work for some platforms (web vs app).
- Multiple request IDs per single rendered list
- Symptom: duplicates, inconsistent attribution, confusing on-call investigations.
- Logging raw PII
- Symptom: security review blocks adoption; you may breach internal policy.
Verification checklist (do this early)¶
- Validate schemas:
recsys-eval validate --schema exposure.v1 --input exposures.jsonlrecsys-eval validate --schema outcome.v1 --input outcomes.jsonl- Compute a basic join rate in your warehouse:
% of exposures with at least one matching outcome by request_id- Ensure your top slicing keys exist (at minimum:
tenant_id,surface).
From logs to evaluation (what you do next)¶
Once your exposure/outcome logs validate and join, run an offline evaluation and produce a report you can share.
Start with the default pack:
recsys-evaldefault evaluation pack (recommended)- Run eval and make ship/hold/rollback decisions: How-to: run evaluation and make ship decisions
Read next¶
- Data contracts hub: Data contracts
- Event join logic: Event join logic (exposures ↔ outcomes ↔ assignments)
- Experimentation model (A/B, interleaving, OPE): Experimentation model (A/B, interleaving, OPE)
- Candidate vs ranking: Candidate generation vs ranking
- Run eval and ship: How-to: run evaluation and make ship decisions