Security, privacy, and compliance (overview)¶

This page explains Security, privacy, and compliance (overview) and how it fits into the RecSys suite.

Who this is for¶

Engineering leads, product owners, SRE/on-call, and security/compliance reviewers evaluating or adopting the RecSys suite.

What you will get¶

A practical “shared responsibility” view for running the suite
The minimum privacy posture needed for a pilot (and what changes for production)
A checklist for security review starting points

Scope: what the suite is (and is not)¶

The RecSys suite is typically self-hosted: you run the infrastructure and own the data.
The suite does not require raw PII: use pseudonymous stable identifiers.
The suite does not attempt to be a full privacy/compliance program. You still need org-level policies for:
data classification and retention
access control and auditing
data subject requests (GDPR/CCPA) and deletion workflows

Data you will handle¶

At a minimum, adopting the suite introduces these data flows:

Serving requests: tenant_id, surface, and a pseudonymous user_id/session_id context
Exposure logs: the ranked list served, with ranks and a request_id for attribution
Outcome logs: clicks/conversions with the same request_id
Artifacts (optional): aggregated signals (popularity, co-visitation, etc.) stored in object storage

Treat exposure and outcome logs as sensitive. Even if identifiers are pseudonymous, they are often still considered personal data under many policies.

Identity and PII guidance (baseline)¶

Do not send or log raw PII (email, phone, address).
Prefer pseudonymous, stable identifiers (for example: an internal UUID or a one-way hash you control).
If you enable eval-compatible exposure logs in recsys-service, set a secret salt:
EXPOSURE_HASH_SALT=<secret>

Changing the salt breaks joins over time; rotate intentionally (and treat it like a credential).

Access control and hardening¶

Auth modes¶

recsys-service supports JWT and API keys. For local development it can also accept dev headers.

Production guidance:

Disable dev headers: DEV_AUTH_ENABLED=false
Require production auth (JWT_AUTH_ENABLED=true and/or API_KEY_ENABLED=true)
Ensure tenant scope comes from trusted auth claims (AUTH_TENANT_CLAIMS) or a trusted gateway

Admin endpoints¶

Admin endpoints can change configuration/rules and invalidate caches. Treat them as control-plane:

restrict network access (private ingress / allow-list / VPN)
require admin roles (AUTH_ADMIN_ROLE) and strong identity
enable audit logging for admin actions (AUDIT_LOG_ENABLED=true)

Rate limiting and abuse¶

Enable per-tenant rate limiting in production and monitor throttling:

TENANT_RATE_LIMIT_ENABLED=true

Logging and retention¶

Configure exposure logging intentionally:
EXPOSURE_LOG_ENABLED=true
set retention (EXPOSURE_LOG_RETENTION_DAYS) and storage controls (permissions, encryption, backups)
Treat evaluation outputs as sensitive artifacts:
reports may reveal behavior patterns or business logic
store them with appropriate access control and retention

Compliance notes (high level)¶

GDPR/CCPA: pseudonymous identifiers can still be personal data. Plan for deletion and retention limits.
Data residency: choose DB/object store regions consistent with your policy.
Auditability: enable audit logs and keep request_id propagation end-to-end for investigations.

Quick checklist (start here)¶

Use pseudonymous IDs; do not log raw PII.
Set EXPOSURE_HASH_SALT when logging exposures for evaluation.
Disable dev auth headers in production (DEV_AUTH_ENABLED=false).
Restrict admin endpoints (network + roles) and enable audit logging.
Define retention for exposure/outcome logs and evaluation reports.