Config reference¶
Config is JSON. Example: configs/env/local.json.
Top-level fields:
- out_dir: base output directory (local runs)
- raw_events_dir: input events directory
- canonical_dir: canonical output directory
- checkpoint_dir: checkpoint storage for incremental runs
- raw_source: raw ingestion source configuration
- artifacts_dir: staging directory (job mode and pipeline staging)
- object_store_dir: where published blobs are written (local fs mode)
- object_store: object store configuration (fs or s3/minio)
- registry_dir: where manifests and records are written
- db: optional Postgres connection for DB-backed signals
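A minimal local-mode config combining these fields might look like the sketch below. The paths and values are illustrative only, not shipped defaults:

```json
{
  "out_dir": ".out",
  "raw_events_dir": "testdata/events",
  "canonical_dir": ".out/canonical",
  "checkpoint_dir": ".out/checkpoints",
  "artifacts_dir": ".out/artifacts",
  "object_store_dir": ".out/objectstore",
  "registry_dir": ".out/registry",
  "raw_source": { "type": "fs", "dir": "testdata/events" },
  "object_store": { "type": "fs", "dir": ".out/objectstore" }
}
```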
object_store¶
```json
{
  "type": "fs | s3 | minio",
  "dir": ".out/objectstore",
  "s3": {
    "endpoint": "localhost:9000",
    "bucket": "recsys-artifacts",
    "access_key": "minioadmin",
    "secret_key": "minioadmin",
    "prefix": "recsys",
    "use_ssl": false
  }
}
```
db¶
```json
{
  "dsn": "postgres://user:pass@localhost:5432/db?sslmode=disable",
  "auto_create_tenant": true,
  "statement_timeout_s": 5
}
```
limits¶
- max_days_backfill
- max_events_per_run
- max_sessions_per_run
- max_items_per_session
- max_distinct_items_per_run
- max_neighbors_per_item
- max_items_per_artifact
- min_cooc_support
- max_users_per_run
- max_items_per_user
See explanation/validation-and-guardrails.md.
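A limits block caps how much work a single run may do. For example, a partial block could look like the following; the values are hypothetical, not documented defaults:

```json
{
  "limits": {
    "max_days_backfill": 30,
    "max_events_per_run": 1000000,
    "max_neighbors_per_item": 50,
    "min_cooc_support": 2
  }
}
```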
raw_source¶
```json
{
  "type": "fs | s3 | minio | postgres | kafka",
  "dir": "testdata/events",
  "s3": {
    "endpoint": "localhost:9000",
    "bucket": "recsys-raw",
    "access_key": "minioadmin",
    "secret_key": "minioadmin",
    "prefix": "raw/events",
    "use_ssl": false
  },
  "postgres": {
    "dsn": "postgres://user:pass@localhost:5432/db?sslmode=disable",
    "tenant_table": "tenants",
    "exposure_table": "exposure_events"
  },
  "kafka": {
    "brokers": ["localhost:9092"],
    "topic": "recsys-exposures",
    "group_id": "recsys-pipelines"
  }
}
```
Note: the Kafka connector is scaffolded and returns a clear error until it is implemented with a streaming consumer.
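A quick sanity check on a loaded config can catch an unknown connector type before a run starts. This is a minimal sketch assuming only the JSON shape shown above; the function name is hypothetical, not part of the project:

```python
import json

# Connector types documented above; "kafka" is scaffolded only.
ALLOWED_RAW_SOURCES = {"fs", "s3", "minio", "postgres", "kafka"}

def check_raw_source(config: dict) -> str:
    """Return raw_source.type, or raise if it is missing or unknown."""
    kind = config.get("raw_source", {}).get("type")
    if kind not in ALLOWED_RAW_SOURCES:
        raise ValueError(f"unknown raw_source.type: {kind!r}")
    return kind

cfg = json.loads('{"raw_source": {"type": "fs", "dir": "testdata/events"}}')
print(check_raw_source(cfg))  # fs
```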
Read next¶
- Start here
- Validation and guardrails
- How-to: Run incremental pipelines
- Runbook: Limit exceeded