Performance and capacity guide
This guide describes how to run reproducible load tests against recsys-service and capture sizing data for production planning.
Who this is for
- Developers and SREs sizing recsys-service for production
- Engineers running load tests before enabling new signals or data modes
What you will get
- A runnable load-test harness
- The parameters that matter for repeatability
- A table format for recording sizing data over time
1) Preflight checklist
- Postgres is seeded with a tenant, config, and signal data.
- recsys-service is healthy (/healthz returns 200).
- Auth headers are configured (dev headers or a bearer token).
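The last two checklist items can be scripted. The following is a minimal sketch, assuming `curl` is installed and that the auth-related environment variables are the ones the harness reads (`DEV_HEADERS`, `BEARER_TOKEN`, `API_KEY`). The function names `check_health` and `check_auth_env` are hypothetical helpers, not part of the repository. The Postgres seeding check is left out because it depends on your schema.

```shell
# Preflight sketch: succeed only if GET <base-url>/healthz returns HTTP 200.
check_health() {
  base_url="${1:-${BASE_URL:-http://localhost:8000}}"
  # -w prints only the status code; "000" means the connection itself failed.
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$base_url/healthz" || true)
  if [ "$code" = "200" ]; then
    echo "healthz OK ($base_url)"
  else
    echo "healthz FAILED ($base_url): HTTP ${code:-none}" >&2
    return 1
  fi
}

# Preflight sketch: succeed only if some auth mode is configured in the environment.
check_auth_env() {
  if [ "${DEV_HEADERS:-}" = "true" ] || [ -n "${BEARER_TOKEN:-}" ] || [ -n "${API_KEY:-}" ]; then
    return 0
  fi
  echo "no auth configured: set DEV_HEADERS=true, BEARER_TOKEN, or API_KEY" >&2
  return 1
}
```

Calling `check_health && check_auth_env` before a run stops early when the service or credentials are not ready, so a failed preflight is not mistaken for a bad load-test result.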
2) Run the load test
Use the built-in harness:
./scripts/loadtest.sh
Key parameters (env vars):
- BASE_URL (default: http://localhost:8000)
- ENDPOINT (default: /v1/recommend; set /v1/similar for similar-items)
- TENANT_ID, SURFACE, K
- REQUESTS, CONCURRENCY
- DEV_HEADERS=true (local) or set BEARER_TOKEN / API_KEY
Example:
BASE_URL=http://localhost:8000 \
ENDPOINT=/v1/recommend \
TENANT_ID=demo \
SURFACE=home \
REQUESTS=1000 \
CONCURRENCY=25 \
./scripts/loadtest.sh
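For capacity planning, one run at a single concurrency level rarely shows where latency starts to climb. One common approach, sketched below, is to repeat the run at several `CONCURRENCY` values while holding everything else fixed. `run_sweep` and the `LOADTEST_CMD` override are hypothetical names introduced for illustration. The sketch also assumes the harness exits non-zero on failure, which this guide does not confirm.

```shell
# Sweep sketch: run the harness once per concurrency level passed as an argument.
# LOADTEST_CMD is a hypothetical override; it defaults to the harness path above.
run_sweep() {
  cmd="${LOADTEST_CMD:-./scripts/loadtest.sh}"
  for c in "$@"; do
    echo "== CONCURRENCY=$c REQUESTS=${REQUESTS:-1000} =="
    # Only CONCURRENCY changes between runs, so results stay comparable.
    CONCURRENCY="$c" REQUESTS="${REQUESTS:-1000}" "$cmd" \
      || echo "run failed at CONCURRENCY=$c" >&2
  done
}
```

With `BASE_URL`, `TENANT_ID`, and `SURFACE` exported as in the example above, `run_sweep 5 10 25 50` produces one result block per level.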
Capture:
- rps (requests/sec)
- p50/p95/p99 latency
- error rate (non-2xx + timeouts)
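If you need to recompute these figures from raw data, the sketch below shows one way to do it. It assumes you have per-request latencies as one number per line. The harness's actual output format is not described in this guide, so treat the input shape as an assumption. Percentiles use the nearest-rank method, which is one of several accepted definitions. `percentile` and `error_rate` are hypothetical helper names.

```shell
# Nearest-rank percentile: reads one numeric latency per line on stdin.
# $1 is an integer percentile such as 50, 95, or 99.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Error rate in percent: $1 = failed requests (non-2xx + timeouts), $2 = total requests.
error_rate() {
  awk -v e="$1" -v t="$2" 'BEGIN { if (t == 0) exit 1; printf "%.2f\n", 100 * e / t }'
}
```

For example, `percentile 95 < latencies.txt` prints the p95 value, where `latencies.txt` is a placeholder file name. Record which percentile definition you used, because different tools interpolate differently and small samples can disagree.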
Note
If you see a lot of 429 responses locally, you may be hitting the dev stack’s safety rate limit. Either lower CONCURRENCY/REQUESTS or use the benchmark setup in Baseline benchmarks (anchor numbers).
3) Record sizing data
Use this table as a living record. Fill it in with results measured in your own environment, and note the hardware, cache settings, and dataset size behind each row.
| Tier | Target QPS | p95 Latency | CPU | Memory | Notes |
|---|---|---|---|---|---|
| dev | | | | | local, seeded data |
| small | | | | | single tenant |
| med | | | | | multi-tenant |
| large | | | | | dedicated cache |
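To keep the record consistent over time, rows can be generated rather than typed by hand. This is a minimal sketch with a hypothetical helper, `sizing_row`, that formats one row in the column order of the table above. The values passed in must come from your own measurements.

```shell
# Format one sizing-table row: tier, target QPS, p95 latency, CPU, memory, notes.
sizing_row() {
  printf '| %s | %s | %s | %s | %s | %s |\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

A call such as `sizing_row dev "$QPS" "$P95" "$CPU" "$MEM" "local, seeded data" >> sizing.md` appends a row, where the variables and `sizing.md` are placeholders for your measured values and your own record file.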
4) Tuning levers
- Cache TTLs: RECSYS_CONFIG_CACHE_TTL, RECSYS_RULES_CACHE_TTL
- Backpressure: RECSYS_BACKPRESSURE_MAX_INFLIGHT, RECSYS_BACKPRESSURE_MAX_QUEUE
- Algorithm mode: RECSYS_ALGO_MODE (blend, popularity, cooc, etc.)
- Artifact mode: RECSYS_ARTIFACT_MODE_ENABLED (affects S3/manifest latency)
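A result is only repeatable if the lever settings behind it are recorded. The sketch below prints the current value of each variable listed above so it can be saved next to a run. `print_levers` is a hypothetical helper. It reads the current shell's environment, so it reflects the service only if both are configured from the same source (for example a shared env file), which is an assumption here.

```shell
# Print each tuning lever as NAME=value, or NAME=<unset> when it is not set.
print_levers() {
  for name in RECSYS_CONFIG_CACHE_TTL RECSYS_RULES_CACHE_TTL \
              RECSYS_BACKPRESSURE_MAX_INFLIGHT RECSYS_BACKPRESSURE_MAX_QUEUE \
              RECSYS_ALGO_MODE RECSYS_ARTIFACT_MODE_ENABLED; do
    # Indirect lookup; safe because the names are the fixed literals above.
    eval "val=\${$name:-}"
    printf '%s=%s\n' "$name" "${val:-<unset>}"
  done
}
```

Changing one lever at a time between runs, and saving this output with each result, makes it easier to attribute a latency change to a specific setting.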
5) Repeat after changes
Re-run the load test after:
- schema changes (new signals)
- algorithm changes
- cache or artifact mode changes
- infrastructure changes
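When re-running after a change, it helps to compare against the previous result with an explicit threshold instead of judging by eye. The sketch below flags a p95 regression. The helper name `p95_regressed` and the 10% default threshold are assumptions for illustration, not values from this guide.

```shell
# Succeed (exit 0) if the new p95 exceeds the baseline by more than the threshold.
# $1 = baseline p95, $2 = new p95, $3 = allowed increase in percent (default 10).
p95_regressed() {
  awk -v b="$1" -v n="$2" -v t="${3:-10}" 'BEGIN { exit !(n > b * (1 + t / 100)) }'
}
```

For example, `p95_regressed "$BASELINE_P95" "$NEW_P95" && echo "p95 regression"` prints a warning only when the increase exceeds the threshold. Both values should come from runs with the same REQUESTS and CONCURRENCY.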
Read next
- Baseline benchmarks (anchor numbers)
- Production readiness checklist: Production readiness checklist (RecSys suite)
- Backpressure and limits: recsys-service configuration