Benchmarks¶

Use scripts/loadtest.sh to create reproducible serving evidence for a specific deployment and dataset.

BASE_URL=http://localhost:8000 \
TENANT_ID=demo \
SURFACE=home \
REQUESTS=1000 \
CONCURRENCY=25 \
USER_CARDINALITY=1000 \
CATALOG_SIZE=8 \
ARTIFACT_SIZE_BYTES="$(find tmp/commercial-proof-kit/pipelines/objectstore -type f -print0 2>/dev/null | xargs -0 stat -c '%s' 2>/dev/null | awk '{s+=$1} END {print s+0}')" \
REPORT_JSON=tmp/loadtest-report.json \
REPORT_MARKDOWN=tmp/loadtest-report.md \
bash scripts/loadtest.sh

Expected result: the command prints request totals, RPS, p50/p95/p99 latency, status codes, and writes JSON/Markdown reports when report paths are set.

See Ecommerce Mini Local Load-Test Reference for a checked-in smoke result from the local demo stack.

Report Template¶

Keep benchmark reports with enough context to be useful:

RecSys version, image tag, and commit.
Dataset/catalog size, artifact size, user cardinality, request count, and concurrency.
Deployment shape: replicas, CPU/memory requests and limits, database/object-store location, and cache TTLs.
p50, p95, p99 latency, RPS, success/error counts, and status-code distribution.
CPU, memory, and degradation notes.

Do not generalize from one fixture. A local ecommerce-mini report is useful as a smoke reference; production sizing requires a report from the operator's own catalog, traffic shape, and infrastructure.