VaultPlane Gateway

Performance

The Gateway sits on the request path, so its overhead has to be small enough to ignore. We will publish numbers only with full context: hardware, payload sizes, concurrency, and provider.

Benchmark in progress

Numbers land here when the runtime stabilizes. Placeholders below are intentional.

P50 latency overhead: TBD

P99 latency overhead: TBD

Requests / sec / node: TBD

What we measure

Latency overhead

Added latency versus calling the provider directly, measured at P50 and P99 for both streaming and non-streaming requests.
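As a rough illustration of how such an overhead figure could be derived, the sketch below subtracts the direct-call percentile from the Gateway percentile. The function names and the nearest-rank percentile method are assumptions made for this example, not a description of the actual benchmark harness.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_overhead(direct_ms: Sequence[float], gateway_ms: Sequence[float], p: int) -> float:
    """Overhead at percentile p: Gateway latency minus direct-provider latency (hypothetical helper)."""
    return percentile(gateway_ms, p) - percentile(direct_ms, p)
```

Comparing percentiles of two separately collected distributions is only one option; pairing each Gateway request with a matching direct call and taking percentiles of the per-request differences would answer a slightly different question.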

Throughput

Sustained requests per second per node before latency degrades.
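One way to turn "before latency degrades" into a number is to step the offered load upward and keep the last step whose P99 stays inside a latency budget. The sketch below assumes that approach; the budget parameter and the step format are hypothetical choices for illustration, and the published methodology may define degradation differently.

```python
from typing import Iterable, Optional, Tuple


def sustained_rps(steps: Iterable[Tuple[float, float]], max_p99_ms: float) -> Optional[float]:
    """Return the highest offered load (requests/sec) reached before P99 first exceeds the budget.

    steps: (offered_rps, observed_p99_ms) pairs, one per load step.
    max_p99_ms: assumed latency budget that defines "degraded".
    Returns None if even the lowest step is over budget.
    """
    best = None
    for rps, p99_ms in sorted(steps):
        if p99_ms > max_p99_ms:
            break  # latency has degraded; higher steps do not count as sustained
        best = rps
    return best
```

Stopping at the first over-budget step, rather than taking the maximum over all passing steps, avoids crediting a node for a higher load level that happened to pass after a lower one failed.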

Streaming pass-through

Time to first token compared with a direct provider call, to confirm the Gateway does not buffer the stream.
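A minimal sketch of a time-to-first-token probe follows. It assumes the client exposes the response as a lazy iterator of chunks, so that the request is issued when the first chunk is pulled; `time_to_first_chunk` and the injectable `clock` parameter are hypothetical names used only for this example.

```python
import time
from typing import Callable, Iterable, Optional


def time_to_first_chunk(
    stream: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Seconds elapsed until the stream yields its first chunk, or None if it yields nothing."""
    start = clock()
    for _chunk in stream:
        return clock() - start  # stop at the first chunk; the rest of the stream is not consumed
    return None


def first_token_delta(direct_s: float, gateway_s: float) -> float:
    """Extra time to first token through the Gateway compared with a direct call."""
    return gateway_s - direct_s
```

Under these assumptions, a delta that stays small and does not grow with response length suggests the stream is passed through, while a delta that scales with the full response time would point to buffering.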

Methodology

Each published figure will ship with a reproducible test that specifies the node size, the request mix, the concurrency level, the model and provider used, and whether caching was enabled. Numbers without that context are not useful, so we will not post them until the harness and the runtime are stable.
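The context listed above could travel with every figure as a small structured record. The sketch below is one hypothetical shape for it: the class, field names, and JSON layout are assumptions for illustration, not the format the harness will actually use.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class BenchmarkContext:
    """Context that accompanies a published figure (hypothetical schema)."""
    node_size: str
    request_mix: Dict[str, float]  # share of each request type, e.g. streaming vs non-streaming
    concurrency: int
    model: str
    provider: str
    caching_enabled: bool


def publishable_record(metric: str, value: float, unit: str, ctx: BenchmarkContext) -> str:
    """Serialize a figure together with its context; refuse if any context field is empty."""
    fields = asdict(ctx)
    missing = [name for name, v in fields.items() if v is None or v == "" or v == {}]
    if ctx.concurrency < 1:
        missing.append("concurrency")
    if missing:
        raise ValueError(f"refusing to publish without context: {sorted(set(missing))}")
    return json.dumps(
        {"metric": metric, "value": value, "unit": unit, "context": fields},
        sort_keys=True,
    )
```

Making the context a required argument, rather than optional metadata, means a number cannot be serialized for publication without it.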

How the data plane keeps overhead low