## Summary

The sandbox proxy's `inference.local` interception path fully buffers upstream streaming responses (SSE) before sending any bytes back to the client. This converts a streaming response (fast TTFB, incremental tokens) into a buffered response (slow TTFB, instant completion), causing downstream clients with TTFB timeouts to abort before the proxy finishes buffering.
## Actual Behavior

- Client sends `POST /v1/chat/completions` with `"stream": true` to `inference.local`
- Proxy forwards to the upstream (e.g. an NVIDIA NIM endpoint), which begins streaming SSE events (TTFB ~200ms)
- `response.bytes().await` in `proxy_to_backend()` blocks for the full generation time (10-60s)
- Proxy sends nothing back to the client during this entire period
- Client's TTFB/first-event timeout fires and aborts the request
- Proxy eventually finishes buffering and tries to write to a closed connection
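The timeline above can be simulated without the proxy at all. The sketch below (standalone, not code from this repository) fakes an upstream that emits five SSE-style chunks 100ms apart, then compares the buffered pattern (drain everything before the first write, as `response.bytes().await` does) against incremental forwarding:

```rust
// Standalone sketch: why full buffering destroys TTFB.
// A fake "upstream" yields 5 SSE-like chunks, one every 100 ms.
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

fn upstream() -> mpsc::Receiver<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for i in 0..5 {
            thread::sleep(Duration::from_millis(100));
            let _ = tx.send(format!("data: token-{i}\n\n").into_bytes());
        }
    });
    rx
}

/// Mimics `response.bytes().await`: drain the whole body first.
fn buffered_ttfb() -> Duration {
    let start = Instant::now();
    let body: Vec<u8> = upstream().iter().flatten().collect();
    let _ = body; // only now could the first byte reach the client
    start.elapsed()
}

/// Forward each chunk as it arrives; TTFB is roughly one chunk delay.
fn streaming_ttfb() -> Duration {
    let start = Instant::now();
    let rx = upstream();
    let first = rx.recv().unwrap(); // first byte to the client here
    let ttfb = start.elapsed();
    let _ = (first, rx.iter().count()); // drain the remaining chunks
    ttfb
}

fn main() {
    println!("buffered TTFB:  {:?}", buffered_ttfb()); // ~500 ms
    println!("streaming TTFB: {:?}", streaming_ttfb()); // ~100 ms
}
```

With the real proxy the chunk delay is token generation time, so the buffered TTFB grows to the full 10-60s generation window.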
## Expected Behavior
The proxy should forward response headers and SSE chunks to the client incrementally as they arrive from the upstream, preserving the streaming semantics and sub-second TTFB.
## Root Cause

Three layers conspire to prevent streaming:

- `navigator-router/src/backend.rs:112-115` — `response.bytes().await` collects the entire upstream response body into memory before returning
- `ProxyResponse` struct (`backend.rs:9-13`) — stores the body as a single `bytes::Bytes` blob with no streaming abstraction
- `format_http_response()` (`l7/inference.rs:244-246`) — replaces the upstream `Transfer-Encoding: chunked` with a `Content-Length` based on the buffered body size, then writes everything in a single `write_all` call
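To make the second layer concrete, here is a sketch of the two response shapes. `ProxyResponse` is named in the report but its fields here are guesses; `StreamingResponse` is a hypothetical alternative. In the real async code the body would be an `impl Stream<Item = Bytes>`; a blocking iterator stands in for it so the sketch stays self-contained:

```rust
/// Roughly what the proxy returns today: one contiguous blob,
/// which cannot exist until the upstream has finished generating.
/// (Field names are illustrative, not copied from backend.rs.)
struct ProxyResponse {
    status: u16,
    body: Vec<u8>, // stands in for bytes::Bytes
}

/// Hypothetical streaming variant: headers up front, chunks on demand.
struct StreamingResponse {
    status: u16,
    chunks: Box<dyn Iterator<Item = Vec<u8>>>,
}

fn main() {
    let streaming = StreamingResponse {
        status: 200,
        chunks: Box::new((0..3).map(|i| format!("data: {i}\n\n").into_bytes())),
    };
    // Headers can be written as soon as `status` is known...
    assert_eq!(streaming.status, 200);
    // ...while the body is forwarded chunk by chunk as it arrives.
    let total: usize = streaming.chunks.map(|c| c.len()).sum();
    assert_eq!(total, 27);
    let _ = ProxyResponse { status: 200, body: Vec::new() };
}
```

The key property is that constructing `StreamingResponse` does not require the body to exist yet, so header writing is decoupled from generation time.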
## Affected Path

Only the HTTPS/CONNECT inference interception path (`inference.local`) is affected. The plain HTTP forward proxy path uses `copy_bidirectional` and streams correctly.

Call chain: `handle_inference_interception()` → `route_inference_request()` → `proxy_with_candidates()` → `proxy_to_backend()` → `response.bytes().await`
## Fix Direction

- Add a streaming variant of `proxy_to_backend()` that returns headers plus an `impl Stream<Item = Bytes>` instead of a fully-buffered `ProxyResponse`
- Have `route_inference_request()` write response headers immediately, then forward body chunks incrementally to the TLS client
- For streaming responses, emit `Transfer-Encoding: chunked` instead of `Content-Length`
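The last bullet implies the proxy must frame chunks itself once it stops computing a `Content-Length`. Per RFC 9112 §7.1, each chunk is its size in hex, CRLF, the data, CRLF, with a zero-size chunk terminating the body. A minimal encoder (a hypothetical helper, not an existing function in this codebase) could look like:

```rust
/// Frame one body chunk per HTTP/1.1 chunked transfer coding
/// (RFC 9112 §7.1): hex size, CRLF, data, CRLF.
fn encode_chunk(data: &[u8]) -> Vec<u8> {
    let mut out = format!("{:x}\r\n", data.len()).into_bytes();
    out.extend_from_slice(data);
    out.extend_from_slice(b"\r\n");
    out
}

/// The terminating zero-length chunk that ends the body.
fn last_chunk() -> &'static [u8] {
    b"0\r\n\r\n"
}

fn main() {
    // "data: hi\n\n" is 10 bytes, so the size prefix is hex "a".
    assert_eq!(encode_chunk(b"data: hi\n\n"), b"a\r\ndata: hi\n\n\r\n".to_vec());
    assert_eq!(last_chunk(), &b"0\r\n\r\n"[..]);
}
```

Each SSE event pulled from the upstream stream would be passed through `encode_chunk` and written to the TLS client immediately, with `last_chunk()` written when the upstream stream ends.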