Problem Statement
Sandboxes need access to inference for platform system functions — initially an embedded agent harness for policy analysis and modifications. Today, the only inference endpoint is inference.local, which serves userland agents/code. Intermixing system inference into this endpoint blurs the boundary between user and platform concerns: different credentials, different models, different billing, different availability guarantees.
We need a separate system-level inference capability for sandbox platform functions, configured independently via nemoclaw inference set --sandbox --provider foo --model bar. Critically, userland code must not be able to access system inference — only the supervisor binary (or supervisor-spawned system processes) should have access.
Supersedes #203 (closed — multi-model routing was the wrong abstraction).
Technical Context
Why NOT a proxy-intercepted hostname
The initial design proposed sandbox.inference.local as a second proxy-intercepted hostname (mirroring inference.local). Investigation revealed this is the wrong approach:
- The supervisor already has the inference router in-process. The navigator-sandbox binary holds an InferenceContext with Arc<RwLock<Vec<ResolvedRoute>>> and a navigator_router::Router. It can make inference API calls directly — no proxy hop needed.
- Network namespace separation already isolates the supervisor. The supervisor runs in the host network namespace; user processes are in an isolated sandbox netns connected only via a veth pair to the proxy at 10.200.0.1:3128. Any hostname exposed through the proxy is reachable by userland.
- inference.local currently has zero access control. The proxy short-circuits at proxy.rs:216 before OPA evaluation, binary identity checks, or any authentication. Adding a second hostname through the same proxy inherits these gaps.
- In-process calls are simpler and more secure. No hostname to intercept, no TLS cert to generate, no proxy routing changes, no network surface to attack.
Chosen approach: in-process API with Unix socket IPC fallback
- Primary: Supervisor calls the router directly via an in-process API when the embedded agent harness is part of the supervisor binary.
- Fallback (IPC): If the harness runs as a separate process, the supervisor exposes an HTTP endpoint on a Unix domain socket. The socket path is placed in a Landlock-restricted directory inaccessible to userland. The harness process gets access via fd inheritance or a permitted path. It speaks the same OpenAI-compatible HTTP/JSON API — no special SDK needed.
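The two access paths above can be sketched with a minimal stub. This is illustrative only: the trait name SystemInference, the complete method, and the stub router are all hypothetical stand-ins, not the real supervisor API.

```rust
/// Hypothetical system-inference interface the supervisor could expose
/// in-process to an embedded harness. Names here are illustrative.
trait SystemInference {
    fn complete(&self, model: &str, prompt: &str) -> Result<String, String>;
}

/// Stub standing in for the real router-backed implementation.
struct StubRouter;

impl SystemInference for StubRouter {
    fn complete(&self, model: &str, prompt: &str) -> Result<String, String> {
        if model.is_empty() {
            // Mirrors the unconfigured-route case: fail rather than fall
            // through to the user-facing endpoint.
            return Err("no system route configured".to_string());
        }
        Ok(format!("[{model}] echo: {prompt}"))
    }
}

fn main() {
    let api = StubRouter;
    // Embedded harness path: a plain function call, no proxy, no TLS.
    let reply = api.complete("bar", "analyze policy").unwrap();
    println!("{reply}");
}
```

The IPC fallback would expose the same logical interface over a Unix socket, so an external harness process sees no difference beyond the transport.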
Affected Components
| Component | Key Files | Role |
| --- | --- | --- |
| Proto | proto/inference.proto | Wire format — needs route name field on set/get requests |
| Server | crates/navigator-server/src/inference.rs | Storage and resolution — needs to handle two named routes |
| Sandbox lifecycle | crates/navigator-sandbox/src/lib.rs | Bundle conversion, route refresh, and in-process inference API |
| CLI | crates/navigator-cli/src/main.rs, run.rs | Needs --sandbox flag on set/get/update commands |
| Architecture docs | architecture/inference-routing.md, sandbox.md, gateway.md | Need updating |
No changes required in: router (navigator-router), proxy (proxy.rs), inference pattern detection (l7/inference.rs), TLS (l7/tls.rs), DNS/network namespace, gRPC client.
Technical Investigation
Architecture Overview
Supervisor process model (crates/navigator-sandbox/src/lib.rs):
- The supervisor IS the navigator-sandbox binary (PID 1 in the container)
- Runs in the host network namespace with root privileges
- Spawns user processes (entrypoint + SSH sessions) into an isolated sandbox netns via setns() (process.rs:171)
- User processes are privilege-dropped (process.rs:180) and Landlock/seccomp restricted (process.rs:183)
- The proxy binds on the veth host IP 10.200.0.1:3128 — the sole network gateway from sandbox netns
Current inference pipeline (user-facing, inference.local):
- User process sends CONNECT inference.local:443 to proxy via HTTPS_PROXY env var
- Proxy matches hostname string at proxy.rs:216 (short-circuits before OPA/identity checks)
- TLS terminates with ephemeral cert, HTTP parsed, pattern detected
- route_inference_request() reads routes from InferenceContext.routes, passes to router
- Router picks first protocol-matching route, overwrites model field in body (backend.rs:82-93)
Proposed system inference pipeline (supervisor-only):
- Supervisor receives system inference request (from embedded harness or via Unix socket IPC)
- Reads system routes from a separate route cache (populated from the same bundle delivery)
- Calls navigator_router::Router::proxy_to_backend() directly — same backend proxy logic, no CONNECT/TLS overhead
- Returns response to the harness
Code References
| Location | Description |
| --- | --- |
| crates/navigator-sandbox/src/lib.rs:143 | run_sandbox() — supervisor orchestrator, owns all contexts |
| crates/navigator-sandbox/src/lib.rs:569-674 | build_inference_context() — builds router + routes from bundle |
| crates/navigator-sandbox/src/lib.rs:677-696 | bundle_to_resolved_routes() — proto→router conversion |
| crates/navigator-sandbox/src/lib.rs:704-743 | spawn_route_refresh() — polls gateway every 5s, replaces route cache |
| crates/navigator-sandbox/src/proxy.rs:21 | INFERENCE_LOCAL_HOST = "inference.local" — hardcoded constant |
| crates/navigator-sandbox/src/proxy.rs:48-75 | InferenceContext — holds patterns, router, routes: Arc<RwLock<Vec<ResolvedRoute>>> |
| crates/navigator-sandbox/src/proxy.rs:216-229 | inference.local short-circuit — no auth, no OPA, no identity check |
| crates/navigator-sandbox/src/process.rs:163-183 | User process pre_exec: setns() → drop_privileges() → Landlock/seccomp |
| crates/navigator-sandbox/src/sandbox/linux/netns.rs:53 | Network namespace creation — veth pair, 10.200.0.1 (host) / 10.200.0.2 (sandbox) |
| crates/navigator-sandbox/src/sandbox/linux/landlock.rs | Landlock filesystem restrictions on user processes |
| crates/navigator-server/src/inference.rs:28 | CLUSTER_INFERENCE_ROUTE_NAME = "inference.local" — hardcoded |
| crates/navigator-server/src/inference.rs:118-171 | upsert_cluster_inference_route() — hardcodes route name |
| crates/navigator-server/src/inference.rs:260-291 | resolve_inference_bundle() — resolves single route |
| crates/navigator-server/src/inference.rs:293-340 | resolve_managed_cluster_route() — hardcoded to single name |
| proto/inference.proto:50-55 | SetClusterInferenceRequest — no route name field |
| proto/inference.proto:83-89 | GetInferenceBundleResponse — already repeated ResolvedRoute routes |
| proto/inference.proto:74-81 | Proto ResolvedRoute — has name field (line 75) |
| crates/navigator-router/src/lib.rs:59-97 | proxy_with_candidates() — route selection by protocol |
| crates/navigator-router/src/backend.rs:21-122 | proxy_to_backend() — backend proxying, model overwrite |
| crates/navigator-cli/src/main.rs:811-837 | ClusterInferenceCommands — no sandbox flag |
| crates/navigator-cli/src/run.rs:2787-2809 | cluster_inference_set() — single-route gRPC request |
| crates/navigator-cli/src/run.rs:2852-2866 | cluster_inference_get() — displays single config |
Current Behavior
- One inference endpoint: inference.local, accessible to all sandbox processes without authentication
- One InferenceRoute record stored (name: "inference.local")
- Bundle response returns 0 or 1 routes
- Supervisor holds the InferenceContext with a single route cache
- No system-level inference capability exists
What Would Need to Change
Proto layer (proto/inference.proto):
- Add string route_name = 3 to SetClusterInferenceRequest (empty = "inference.local" for backward compat)
- Add string route_name = 1 to GetClusterInferenceRequest
- Consider adding string route_name to response messages for disambiguation
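A sketch of the proto additions, assuming the field numbers listed above are free in the actual messages (verify against proto/inference.proto before wiring this up; the elided existing fields are not reproduced here):

```protobuf
message SetClusterInferenceRequest {
  // ...existing fields 1-2 unchanged...

  // Named route to set. Empty string means "inference.local",
  // preserving backward compatibility with existing clients.
  string route_name = 3;
}

message GetClusterInferenceRequest {
  // Named route to fetch; empty string means "inference.local".
  string route_name = 1;
}
```

Because proto3 encodes an unset string as empty, old clients that never populate route_name keep addressing the user-facing route with no migration.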
Server layer (crates/navigator-server/src/inference.rs):
- Add SANDBOX_INFERENCE_ROUTE_NAME constant (e.g., "sandbox-system")
- Parameterize upsert_cluster_inference_route() to accept route name
- Parameterize set_cluster_inference() and get_cluster_inference() handlers
- Update resolve_inference_bundle() to resolve both named routes
- The store already supports multiple inference_route records with different names — no store changes
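The backward-compat defaulting the parameterized handlers could share is small enough to sketch. The constants mirror the names discussed above; the helper function name is illustrative, not an existing symbol:

```rust
// Parallel route-name constants, following the existing
// CLUSTER_INFERENCE_ROUTE_NAME pattern in navigator-server.
const CLUSTER_INFERENCE_ROUTE_NAME: &str = "inference.local";
const SANDBOX_INFERENCE_ROUTE_NAME: &str = "sandbox-system";

/// Map a wire-level route_name to a stored route name. An empty string
/// means the legacy user-facing route, so existing clients that never
/// set the field keep working unchanged. (Hypothetical helper.)
fn effective_route_name(route_name: &str) -> &str {
    if route_name.is_empty() {
        CLUSTER_INFERENCE_ROUTE_NAME
    } else {
        route_name
    }
}

fn main() {
    assert_eq!(effective_route_name(""), "inference.local");
    assert_eq!(
        effective_route_name(SANDBOX_INFERENCE_ROUTE_NAME),
        "sandbox-system"
    );
    println!("ok");
}
```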
Sandbox lifecycle (crates/navigator-sandbox/src/lib.rs):
- Add a second route cache for system routes to InferenceContext (or a new SystemInferenceContext)
- During bundle_to_resolved_routes(), segregate routes by proto ResolvedRoute.name
- spawn_route_refresh() populates both caches
- Expose an in-process API for the supervisor to make inference calls using the system routes
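The segregation step above can be sketched in a self-contained form. ResolvedRoute here is a stand-in for the real proto/router types, and dropping unknown names is one plausible policy (logging them may be preferable in practice):

```rust
// Stand-in for the real ResolvedRoute; only the fields the sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct ResolvedRoute {
    name: String,
    model: String,
}

const USER_ROUTE: &str = "inference.local";
const SYSTEM_ROUTE: &str = "sandbox-system";

/// Split bundle routes into (user, system) caches by route name.
fn segregate(routes: Vec<ResolvedRoute>) -> (Vec<ResolvedRoute>, Vec<ResolvedRoute>) {
    let mut user = Vec::new();
    let mut system = Vec::new();
    for r in routes {
        match r.name.as_str() {
            USER_ROUTE => user.push(r),
            SYSTEM_ROUTE => system.push(r),
            _ => {} // unknown route name: skipped in this sketch
        }
    }
    (user, system)
}

fn main() {
    // Model names are illustrative.
    let bundle = vec![
        ResolvedRoute { name: USER_ROUTE.into(), model: "user-model".into() },
        ResolvedRoute { name: SYSTEM_ROUTE.into(), model: "bar".into() },
    ];
    let (user, system) = segregate(bundle);
    println!("user={} system={}", user.len(), system.len());
    println!("system model: {}", system[0].model);
}
```

spawn_route_refresh() would run this split on every bundle it receives, then swap both Arc<RwLock<...>> caches atomically with fresh contents.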
IPC fallback (new code in crates/navigator-sandbox/):
- Unix domain socket HTTP server bound to a Landlock-restricted path
- Accepts OpenAI-compatible requests, routes through system routes via the router
- Socket path passed to harness process via env var or fd inheritance
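The transport layer of the fallback can be sketched with the standard library alone. The HTTP layer is elided (the request body is abbreviated to a bare JSON fragment), the directory path and 0700 mode are illustrative, and the real access control comes from Landlock plus uid separation rather than file mode alone:

```rust
use std::fs;
use std::io::{Read, Write};
use std::os::unix::fs::PermissionsExt;
use std::os::unix::net::{UnixListener, UnixStream};
use std::thread;

/// One request/response round trip over a Unix socket in a restricted
/// directory: supervisor side echoes the request, standing in for
/// "route through the system routes and reply".
fn run_echo_roundtrip() -> std::io::Result<String> {
    let dir = std::env::temp_dir().join("navigator-system-ipc-demo");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;
    // 0700 on the parent directory keeps other uids out of the socket path.
    fs::set_permissions(&dir, fs::Permissions::from_mode(0o700))?;

    let sock = dir.join("inference.sock");
    let listener = UnixListener::bind(&sock)?;

    // Supervisor side: accept one connection, echo the bytes back.
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (mut conn, _) = listener.accept()?;
        let mut buf = [0u8; 1024];
        let n = conn.read(&mut buf)?;
        conn.write_all(&buf[..n])
    });

    // Harness side: connect via the permitted path and send a request.
    let mut client = UnixStream::connect(&sock)?;
    client.write_all(b"{\"model\":\"bar\"}")?;
    client.shutdown(std::net::Shutdown::Write)?;
    let mut reply = String::new();
    client.read_to_string(&mut reply)?;
    server.join().unwrap()?;
    Ok(reply)
}

fn main() {
    println!("{}", run_echo_roundtrip().unwrap());
}
```

Because the harness speaks plain HTTP/JSON over this socket, any OpenAI-compatible client that supports a Unix-socket transport works without a special SDK.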
CLI (crates/navigator-cli/src/main.rs, run.rs):
- Add --sandbox flag to Set, Get, Update commands
- When --sandbox, set route_name = "sandbox-system" in the request
- inference get (without flag) shows both endpoints by default
Patterns to Follow
- The CLUSTER_INFERENCE_ROUTE_NAME constant pattern — add a parallel constant
- The existing route storage uses get_message_by_name::<InferenceRoute>(name) — second named route just works
- GetInferenceBundleResponse.routes is already repeated — designed for multiple routes
- The SSH handshake secret pattern (ssh.rs:206) — HMAC-based auth with nonce — could inform IPC auth if needed
Proposed Approach
Store a second InferenceRoute record (name: "sandbox-system") via nemoclaw inference set --sandbox. Include both routes in the inference bundle. On the sandbox side, segregate routes by name during bundle conversion and hold a separate system route cache. Expose system inference as an in-process API on the supervisor, with a Unix socket IPC fallback for non-embedded harness processes. The proxy, router, TLS, and DNS layers require no changes.
Scope Assessment
- Complexity: Low-Medium — server and CLI changes are straightforward parameterization; the in-process API is a thin wrapper around existing router calls; IPC is the most novel piece but well-understood
- Confidence: High — leverages existing infrastructure (bundle delivery, route caching, router) with minimal new surface
- Estimated files to change: ~8-10 (proto, server inference, sandbox lib, CLI main, CLI run, docs; plus new IPC module if building the fallback)
- Issue type: feat
Risks & Open Questions
- Agent harness packaging: How the harness gets into the sandbox container is a separate concern. This feature provides the inference capability; the harness deployment is orthogonal.
- Observability: System inference calls should be logged separately from user inference. At minimum, a separate log file or distinct log tags/labels. The in-process API should emit structured logs with a source=system discriminator.
- Unconfigured endpoint behavior: Should match current inference.local behavior — when the sandbox route isn't configured, the system route cache is empty and inference calls return an appropriate error (503 or equivalent). Currently when inference.local is unconfigured, the bundle returns empty routes and the proxy returns an error to the caller.
- inference get display: Show both user and system configurations by default. --sandbox flag filters to system only.
- IPC socket security: The Unix socket path must be in a directory that Landlock prevents user processes from accessing. Verify that the Landlock policy restricts the chosen path. The supervisor creates the socket before spawning user processes, so ordering is natural.
- Future extensibility: Keep it simple — two route types (user + system) is concrete. N route types is speculative and not needed now.
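The unconfigured-endpoint behavior in the list above reduces to a small invariant worth pinning down early. This sketch uses illustrative types (the real cache holds ResolvedRoute values, and the 503 mapping happens at the IPC/HTTP boundary):

```rust
/// Illustrative error type; the Unavailable case would surface as
/// HTTP 503 (or equivalent) at the IPC boundary.
#[derive(Debug, PartialEq)]
enum SystemInferenceError {
    Unavailable,
}

/// Pick a route from the system cache; an empty cache is an error,
/// never a silent fallback to the user-facing routes.
fn first_system_route(cache: &[String]) -> Result<&String, SystemInferenceError> {
    cache.first().ok_or(SystemInferenceError::Unavailable)
}

fn main() {
    let empty: Vec<String> = Vec::new();
    assert_eq!(
        first_system_route(&empty),
        Err(SystemInferenceError::Unavailable)
    );
    let configured = vec!["sandbox-system".to_string()];
    assert!(first_system_route(&configured).is_ok());
    println!("ok");
}
```

Keeping "empty cache" and "fall back to user routes" strictly separate matters here: a fallback would quietly route system traffic through user credentials, which is exactly the boundary this feature exists to enforce.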
Test Considerations
- Server unit tests (crates/navigator-server/src/inference.rs): Test storing/retrieving two routes independently, resolving both in a single bundle, version independence between routes. Follow existing patterns (lines 342-581).
- Bundle conversion test (crates/navigator-sandbox/src/lib.rs): Test that route segregation by name correctly populates user vs. system caches.
- In-process API test: Test that the supervisor can make inference calls using system routes, with correct model overwrite and credential resolution.
- IPC test (if built): Test Unix socket endpoint accepts requests, routes through system routes, and is inaccessible from a Landlock-restricted process.
- E2E test (e2e/python/test_inference_routing.py): Configure both user and system routes, verify user endpoint still works, verify system endpoint uses correct credentials/model. Verify userland cannot reach system inference.
- Backward compat: Existing single-route clusters work unchanged (no system route configured, user route unaffected).
Created by spike investigation. Supersedes #203. Use build-from-issue to plan and implement.