Problem Statement
Sandboxes need access to inference for platform system functions — initially an embedded agent harness for policy analysis and modifications. Today, the only inference endpoint is inference.local, which serves userland agents/code. Intermixing system inference into this endpoint blurs the boundary between user and platform concerns: different credentials, different models, different billing, different availability guarantees.
We need a separate system-level inference capability for sandbox platform functions, configured independently via nemoclaw inference set --sandbox --provider foo --model bar. Critically, userland code must not be able to access system inference — only the supervisor binary (or supervisor-spawned system processes) should have access.
Supersedes #203 (closed — multi-model routing was the wrong abstraction).
Technical Context
Why NOT a proxy-intercepted hostname
The initial design proposed sandbox.inference.local as a second proxy-intercepted hostname (mirroring inference.local). Investigation revealed this is the wrong approach:
- The supervisor already has the inference router in-process. The navigator-sandbox binary holds an InferenceContext with Arc<RwLock<Vec<ResolvedRoute>>> and a navigator_router::Router. It can make inference API calls directly — no proxy hop needed.
- Network namespace separation already isolates the supervisor. The supervisor runs in the host network namespace; user processes are in an isolated sandbox netns connected only via a veth pair to the proxy at 10.200.0.1:3128. Any hostname exposed through the proxy is reachable by userland.
- inference.local currently has zero access control. The proxy short-circuits at proxy.rs:216 before OPA evaluation, binary identity checks, or any authentication. Adding a second hostname through the same proxy inherits these gaps.
- In-process calls are simpler and more secure. No hostname to intercept, no TLS cert to generate, no proxy routing changes, no network surface to attack.
Chosen approach: in-process API with Unix socket IPC fallback
- Primary: Supervisor calls the router directly via an in-process API when the embedded agent harness is part of the supervisor binary.
- Fallback (IPC): If the harness runs as a separate process, the supervisor exposes an HTTP endpoint on a Unix domain socket. The socket path is placed in a Landlock-restricted directory inaccessible to userland. The harness process gets access via fd inheritance or a permitted path. It speaks the same OpenAI-compatible HTTP/JSON API — no special SDK needed.
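The two access paths above can be sketched with a minimal stub. This is illustrative only: the trait name SystemInference, the complete method, and the stub router are all hypothetical stand-ins, not the real supervisor API.

```rust
/// Hypothetical system-inference interface the supervisor could expose
/// in-process to an embedded harness. Names here are illustrative.
trait SystemInference {
    fn complete(&self, model: &str, prompt: &str) -> Result<String, String>;
}

/// Stub standing in for the real router-backed implementation.
struct StubRouter;

impl SystemInference for StubRouter {
    fn complete(&self, model: &str, prompt: &str) -> Result<String, String> {
        if model.is_empty() {
            // Mirrors the unconfigured-route case: fail rather than fall
            // through to the user-facing endpoint.
            return Err("no system route configured".to_string());
        }
        Ok(format!("[{model}] echo: {prompt}"))
    }
}

fn main() {
    let api = StubRouter;
    // Embedded harness path: a plain function call, no proxy, no TLS.
    let reply = api.complete("bar", "analyze policy").unwrap();
    println!("{reply}");
}
```

The IPC fallback would expose the same logical interface over a Unix socket, so an external harness process sees no difference beyond the transport.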
Affected Components
| Component | Key Files | Role |
| --- | --- | --- |
| Proto | proto/inference.proto | Wire format — needs route name field on set/get requests |
| Server | crates/navigator-server/src/inference.rs | Storage and resolution — needs to handle two named routes |
| Sandbox lifecycle | crates/navigator-sandbox/src/lib.rs | Bundle conversion, route refresh, and in-process inference API |
| CLI | crates/navigator-cli/src/main.rs, run.rs | Needs --sandbox flag on set/get/update commands |
| Architecture docs | architecture/inference-routing.md, sandbox.md, gateway.md | Need updating |
No changes required in: router (navigator-router), proxy (proxy.rs), inference pattern detection (l7/inference.rs), TLS (l7/tls.rs), DNS/network namespace, gRPC client.
Technical Investigation
Architecture Overview
Supervisor process model (crates/navigator-sandbox/src/lib.rs):
- The supervisor IS the navigator-sandbox binary (PID 1 in the container)
- Runs in the host network namespace with root privileges
- Spawns user processes (entrypoint + SSH sessions) into an isolated sandbox netns via setns() (process.rs:171)
- User processes are privilege-dropped (process.rs:180) and Landlock/seccomp restricted (process.rs:183)
- The proxy binds on the veth host IP 10.200.0.1:3128 — the sole network gateway from sandbox netns
Current inference pipeline (user-facing, inference.local):
- User process sends CONNECT inference.local:443 to proxy via HTTPS_PROXY env var
- Proxy matches hostname string at proxy.rs:216 (short-circuits before OPA/identity checks)
- TLS terminates with ephemeral cert, HTTP parsed, pattern detected
- route_inference_request() reads routes from InferenceContext.routes, passes to router
- Router picks first protocol-matching route, overwrites model field in body (backend.rs:82-93)
Proposed system inference pipeline (supervisor-only):
- Supervisor receives system inference request (from embedded harness or via Unix socket IPC)
- Reads system routes from a separate route cache (populated from the same bundle delivery)
- Calls navigator_router::Router::proxy_to_backend() directly — same backend proxy logic, no CONNECT/TLS overhead
- Returns response to the harness
Code References
| Location | Description |
| --- | --- |
| crates/navigator-sandbox/src/lib.rs:143 | run_sandbox() — supervisor orchestrator, owns all contexts |
| crates/navigator-sandbox/src/lib.rs:569-674 | build_inference_context() — builds router + routes from bundle |
| crates/navigator-sandbox/src/lib.rs:677-696 | bundle_to_resolved_routes() — proto→router conversion |
| crates/navigator-sandbox/src/lib.rs:704-743 | spawn_route_refresh() — polls gateway every 5s, replaces route cache |
| crates/navigator-sandbox/src/proxy.rs:21 | INFERENCE_LOCAL_HOST = "inference.local" — hardcoded constant |
| crates/navigator-sandbox/src/proxy.rs:48-75 | InferenceContext — holds patterns, router, routes: Arc<RwLock<Vec<ResolvedRoute>>> |
| crates/navigator-sandbox/src/proxy.rs:216-229 | inference.local short-circuit — no auth, no OPA, no identity check |
| crates/navigator-sandbox/src/process.rs:163-183 | User process pre_exec: setns() → drop_privileges() → Landlock/seccomp |
| crates/navigator-sandbox/src/sandbox/linux/netns.rs:53 | Network namespace creation — veth pair, 10.200.0.1 (host) / 10.200.0.2 (sandbox) |
| crates/navigator-sandbox/src/sandbox/linux/landlock.rs | Landlock filesystem restrictions on user processes |
| crates/navigator-server/src/inference.rs:28 | CLUSTER_INFERENCE_ROUTE_NAME = "inference.local" — hardcoded |
| crates/navigator-server/src/inference.rs:118-171 | upsert_cluster_inference_route() — hardcodes route name |
| crates/navigator-server/src/inference.rs:260-291 | resolve_inference_bundle() — resolves single route |
| crates/navigator-server/src/inference.rs:293-340 | resolve_managed_cluster_route() — hardcoded to single name |
| proto/inference.proto:50-55 | SetClusterInferenceRequest — no route name field |
| proto/inference.proto:83-89 | GetInferenceBundleResponse — already repeated ResolvedRoute routes |
| proto/inference.proto:74-81 | Proto ResolvedRoute — has name field (line 75) |
| crates/navigator-router/src/lib.rs:59-97 | proxy_with_candidates() — route selection by protocol |
| crates/navigator-router/src/backend.rs:21-122 | proxy_to_backend() — backend proxying, model overwrite |
| crates/navigator-cli/src/main.rs:811-837 | ClusterInferenceCommands — no sandbox flag |
| crates/navigator-cli/src/run.rs:2787-2809 | cluster_inference_set() — single-route gRPC request |
| crates/navigator-cli/src/run.rs:2852-2866 | cluster_inference_get() — displays single config |
Current Behavior
- One inference endpoint: inference.local, accessible to all sandbox processes without authentication
- One InferenceRoute record stored (name: "inference.local")
- Bundle response returns 0 or 1 routes
- Supervisor holds the InferenceContext with a single route cache
- No system-level inference capability exists
What Would Need to Change
Proto layer (proto/inference.proto):
- Add string route_name = 3 to SetClusterInferenceRequest (empty = "inference.local" for backward compat)
- Add string route_name = 1 to GetClusterInferenceRequest
- Consider adding string route_name to response messages for disambiguation
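A sketch of the proto additions, assuming the field numbers listed above are free in the actual messages (verify against proto/inference.proto before wiring this up; the elided existing fields are not reproduced here):

```protobuf
message SetClusterInferenceRequest {
  // ...existing fields 1-2 unchanged...

  // Named route to set. Empty string means "inference.local",
  // preserving backward compatibility with existing clients.
  string route_name = 3;
}

message GetClusterInferenceRequest {
  // Named route to fetch; empty string means "inference.local".
  string route_name = 1;
}
```

Because proto3 encodes an unset string as empty, old clients that never populate route_name keep addressing the user-facing route with no migration.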
Server layer (crates/navigator-server/src/inference.rs):
- Add SANDBOX_INFERENCE_ROUTE_NAME constant (e.g., "sandbox-system")
- Parameterize upsert_cluster_inference_route() to accept route name
- Parameterize set_cluster_inference() and get_cluster_inference() handlers
- Update resolve_inference_bundle() to resolve both named routes
- The store already supports multiple inference_route records with different names — no store changes
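The backward-compat defaulting the parameterized handlers could share is small enough to sketch. The constants mirror the names discussed above; the helper function name is illustrative, not an existing symbol:

```rust
// Parallel route-name constants, following the existing
// CLUSTER_INFERENCE_ROUTE_NAME pattern in navigator-server.
const CLUSTER_INFERENCE_ROUTE_NAME: &str = "inference.local";
const SANDBOX_INFERENCE_ROUTE_NAME: &str = "sandbox-system";

/// Map a wire-level route_name to a stored route name. An empty string
/// means the legacy user-facing route, so existing clients that never
/// set the field keep working unchanged. (Hypothetical helper.)
fn effective_route_name(route_name: &str) -> &str {
    if route_name.is_empty() {
        CLUSTER_INFERENCE_ROUTE_NAME
    } else {
        route_name
    }
}

fn main() {
    assert_eq!(effective_route_name(""), "inference.local");
    assert_eq!(
        effective_route_name(SANDBOX_INFERENCE_ROUTE_NAME),
        "sandbox-system"
    );
    println!("ok");
}
```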
Sandbox lifecycle (crates/navigator-sandbox/src/lib.rs):
- Add a second route cache for system routes to InferenceContext (or a new SystemInferenceContext)
- During bundle_to_resolved_routes(), segregate routes by proto ResolvedRoute.name
- spawn_route_refresh() populates both caches
- Expose an in-process API for the supervisor to make inference calls using the system routes
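The segregation step above can be sketched in a self-contained form. ResolvedRoute here is a stand-in for the real proto/router types, and dropping unknown names is one plausible policy (logging them may be preferable in practice):

```rust
// Stand-in for the real ResolvedRoute; only the fields the sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct ResolvedRoute {
    name: String,
    model: String,
}

const USER_ROUTE: &str = "inference.local";
const SYSTEM_ROUTE: &str = "sandbox-system";

/// Split bundle routes into (user, system) caches by route name.
fn segregate(routes: Vec<ResolvedRoute>) -> (Vec<ResolvedRoute>, Vec<ResolvedRoute>) {
    let mut user = Vec::new();
    let mut system = Vec::new();
    for r in routes {
        match r.name.as_str() {
            USER_ROUTE => user.push(r),
            SYSTEM_ROUTE => system.push(r),
            _ => {} // unknown route name: skipped in this sketch
        }
    }
    (user, system)
}

fn main() {
    // Model names are illustrative.
    let bundle = vec![
        ResolvedRoute { name: USER_ROUTE.into(), model: "user-model".into() },
        ResolvedRoute { name: SYSTEM_ROUTE.into(), model: "bar".into() },
    ];
    let (user, system) = segregate(bundle);
    println!("user={} system={}", user.len(), system.len());
    println!("system model: {}", system[0].model);
}
```

spawn_route_refresh() would run this split on every bundle it receives, then swap both Arc<RwLock<...>> caches atomically with fresh contents.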
IPC fallback (new code in crates/navigator-sandbox/):
- Unix domain socket HTTP server bound to a Landlock-restricted path
- Accepts OpenAI-compatible requests, routes through system routes via the router
- Socket path passed to harness process via env var or fd inheritance
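The transport layer of the fallback can be sketched with the standard library alone. The HTTP layer is elided (the request body is abbreviated to a bare JSON fragment), the directory path and 0700 mode are illustrative, and the real access control comes from Landlock plus uid separation rather than file mode alone:

```rust
use std::fs;
use std::io::{Read, Write};
use std::os::unix::fs::PermissionsExt;
use std::os::unix::net::{UnixListener, UnixStream};
use std::thread;

/// One request/response round trip over a Unix socket in a restricted
/// directory: supervisor side echoes the request, standing in for
/// "route through the system routes and reply".
fn run_echo_roundtrip() -> std::io::Result<String> {
    let dir = std::env::temp_dir().join("navigator-system-ipc-demo");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;
    // 0700 on the parent directory keeps other uids out of the socket path.
    fs::set_permissions(&dir, fs::Permissions::from_mode(0o700))?;

    let sock = dir.join("inference.sock");
    let listener = UnixListener::bind(&sock)?;

    // Supervisor side: accept one connection, echo the bytes back.
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (mut conn, _) = listener.accept()?;
        let mut buf = [0u8; 1024];
        let n = conn.read(&mut buf)?;
        conn.write_all(&buf[..n])
    });

    // Harness side: connect via the permitted path and send a request.
    let mut client = UnixStream::connect(&sock)?;
    client.write_all(b"{\"model\":\"bar\"}")?;
    client.shutdown(std::net::Shutdown::Write)?;
    let mut reply = String::new();
    client.read_to_string(&mut reply)?;
    server.join().unwrap()?;
    Ok(reply)
}

fn main() {
    println!("{}", run_echo_roundtrip().unwrap());
}
```

Because the harness speaks plain HTTP/JSON over this socket, any OpenAI-compatible client that supports a Unix-socket transport works without a special SDK.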
CLI (crates/navigator-cli/src/main.rs, run.rs):
- Add --sandbox flag to Set, Get, Update commands
- When --sandbox, set route_name = "sandbox-system" in the request
- inference get (without flag) shows both endpoints by default
Patterns to Follow
- The CLUSTER_INFERENCE_ROUTE_NAME constant pattern — add a parallel constant
- The existing route storage uses get_message_by_name::<InferenceRoute>(name) — second named route just works
- GetInferenceBundleResponse.routes is already repeated — designed for multiple routes
- The SSH handshake secret pattern (ssh.rs:206) — HMAC-based auth with nonce — could inform IPC auth if needed
Proposed Approach
Store a second InferenceRoute record (name: "sandbox-system") via nemoclaw inference set --sandbox. Include both routes in the inference bundle. On the sandbox side, segregate routes by name during bundle conversion and hold a separate system route cache. Expose system inference as an in-process API on the supervisor, with a Unix socket IPC fallback for non-embedded harness processes. The proxy, router, TLS, and DNS layers require no changes.
Scope Assessment
- Complexity: Low-Medium — server and CLI changes are straightforward parameterization; the in-process API is a thin wrapper around existing router calls; IPC is the most novel piece but well-understood
- Confidence: High — leverages existing infrastructure (bundle delivery, route caching, router) with minimal new surface
- Estimated files to change: ~8-10 (proto, server inference, sandbox lib, CLI main, CLI run, docs; plus new IPC module if building the fallback)
- Issue type: feat
Risks & Open Questions
- Agent harness packaging: How the harness gets into the sandbox container is a separate concern. This feature provides the inference capability; the harness deployment is orthogonal.
- Observability: System inference calls should be logged separately from user inference. At minimum, a separate log file or distinct log tags/labels. The in-process API should emit structured logs with a source=system discriminator.
- Unconfigured endpoint behavior: Should match current inference.local behavior — when the sandbox route isn't configured, the system route cache is empty and inference calls return an appropriate error (503 or equivalent). Currently when inference.local is unconfigured, the bundle returns empty routes and the proxy returns an error to the caller.
- inference get display: Show both user and system configurations by default. --sandbox flag filters to system only.
- IPC socket security: The Unix socket path must be in a directory that Landlock prevents user processes from accessing. Verify that the Landlock policy restricts the chosen path. The supervisor creates the socket before spawning user processes, so ordering is natural.
- Future extensibility: Keep it simple — two route types (user + system) is concrete. N route types is speculative and not needed now.
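The unconfigured-endpoint behavior in the list above reduces to a small invariant worth pinning down early. This sketch uses illustrative types (the real cache holds ResolvedRoute values, and the 503 mapping happens at the IPC/HTTP boundary):

```rust
/// Illustrative error type; the Unavailable case would surface as
/// HTTP 503 (or equivalent) at the IPC boundary.
#[derive(Debug, PartialEq)]
enum SystemInferenceError {
    Unavailable,
}

/// Pick a route from the system cache; an empty cache is an error,
/// never a silent fallback to the user-facing routes.
fn first_system_route(cache: &[String]) -> Result<&String, SystemInferenceError> {
    cache.first().ok_or(SystemInferenceError::Unavailable)
}

fn main() {
    let empty: Vec<String> = Vec::new();
    assert_eq!(
        first_system_route(&empty),
        Err(SystemInferenceError::Unavailable)
    );
    let configured = vec!["sandbox-system".to_string()];
    assert!(first_system_route(&configured).is_ok());
    println!("ok");
}
```

Keeping "empty cache" and "fall back to user routes" strictly separate matters here: a fallback would quietly route system traffic through user credentials, which is exactly the boundary this feature exists to enforce.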
Test Considerations
- Server unit tests (crates/navigator-server/src/inference.rs): Test storing/retrieving two routes independently, resolving both in a single bundle, version independence between routes. Follow existing patterns (lines 342-581).
- Bundle conversion test (crates/navigator-sandbox/src/lib.rs): Test that route segregation by name correctly populates user vs. system caches.
- In-process API test: Test that the supervisor can make inference calls using system routes, with correct model overwrite and credential resolution.
- IPC test (if built): Test Unix socket endpoint accepts requests, routes through system routes, and is inaccessible from a Landlock-restricted process.
- E2E test (e2e/python/test_inference_routing.py): Configure both user and system routes, verify user endpoint still works, verify system endpoint uses correct credentials/model. Verify userland cannot reach system inference.
- Backward compat: Existing single-route clusters work unchanged (no system route configured, user route unaffected).
Created by spike investigation. Supersedes #203. Use build-from-issue to plan and implement.