Summary
Simplify the inference routing system by removing the implicit catch-all mechanism and replacing it with an explicit inference.local hostname addressable inside every sandbox. Inference configuration moves from per-route CRUD to cluster-level config backed by the existing provider system.
Context
The current inference routing has two paths:
Direct allow — Network policy explicitly allows traffic to a specific endpoint (e.g., api.anthropic.com). Works for any endpoint, not inference-specific.
Implicit catch-all — Requests that aren't directly allowed but are detected as inference calls get silently routed through the privacy router to a configured backend.
The catch-all is confusing. A typo in a policy (e.g., api.entropics.com instead of api.anthropic.com) silently reroutes inference to the local model instead of failing visibly. As John put it: "explicit policies for allowances and then we have this implicit secret inference catch-all which breaks the mental model."
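The surprise can be sketched in a few lines. This is a hypothetical illustration of the current two-path behavior, not the real OPA policy or proxy code; the function names and the inference-detection heuristic are invented for clarity.

```python
# Hypothetical sketch of the CURRENT two-path routing (illustrative names,
# not the real OPA policy or sandbox proxy).

ALLOWED = {"api.anthropic.com"}  # explicit network-policy allows

def looks_like_inference(path: str) -> bool:
    # Stand-in for the inspect_for_inference detection heuristic.
    return path.startswith("/v1/chat/completions")

def route(host: str, path: str) -> str:
    if host in ALLOWED:
        return "direct"      # path 1: direct allow
    if looks_like_inference(path):
        return "catch-all"   # path 2: silently rerouted to the privacy router
    return "deny"

# The typo'd host is NOT denied -- it is silently rerouted:
print(route("api.entropics.com", "/v1/chat/completions"))  # catch-all
print(route("api.anthropic.com", "/v1/chat/completions"))  # direct
```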
Decisions
Remove the implicit catch-all — No more inspect_for_inference OPA action. If a request isn't explicitly allowed, it's denied.
Introduce inference.local — An always-addressable hostname inside every sandbox that routes through our inference router. No credentials needed from the agent's perspective.
inference.local defaults to managed NVIDIA inference — If no local model is deployed (e.g., on Brev/CPU), the router points to managed NVIDIA endpoints. When a local model is available, it switches over.
Direct allow unchanged — Explicit network policy allows (e.g., Claude → Anthropic) continue as-is. The router is for "your custom agent" inference.
Single model override — The router rewrites the client's requested model name to whatever is configured; the client-specified model is ignored.
Cluster-level inference config — How inference.local routes is configured at the cluster level, not per-sandbox. Config is: provider name + model name.
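Taken together, the decisions collapse routing to one explicit check. The following is a minimal hypothetical sketch (names invented for illustration), not the real proxy implementation:

```python
# Hypothetical sketch of the NEW routing after the catch-all is removed.

ALLOWED = {"api.anthropic.com"}  # explicit network-policy allows (unchanged)

def route(host: str) -> str:
    if host in ALLOWED:
        return "direct"   # explicit allow, as before
    if host == "inference.local":
        return "router"   # always-addressable inference router
    return "deny"         # no implicit fallback: typos fail visibly

print(route("api.entropics.com"))  # deny -- the typo now fails loudly
print(route("inference.local"))    # router
```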
# 1. Create a provider with credentials
nemoclaw provider create --name nvidia_build --type nvidia --from-existing
# 2. Configure cluster-level inference
nemoclaw cluster inference set --provider nvidia_build --model llama-3.1-8b
# 3. Inside any sandbox, agent hits inference.local — just works
curl http://inference.local/v1/chat/completions \
  -d '{"model": "anything", "messages": [...]}'
# model is overwritten to llama-3.1-8b, routed to nvidia_build with injected API key
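The router's rewrite step can be sketched as a pure transformation over the request. This is a hypothetical illustration under stated assumptions: the cluster config holds provider name + model, the provider record holds the API key and upstream URL, and all names, the placeholder key, and the upstream URL are invented for the example.

```python
# Hypothetical sketch of the router rewrite: ignore the client's model,
# substitute the cluster-configured one, and inject the provider's API key.

CLUSTER_CONFIG = {"provider": "nvidia_build", "model": "llama-3.1-8b"}
PROVIDERS = {
    "nvidia_build": {
        "api_key": "nvapi-PLACEHOLDER",                   # assumed stored with the provider
        "base_url": "https://integrate.api.nvidia.com/v1" # assumed upstream, illustrative
    }
}

def rewrite(request_body: dict, headers: dict) -> tuple[dict, dict, str]:
    provider = PROVIDERS[CLUSTER_CONFIG["provider"]]
    body = dict(request_body)
    body["model"] = CLUSTER_CONFIG["model"]  # client-specified model is ignored
    hdrs = dict(headers)
    hdrs["Authorization"] = f"Bearer {provider['api_key']}"  # injected; never seen by the agent
    return body, hdrs, provider["base_url"]

body, hdrs, upstream = rewrite({"model": "anything", "messages": []}, {})
print(body["model"])  # llama-3.1-8b
```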
Provider types: openai, anthropic, nvidia (all API-key-only for now). Related: Inference route API keys stored in plain object store #21, Inference route API keys exposed via ListInferenceRoutes #20.
Router Flow
User Flow
Implementation
Remove
nemoclaw inference create/update/delete/list CLI commands
Inference route RPCs (CreateInferenceRoute, UpdateInferenceRoute, DeleteInferenceRoute, ListInferenceRoutes, GetInferenceRoute)
InferenceRoute/InferenceRouteSpec data model from proto (or deprecate)
inspect_for_inference OPA action and the implicit catch-all code path in the sandbox proxy
routing_hint concept and route-level API key storage
Add
inference.local DNS/hostname resolution inside the sandbox (resolve to the router)
nemoclaw cluster inference set/get CLI commands (provider name + model name)
openai, anthropic, and nvidia providers (the nvidia one already exists)
Router default for inference.local (managed NVIDIA endpoints when no local model is deployed)
Update
Policy files (dev-sandbox-policy.yaml, policy-local.yaml, policy-frontier.yaml, etc.) to remove inference catch-all rules
Docs (architecture/security-policy.md, etc.)
GTC Priorities
Related Issues