NemoClaw: Local Ollama inference routing fails from sandbox
Environment: DGX Spark (GB10, aarch64), Ubuntu 24.04, OpenShell 0.0.10, NemoClaw 0.0.x, OpenClaw 2026.3.11, Ollama 0.18.0 (snap), Docker 28.x w/ nvidia runtime, cgroup v2 host mode.
Goal: Route sandbox inference to local Ollama (nemotron-3-super:120b) on host port 11434 instead of NVIDIA NIM cloud.
Confirmed working: NVIDIA NIM cloud inference via nvidia provider type works end-to-end. curl from sandbox to https://inference.local/v1/chat/completions returns valid completions when provider is nvidia-nim. Ollama OpenAI-compat endpoint works from host: curl http://localhost:11434/v1/chat/completions returns valid JSON.
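The host-side check can be reproduced without any SDK. A minimal stdlib sketch of the request shape (model name and the dummy "ollama" bearer token are from the report; the actual send is commented out since it needs a running Ollama):

```python
# Host-side sanity check of Ollama's OpenAI-compatible endpoint, using only
# the standard library. Ollama accepts any bearer token, so "ollama" works
# as a dummy key. Sending is commented out (requires Ollama on the host).
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # host-side Ollama, per the report

body = {
    "model": "nemotron-3-super:120b",
    "messages": [{"role": "user", "content": "ping"}],
    "stream": False,
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(body).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer ollama",
    },
)
print(req.full_url)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```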
Attempt 1: openai provider type with baseUrl config
openshell provider create --name ollama-local --type openai \
--credential OPENAI_API_KEY=ollama --config baseUrl=http://localhost:11434/v1
openshell inference set --provider ollama-local --model nemotron-3-super:120b
Result: Verification step hits https://api.openai.com/v1 instead of configured baseUrl. Returns 401 from OpenAI. The baseUrl config key is stored (openshell provider get shows Config keys: baseUrl) but ignored during both verification and runtime routing. Using --no-verify bypasses verification but sandbox curl to inference.local still returns OpenAI 401 — confirming gateway hardcodes api.openai.com for openai type.
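The symptom is consistent with the gateway resolving the upstream from a fixed per-type table and never consulting stored provider config. A toy model of that hypothesis (table entries are illustrative guesses, not OpenShell source):

```python
# Toy model of the routing behavior the 401s imply: provider config is
# persisted but never consulted when resolving the upstream, so baseUrl
# has no effect. Illustrative only -- not OpenShell's actual code.
CANONICAL_UPSTREAMS = {  # hypothetical per-type table
    "openai": "https://api.openai.com/v1",
    "anthropic": "https://api.anthropic.com/v1",
    "nvidia": "https://integrate.api.nvidia.com/v1",
}

def resolve_upstream(provider_type: str, config: dict) -> str:
    # Bug as observed: config.get("baseUrl") is stored but ignored here.
    return CANONICAL_UPSTREAMS[provider_type]

url = resolve_upstream("openai", {"baseUrl": "http://localhost:11434/v1"})
print(url)  # the hardcoded OpenAI endpoint wins
```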
Attempt 2: nvidia provider type with baseUrl config
openshell provider create --name ollama-local --type nvidia \
--credential NVIDIA_API_KEY=ollama --config baseUrl=http://localhost:11434/v1
openshell inference set --provider ollama-local --model nemotron-3-super:120b --no-verify
Result: Inference route accepted (version incremented). Sandbox curl to https://inference.local/v1/chat/completions returns 404 Unknown. Verbose curl shows TLS proxy connect to 10.200.0.1:3128 succeeds (HTTP 200 Connection Established) then backend returns 404. Gateway is reaching something but not Ollama. Tried base_url (underscore) and endpoint config keys — same 404.
Attempt 3: generic provider type
openshell provider create --name ollama-local --type generic \
--config baseUrl=http://localhost:11434/v1 --credential API_KEY=ollama
Result: Provider created successfully. But openshell inference set rejects it: provider 'ollama-local' has unsupported type 'generic' for cluster inference (supported: openai, anthropic, nvidia).
Attempt 4: Direct sandbox→host via network policy
Added ollama_local network policy entry to sandbox policy YAML:
ollama_local:
  name: ollama_local
  endpoints:
    - host: 172.17.0.1
      port: 11434
      protocol: rest
  enforcement: enforce
  rules:
    - allow: { method: "*", path: "/**" }
  binaries:
    - { path: /usr/local/bin/openclaw }
    - { path: /usr/bin/curl }
Applied via openshell policy set --policy ~/ollama-policy.yaml ultra-ops --wait — policy v2 accepted and loaded.
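Assuming the rule matcher is glob-style (an assumption; OpenShell's matcher semantics aren't documented here), a quick offline check confirms the "/**" path pattern would cover the chat completions path, so the later 403 shouldn't be a path-match miss:

```python
# Offline check that the policy's path pattern covers the endpoint,
# assuming glob-style semantics. fnmatch's "*" crosses "/" separators,
# so "/**" behaves like "match any path" -- an assumption about
# OpenShell's matcher, not a documented fact.
from fnmatch import fnmatch

pattern = "/**"
for path in ("/", "/v1/chat/completions", "/api/tags"):
    print(path, fnmatch(path, pattern))
```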
Result (through proxy): Sandbox env has http_proxy=http://10.200.0.1:3128. All HTTP goes through OpenShell gateway proxy. Proxy returns HTTP/1.1 403 Forbidden for http://172.17.0.1:11434. Policy allows the endpoint but proxy blocks it — likely because it's plain HTTP (no TLS) and/or the proxy has an independent allowlist.
Result (bypassing proxy): curl --noproxy "*" http://172.17.0.1:11434 returns Connection refused. Sandbox network namespace cannot reach Docker bridge IP directly — only through the proxy. Confirmed sandbox egress IP is 10.200.0.2, route via 10.200.0.1 (gateway proxy).
Interesting: curl -s http://172.17.0.1:11434 (through proxy, no path) returns empty 200 — the "Ollama is running" response. So the proxy can reach Ollama for simple GET. But POST to /v1/chat/completions gets 403.
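The GET-passes/POST-403 split can be reproduced locally with a toy method-filtering forward proxy, which supports the independent-filter hypothesis. This models the suspected gateway behavior only; it is not OpenShell's proxy code:

```python
# Local reproduction of the observed split: a forward proxy that relays
# GET but rejects POST with 403, in front of a stub "Ollama is running"
# origin. Models the suspected gateway behavior -- not OpenShell code.
import http.server
import threading
import urllib.error
import urllib.request

# Opener with no proxies, for the proxy's own upstream fetches.
DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))

class Origin(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Ollama is running")
    def log_message(self, *args):  # keep output quiet
        pass

class MethodFilterProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward proxies receive the absolute URL in the request line.
        with DIRECT.open(self.path) as upstream:
            body = upstream.read()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)
    def do_POST(self):
        self.send_response(403)  # reject plain-HTTP POST outright
        self.end_headers()
    def log_message(self, *args):
        pass

origin = http.server.HTTPServer(("127.0.0.1", 0), Origin)
proxy = http.server.HTTPServer(("127.0.0.1", 0), MethodFilterProxy)
for srv in (origin, proxy):
    threading.Thread(target=srv.serve_forever, daemon=True).start()

# Client routed through the filtering proxy, like the sandbox env.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(
    {"http": f"http://127.0.0.1:{proxy.server_port}"}))
url = f"http://127.0.0.1:{origin.server_port}/"

get_status = opener.open(url).status
try:
    opener.open(urllib.request.Request(url, data=b"{}", method="POST"))
    post_status = 200
except urllib.error.HTTPError as err:
    post_status = err.code

print("GET:", get_status, "POST:", post_status)  # GET: 200 POST: 403
```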
Root cause assessment
- Provider types (openai, nvidia) hardcode upstream base URLs. The baseUrl/base_url/endpoint config keys are stored but not used for inference routing. The gateway always routes to the canonical endpoint for the provider type.
- The generic provider type exists but is excluded from the inference routing system (supported: openai, anthropic, nvidia).
- The sandbox HTTP proxy (10.200.0.1:3128) blocks non-TLS POST requests to internal hosts even when network policy explicitly allows the endpoint. GET succeeds, POST returns 403.
- Net result: no path exists in OpenShell 0.0.10 to route inference.local to a local Ollama instance.
Feature request
Support a provider type (or extend generic) that:
- Accepts a user-defined baseUrl for inference routing
- Routes inference.local proxy traffic to that URL
- Works with plain HTTP endpoints on the host (Ollama, vLLM, llama.cpp server)
This is the primary use case for DGX Spark users running NemoClaw with local models.
Workaround
Use nvidia-nim provider with NVIDIA cloud inference. Works but burns finite API credits and adds 200-500ms latency vs local.