Local Ollama inference routing fails from sandbox #385

@ParkerFairfield

Description

NemoClaw: Local Ollama inference routing fails from sandbox

Environment: DGX Spark (GB10, aarch64), Ubuntu 24.04, OpenShell 0.0.10, NemoClaw 0.0.x, OpenClaw 2026.3.11, Ollama 0.18.0 (snap), Docker 28.x w/ nvidia runtime, cgroup v2 host mode.

Goal: Route sandbox inference to local Ollama (nemotron-3-super:120b) on host port 11434 instead of NVIDIA NIM cloud.

Confirmed working:

  • NVIDIA NIM cloud inference via the nvidia provider type works end-to-end.
  • curl from the sandbox to https://inference.local/v1/chat/completions returns valid completions when the provider is nvidia-nim.
  • Ollama's OpenAI-compat endpoint works from the host: curl http://localhost:11434/v1/chat/completions returns valid JSON.
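For reference, a minimal sketch of the request shape both checks use, with the model name from this issue. The curl invocations are shown as comments rather than executed, since neither endpoint is assumed reachable here:

```shell
# Chat-completions payload shared by both checks (model name from this issue).
PAYLOAD='{"model":"nemotron-3-super:120b","messages":[{"role":"user","content":"ping"}]}'

# Validate the payload before pointing it at either endpoint.
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && PAYLOAD_OK=yes

# Host-side (Ollama direct):
#   curl -s http://localhost:11434/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$PAYLOAD"
# Sandbox-side (gateway route):
#   curl -s https://inference.local/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$PAYLOAD"
echo "$PAYLOAD_OK"
```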


Attempt 1: openai provider type with baseUrl config

openshell provider create --name ollama-local --type openai \
  --credential OPENAI_API_KEY=ollama --config baseUrl=http://localhost:11434/v1
openshell inference set --provider ollama-local --model nemotron-3-super:120b

Result: Verification step hits https://api.openai.com/v1 instead of configured baseUrl. Returns 401 from OpenAI. The baseUrl config key is stored (openshell provider get shows Config keys: baseUrl) but ignored during both verification and runtime routing. Using --no-verify bypasses verification but sandbox curl to inference.local still returns OpenAI 401 — confirming gateway hardcodes api.openai.com for openai type.
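One way to confirm the gateway never consults baseUrl is to point it at a port we control and watch for connections. The sketch below stands up a throwaway listener and records request paths; in a real run you would bind 11434 (or set baseUrl to the listener's address) and run the provider create/verify step — an empty hit list afterwards means the config key was never consulted. Here the sketch probes itself so it is self-contained:

```shell
# Throwaway listener standing in for the configured baseUrl. The bind port is
# chosen by the OS here; for a real gateway test, bind the port named in
# baseUrl and run `openshell provider create` against it.
HITS=$(python3 - <<'EOF'
import http.server, threading, urllib.request

hits = []
class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        hits.append(self.path)
        self.send_response(200)
        self.end_headers()
    def log_message(self, *args):
        pass  # keep stderr quiet

srv = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=srv.serve_forever, daemon=True).start()

# Stand-in for the gateway's verification call; with the real gateway, any
# request it sends to the configured baseUrl would appear in `hits`.
urllib.request.urlopen("http://127.0.0.1:%d/v1/models" % srv.server_port).read()

srv.shutdown()
print(" ".join(hits))
EOF
)
echo "paths seen: $HITS"
```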

Attempt 2: nvidia provider type with baseUrl config

openshell provider create --name ollama-local --type nvidia \
  --credential NVIDIA_API_KEY=ollama --config baseUrl=http://localhost:11434/v1
openshell inference set --provider ollama-local --model nemotron-3-super:120b --no-verify

Result: Inference route accepted (version incremented). Sandbox curl to https://inference.local/v1/chat/completions returns 404 Unknown. Verbose curl shows TLS proxy connect to 10.200.0.1:3128 succeeds (HTTP 200 Connection Established) then backend returns 404. Gateway is reaching something but not Ollama. Tried base_url (underscore) and endpoint config keys — same 404.

Attempt 3: generic provider type

openshell provider create --name ollama-local --type generic \
  --config baseUrl=http://localhost:11434/v1 --credential API_KEY=ollama

Result: Provider created successfully. But openshell inference set rejects it: provider 'ollama-local' has unsupported type 'generic' for cluster inference (supported: openai, anthropic, nvidia).

Attempt 4: Direct sandbox→host via network policy

Added ollama_local network policy entry to sandbox policy YAML:

ollama_local:
  name: ollama_local
  endpoints:
    - host: 172.17.0.1
      port: 11434
      protocol: rest
      enforcement: enforce
      rules:
        - allow: { method: "*", path: "/**" }
  binaries:
    - { path: /usr/local/bin/openclaw }
    - { path: /usr/bin/curl }

Applied via openshell policy set --policy ~/ollama-policy.yaml ultra-ops --wait — policy v2 accepted and loaded.
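A quick pre-flight pass over the policy fragment before `openshell policy set` can rule out a silently-dropped field. This is only a grep-level sanity check (no YAML schema validation); the fragment is written to a temp file here so the sketch is self-contained:

```shell
# Sanity-check the fields the proxy should honor before applying the policy.
# In practice, point the greps at ~/ollama-policy.yaml instead.
POLICY=$(mktemp)
cat > "$POLICY" <<'EOF'
ollama_local:
  name: ollama_local
  endpoints:
    - host: 172.17.0.1
      port: 11434
      protocol: rest
      enforcement: enforce
      rules:
        - allow: { method: "*", path: "/**" }
EOF

grep -q 'host: 172.17.0.1' "$POLICY" && \
grep -q 'port: 11434' "$POLICY" && \
grep -q 'enforcement: enforce' "$POLICY" && POLICY_OK=yes
echo "$POLICY_OK"
rm -f "$POLICY"
```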

Result (through proxy): Sandbox env has http_proxy=http://10.200.0.1:3128. All HTTP goes through OpenShell gateway proxy. Proxy returns HTTP/1.1 403 Forbidden for http://172.17.0.1:11434. Policy allows the endpoint but proxy blocks it — likely because it's plain HTTP (no TLS) and/or the proxy has an independent allowlist.

Result (bypassing proxy): curl --noproxy "*" http://172.17.0.1:11434 returns Connection refused. Sandbox network namespace cannot reach Docker bridge IP directly — only through the proxy. Confirmed sandbox egress IP is 10.200.0.2, route via 10.200.0.1 (gateway proxy).

Interesting: curl -s http://172.17.0.1:11434 (through proxy, no path) returns empty 200 — the "Ollama is running" response. So the proxy can reach Ollama for simple GET. But POST to /v1/chat/completions gets 403.
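To pin the 403 on the proxy rather than on the backend's handling of POST, the same GET/POST pair can be run against a direct (no-proxy) stand-in server: if both verbs succeed there, the 403 that appears only through 10.200.0.1:3128 is introduced by the proxy. The server below is a local python stand-in, not Ollama or the gateway; port 11436 is an arbitrary free port:

```shell
# Local stand-in server that accepts both GET and POST.
python3 - <<'EOF' &
import http.server

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
    def do_POST(self):
        # Drain the request body before responding.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.end_headers()
    def log_message(self, *args):
        pass

http.server.HTTPServer(("127.0.0.1", 11436), Handler).serve_forever()
EOF
SRV=$!
sleep 1

# Both verbs, proxy explicitly bypassed: any verb-dependent 403 seen only
# through the gateway proxy is therefore the proxy's doing.
GET_CODE=$(curl -s -o /dev/null -w '%{http_code}' --noproxy '*' http://127.0.0.1:11436/)
POST_CODE=$(curl -s -o /dev/null -w '%{http_code}' --noproxy '*' \
  -X POST -d '{}' http://127.0.0.1:11436/v1/chat/completions)
echo "GET=$GET_CODE POST=$POST_CODE"
kill "$SRV"
```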


Root cause assessment

  1. Provider types (openai, nvidia) hardcode upstream base URLs. The baseUrl/base_url/endpoint config keys are stored but not used for inference routing. The gateway always routes to the canonical endpoint for the provider type.

  2. generic provider type exists but is excluded from the inference routing system (supported: openai, anthropic, nvidia).

  3. Sandbox HTTP proxy (10.200.0.1:3128) blocks non-TLS POST requests to internal hosts even when network policy explicitly allows the endpoint. GET succeeds, POST returns 403.

  4. Net result: no path exists in OpenShell 0.0.10 to route inference.local to a local Ollama instance.

Feature request

Support a provider type (or extend generic) that:

  • Accepts a user-defined baseUrl for inference routing
  • Routes inference.local proxy traffic to that URL
  • Works with plain HTTP endpoints on the host (Ollama, vLLM, llama.cpp server)
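A possible shape for such a provider, purely illustrative — the `openai-compatible` type name and the `allowPlainHttp` config key are invented for this sketch and are not existing OpenShell options:

```shell
openshell provider create --name ollama-local --type openai-compatible \
  --config baseUrl=http://172.17.0.1:11434/v1 \
  --config allowPlainHttp=true \
  --credential API_KEY=ollama
```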

This is the primary use case for DGX Spark users running NemoClaw with local models.

Workaround

Use nvidia-nim provider with NVIDIA cloud inference. Works but burns finite API credits and adds 200-500ms latency vs local.
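The latency delta can be quantified with curl's write-out timing variables. The sketch below measures against a local python stand-in so it is self-contained; in practice, point the URL at https://inference.local (cloud route) and http://localhost:11434 (local route) and compare:

```shell
# Measure per-request wall time with curl's %{time_total} variable.
# Port 11437 is an arbitrary free port for the stand-in server.
python3 -m http.server 11437 --bind 127.0.0.1 >/dev/null 2>&1 &
SRV=$!
sleep 1

T_TOTAL=$(curl -s -o /dev/null -w '%{time_total}' --noproxy '*' http://127.0.0.1:11437/)
echo "time_total=${T_TOTAL}s"
kill "$SRV"
```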

Metadata

Labels

Local Models (Running NemoClaw with local models) · Platform: DGX Spark (Support for DGX Spark) · bug (Something isn't working)
