Agent Diagnostic
empty
Description
What happened? What did you expect to happen?
- On an Apple Silicon Mac, create an Ubuntu 22.04.5 ARM64 VM in UTM/QEMU.
- Install Docker in the VM and verify Docker is running.
- Install OpenShell via the NemoClaw onboarding flow, or run `openshell gateway start` directly.
- Wait for gateway initialization to begin.
- Observe that gateway startup fails with:
K8s namespace not ready
timed out waiting for namespace 'openshell' to exist
- While `openshell gateway start` is still running, inspect the embedded k3s cluster inside the gateway container:
  - `kubectl get ns` shows only the default namespaces and `agent-sandbox-system`, but not `openshell`
  - `kubectl get pods -A` shows core pods stuck in `ContainerCreating`
- Describe the stuck pods and observe repeated sandbox/network errors such as:
FailedCreatePodSandBox
plugin type="flannel" failed (add)
failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory
- After the timeout, the gateway is torn down and `openshell status` reports:
Status: No gateway configured.
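Because the container only lives until the timeout, the inspection above has to be done quickly. A minimal sketch of helpers that could capture the same state in one go is shown below. It assumes `kubectl` is on the PATH inside the gateway container and that `ls` is available there; neither is confirmed by this report. The function names are hypothetical and only used for illustration.

```shell
#!/bin/sh
# Sketch only: snapshot cluster state while the gateway container still exists.
# Assumption: kubectl and ls are available inside the container.
CONTAINER="${CONTAINER:-openshell-cluster-openshell}"

# Run kubectl inside the gateway container.
kube() {
    docker exec "$CONTAINER" kubectl "$@"
}

# Succeeds if the named namespace appears in `kubectl get ns -o name`.
namespace_exists() {
    kube get ns -o name | grep -qx "namespace/$1"
}

# Print namespaces, pods, and whether flannel has written its subnet file.
snapshot() {
    kube get ns
    kube get pods -A
    if docker exec "$CONTAINER" ls /run/flannel/subnet.env >/dev/null 2>&1; then
        echo "subnet.env: present"
    else
        echo "subnet.env: missing"
    fi
}
```

After sourcing the file in a second terminal while `openshell gateway start` is running, `snapshot` prints the state and `namespace_exists openshell` reports through its exit status whether the namespace has been created yet.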
Reproduction Steps
I expected `openshell gateway start` to initialize the gateway successfully and create a working OpenShell environment, including the `openshell` Kubernetes namespace.
Instead, gateway startup consistently fails during initialization with:
K8s namespace not ready
timed out waiting for namespace 'openshell' to exist
While the gateway container is briefly running, the embedded k3s cluster only creates the default namespaces plus `agent-sandbox-system`, but never creates the `openshell` namespace. Core pods remain stuck in `ContainerCreating`, and pod inspection shows repeated sandbox/network failures caused by flannel not being able to load `/run/flannel/subnet.env`.
After the timeout, the gateway container is torn down and `openshell status` reports `No gateway configured.`
In short: I expected a successful gateway bootstrap, but instead the embedded k3s/flannel networking appears to fail during startup, which prevents the openshell namespace and related services from ever becoming ready.
Environment
Host:
- Apple Silicon Mac
- UTM QEMU VM
Guest OS:
- Ubuntu 22.04.5 LTS
- Kernel: 5.15.0-173-generic
- Architecture: aarch64 / ARM64
Docker:
- Docker Engine / Server Version: 28.2.2
OpenShell:
- openshell CLI: 0.0.19
- Gateway image: ghcr.io/nvidia/openshell/cluster:0.0.19
VM resources:
- 8 CPUs
- 15.59 GiB RAM
- 4 GiB swap
- ~84 GiB free disk during testing
Networking / runtime notes:
- Docker runtime was working and accessible to the non-root user after adding the user to the docker group
- The OpenShell gateway container briefly appeared as `openshell-cluster-openshell` during startup, then disappeared after the timeout
Additional context:
- I also previously tried the macOS host path with Docker Desktop and Colima
Logs
Gateway startup repeatedly fails with:
openshell gateway start
✓ Checking Docker
✓ Downloading gateway
x Initializing environment x Gateway failed: openshell
Gateway failed to start
Error: × K8s namespace not ready
╰─▶ timed out waiting for namespace 'openshell' to exist: Error from server
(NotFound): namespaces "openshell" not found
Representative container log lines:
time="2026-03-31T20:39:35Z" level=info msg="Connecting to proxy" url="wss://172.18.0.2:6443/v1-k3s/connect"
time="2026-03-31T20:39:35Z" level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
E0331 20:40:15.722444 117 handler_proxy.go:143] error resolving kube-system/metrics-server: no endpoints available for service "metrics-server"
E0331 20:40:30.975563 117 controller.go:102] "Unhandled Error" err=<loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable>
Live pod inspection while the gateway container was still running showed pods stuck in ContainerCreating. The most useful pod-level errors were:
coredns:
Warning FailedCreatePodSandBox kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "...": plugin type="flannel" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node.
helm-install-openshell:
Warning FailedCreatePodSandBox kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "...": plugin type="flannel" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node.
local-path-provisioner:
Warning FailedMount kubelet MountVolume.SetUp failed for volume "config-volume" : configmap "local-path-config" not found
Warning FailedCreatePodSandBox kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "...": plugin type="flannel" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node.
metrics-server:
Warning FailedCreatePodSandBox kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "...": plugin type="flannel" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node.
agent-sandbox-controller:
Warning FailedCreatePodSandBox kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "...": plugin type="flannel" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node.
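Since the same flannel error repeats across pods, a small filter can confirm how many sandbox failures share that cause. The sketch below reads event text on stdin (for example the output of `kubectl describe pods -A`) and counts `FailedCreatePodSandBox` lines that cite the missing subnet file. The function name is hypothetical, and the match strings are copied from the events above.

```shell
#!/bin/sh
# Sketch: count FailedCreatePodSandBox events that cite the missing flannel
# subnet file. Reads `kubectl describe` or event text on stdin and prints
# the number of matching lines.
count_flannel_failures() {
    grep 'FailedCreatePodSandBox' \
        | grep -c '/run/flannel/subnet.env: no such file or directory'
}
```

A possible invocation, assuming `kubectl` is available inside the gateway container, would be `docker exec openshell-cluster-openshell kubectl describe pods -A | count_flannel_failures`.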
Additional notes:
- The gateway container briefly exists, then disappears after the timeout.
- After the timeout, `openshell status` reports: `Status: No gateway configured.`
- When trying to inspect events after teardown, Docker returns:
`Error response from daemon: No such container: openshell-cluster-openshell`
Agent-First Checklist
debug-openshell-cluster,debug-inference,openshell-cli)