Agent Diagnostic
- Loaded debug-openshell-cluster skill from .agents/skills/
- Ran openshell status → "tls handshake eof" (server not running)
- Ran openshell doctor logs --lines 50 → orphaned cgroup cleanup, no errors
- Ran openshell doctor exec -- kubectl get namespaces → openshell namespace exists
- Ran openshell doctor exec -- kubectl -n openshell get secrets → TLS secrets missing
- Ran openshell doctor exec -- kubectl -n openshell describe pod openshell-0:
  - FailedMount: secret "openshell-server-tls" not found
  - FailedMount: secret "openshell-server-client-ca" not found
- Root cause: CLI timed out at wait_for_namespace() before reaching the reconcile_pki() step
- K3s was still initializing (~2 min on first run with image pulls)
- Workaround: manually generated PKI with openssl, applied secrets, ran openshell gateway add --local
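The secrets check from the diagnostic can be scripted so it is easy to rerun after applying the workaround. The following is a minimal sketch, not part of OpenShell: the helper names missing_secrets and fetch_secret_names are hypothetical, the two secret names are taken from the FailedMount events in this report, and it assumes that openshell doctor exec passes kubectl's stdout through unchanged.

```python
import subprocess

# Secret names taken from the FailedMount events in this report.
REQUIRED_SECRETS = ("openshell-server-tls", "openshell-server-client-ca")


def missing_secrets(kubectl_output: str, required=REQUIRED_SECRETS) -> list[str]:
    """Return the required secrets absent from `kubectl get secrets -o name` output."""
    present = {
        line.strip().removeprefix("secret/")  # `-o name` prints lines like "secret/<name>"
        for line in kubectl_output.splitlines()
        if line.strip()
    }
    return [name for name in required if name not in present]


def fetch_secret_names() -> str:
    """Run the same query as the diagnostic, adding `-o name` for parseable output.

    Assumption: `openshell doctor exec` forwards kubectl's stdout unchanged.
    """
    cmd = [
        "openshell", "doctor", "exec", "--",
        "kubectl", "-n", "openshell", "get", "secrets", "-o", "name",
    ]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

On a cluster in the broken state described here, missing_secrets(fetch_secret_names()) would be expected to list both secret names; an empty list would suggest the PKI secrets are in place.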
Description
On first run, openshell gateway start times out waiting for the openshell namespace
(~120s) while K3s is still initializing. The CLI exits before reaching the PKI
generation step in reconcile_pki().
The container keeps running and K3s eventually creates the namespace, but without
TLS secrets the openshell-0 pod is stuck in ContainerCreating with FailedMount
errors. The gateway never becomes healthy.
Expected: Gateway starts successfully with TLS secrets created.
Actual: Timeout at namespace wait, PKI step skipped, cluster left in broken state.
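The timing problem can be illustrated with a small simulation. This is a sketch only, not the OpenShell implementation: wait_for, FakeClock, and simulate are hypothetical names, and the 150-second readiness time is an illustrative value chosen to exceed the ~120s timeout reported above.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout_s: float, interval_s: float = 2.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns True or `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


class FakeClock:
    """Deterministic clock so the simulation does not actually sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def simulate(timeout_s: float, ready_at_s: float = 150.0) -> bool:
    """Simulate a namespace that appears `ready_at_s` seconds after start (illustrative value)."""
    fake = FakeClock()
    return wait_for(lambda: fake.now >= ready_at_s, timeout_s,
                    interval_s=5.0, clock=fake.time, sleep=fake.sleep)


print(simulate(120.0))  # False: the wait gives up before the namespace exists
print(simulate(300.0))  # True: a longer deadline covers the slow first start
```

When the wait returns False and the caller exits, every later step is skipped, which matches the reported behavior of reconcile_pki() never running. A longer deadline, or running the PKI step once the namespace appears, are two possible directions for a fix.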
Reproduction Steps
- Fresh install: curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh
- Run openshell gateway start
- Observe timeout error: "K8s namespace not ready"
- Container keeps running, but:
  - openshell status shows "tls handshake eof"
  - openshell doctor exec -- kubectl -n openshell get secrets shows no TLS secrets
Environment
- OS: macOS (Apple Silicon, Darwin 25.3.0)
- Docker: Colima + Docker Engine 27.4.0 (4 CPUs, 8GB RAM)
- OpenShell: 0.0.11-dev.2+g1d071b8d9
Logs
Deploying local gateway openshell...
Checking Docker
Downloading gateway
Initializing environment
Error: × K8s namespace not ready
╰─▶ timed out waiting for namespace 'openshell' to exist: Error from server (NotFound): namespaces "openshell" not found
# After timeout, pod status:
$ openshell doctor exec -- kubectl -n openshell describe pod openshell-0
Events:
Warning FailedMount kubelet MountVolume.SetUp failed for volume "tls-client-ca" : secret "openshell-server-client-ca" not found
Warning FailedMount kubelet MountVolume.SetUp failed for volume "tls-cert" : secret "openshell-server-tls" not found
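For triage, the missing secret names can be pulled out of the pod events with a short script. This is a minimal sketch with a hypothetical helper name (secrets_from_events); it assumes the event text keeps the secret "<name>" not found wording shown in the log above.

```python
import re

# Matches the wording used in the FailedMount events above.
_MISSING_SECRET = re.compile(r'secret "([^"]+)" not found')


def secrets_from_events(describe_output: str) -> list[str]:
    """Return the unique secret names reported as not found, in order of first appearance."""
    seen: list[str] = []
    for name in _MISSING_SECRET.findall(describe_output):
        if name not in seen:
            seen.append(name)
    return seen
```

The input would be the text produced by openshell doctor exec -- kubectl -n openshell describe pod openshell-0; repeated events for the same secret are collapsed into one entry.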
Agent-First Checklist
debug-openshell-cluster,debug-inference,openshell-cli)