Skip to content

fix: detect Docker Desktop socket on macOS#224

Closed
brianwtaylor wants to merge 4 commits intoNVIDIA:mainfrom
brianwtaylor:fix/docker-desktop-socket
Closed

fix: detect Docker Desktop socket on macOS#224
brianwtaylor wants to merge 4 commits intoNVIDIA:mainfrom
brianwtaylor:fix/docker-desktop-socket

Conversation

@brianwtaylor
Copy link
Copy Markdown
Contributor

@brianwtaylor brianwtaylor commented Mar 17, 2026

Summary

  • Socket detection only checks Colima and Podman paths — Docker Desktop on macOS uses ~/.docker/run/docker.sock (DOCKER_HOST not set Error while installation. #137)
  • Add Docker Desktop socket to candidates array, ordered after Colima (preferred) but before Podman
  • Add test cases verifying detection and priority order

Depends on #144

Fixes #137

Test plan

Automated Tests

npm test

New detectContainerSocket tests (3 added, 9 total with #144):

  1. Detects Docker Desktop socket when present
  2. Prefers Colima over Docker Desktop when both exist
  3. Prefers Docker Desktop over Podman when both exist

Hardware Validation

Validated against real macOS socket layout:

Machine OS Docker Desktop Colima Detection result
kobe (Mac mini M2) macOS 26.2 Not installed Running (Docker 29.2) Colima socket detected at ~/.colima/default/docker.sock — correct priority

Socket connectivity verified: DOCKER_HOST=unix:///Users/kobe/.colima/default/docker.sock docker ps successfully listed 4 running containers.

Priority order confirmed on real hardware:

  • Colima legacy (~/.colima/default/docker.sock) — present, selected
  • Docker Desktop (~/.docker/run/docker.sock) — not present, would be checked next
  • Podman — not present, lowest priority

Docker Desktop socket path (~/.docker/run/docker.sock) matches the official Docker Desktop documentation for macOS. The priority order ensures existing Colima users are unaffected by this change.

Summary by CodeRabbit

  • New Features

    • Interactive onboarding and CLI commands for sandbox lifecycle, deployment, status, logs, and policy management.
    • Automatic container runtime detection with Podman fallback and expanded socket detection across Colima, Docker Desktop, and rootless setups.
    • Automated setup flows: openshell installation, gateway/sandbox management, inference backend selection, and model/registry handling.
    • Telegram bridge to run agents in sandboxes and relay messages.
  • Tests

    • Added comprehensive tests for container socket detection.

Extract detectContainerSocket() with dependency injection and add
Podman socket candidates (machine, rootless, QEMU). Colima sockets
are checked first to preserve existing behavior. Add Podman fallback
to isDockerRunning() with host-gateway caveat warning.

Closes #116

Signed-off-by: Brian Taylor <brian@briantaylor.xyz>
Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
@brianwtaylor brianwtaylor force-pushed the fix/docker-desktop-socket branch from 702d05c to 3441151 Compare March 18, 2026 00:07
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

Adds a full interactive onboarding flow, broadens container socket detection (Docker, Colima, Podman) with tests, introduces gateway/sandbox/NIM/OpenClaw setup logic, extends CLI commands, and adds a Telegram sandbox bridge and related helpers.

Changes

Cohort / File(s) Summary
Runtime detection & tests
bin/lib/runner.js, test/runner.test.js
New detectContainerSocket(opts) exported; centralized detection across Colima, Docker Desktop, Podman (including rootless/XDG paths) with DI hooks; comprehensive tests covering detection precedence and edge paths.
Onboarding & sandbox orchestration
bin/lib/onboard.js
Large new onboarding module: container runtime detection integration, openshell detection/install, startGateway, createSandbox, setupNim, setupInference, setupOpenclaw, setupPolicies, printDashboard, and exported onboard() to run the 7-step interactive workflow.
CLI surface
bin/nemoclaw.js
Adds many new CLI commands and handlers (onboard, setup, deploy, start/stop/status, sandbox management, policy management, logs, help) and richer sandbox/status output and flows.
Telegram bridge
scripts/telegram-bridge.js
New Telegram helper: API wrapper, chunked sendMessage with retry, sendTyping, long-poll loop, runAgentInSandbox to SSH and execute OpenClaw agent, and main poll startup.
Minor tests/docs
test/preflight.test.js
Small JSDoc comment addition for writeTempDaemon; no runtime or API changes.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CLI as "CLI (nemoclaw)"
  participant Onb as "onboard.js"
  participant Runner as "runner.js (detectContainerSocket)"
  participant Gateway as "Gateway"
  participant Sandbox

  User->>CLI: run `onboard`
  CLI->>Onb: invoke onboard()
  Onb->>Runner: detectContainerSocket(opts)
  Runner-->>Onb: socket path / runtime or null
  Onb->>Gateway: startGateway(runtime)
  Gateway-->>Onb: health OK
  Onb->>Sandbox: createSandbox(name, buildCtx)
  Sandbox-->>Onb: sandbox ready
  Onb->>Sandbox: setupNim()/setupInference()/setupOpenclaw()
  Sandbox-->>Onb: services registered (openshell, NIM)
  Onb->>CLI: printDashboard(status)
  CLI-->>User: display onboarding summary
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through sockets, ports, and sand,
Launched gateways with a careful hand,
I nudged the NIM and waved hello,
OpenClaw danced and logs did flow,
A rabbit cheers — your clusters grow! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR includes substantial changes beyond Docker Desktop socket detection: new onboarding workflow, Podman support, inference backend setup, gateway management, and a Telegram bridge, which appear unrelated to issue #137. Remove out-of-scope changes (onboard.js, nemoclaw.js additions, telegram-bridge.js) and focus this PR solely on Docker Desktop socket detection fixes in runner.js and related tests.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main focus: adding Docker Desktop socket detection on macOS, which directly addresses the primary objective of issue #137.
Linked Issues check ✅ Passed The PR implements the core requirement from #137: detecting Docker Desktop's macOS socket location (~/.docker/run/docker.sock) so tooling finds Docker when running with DOCKER_HOST unset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Important

Merge conflicts detected (Beta)

  • Resolve merge conflict in branch fix/docker-desktop-socket
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
bin/lib/runner.js (1)

12-13: Update the function docstring to include Docker Desktop in the priority order.

The implementation now checks Colima, then Docker Desktop, then Podman; the comment should match that.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/runner.js` around lines 12 - 13, Update the function docstring that
currently states the priority as "Colima first, then Podman" to reflect the
actual implementation order (Colima, then Docker Desktop, then Podman); locate
the comment above the detectContainerRuntimeSocket function (or the block
starting "Detect a container runtime socket") and edit the description to list
Colima, Docker Desktop, then Podman, and mention it returns the socket path or
null.
bin/lib/onboard.js (1)

30-34: Differentiate runtime type instead of returning a plain boolean.

This path returns true for Podman, but downstream messaging still states Docker is running. Consider returning "docker" | "podman" | null so preflight output and follow-up guidance stay accurate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 30 - 34, The Podman branch in
bin/lib/onboard.js currently returns a plain boolean, causing downstream
messaging to incorrectly report Docker; change the function’s return contract to
return "docker" | "podman" | null instead of true/false: when runCapture("podman
info", ...) succeeds return "podman", ensure the Docker detection branch returns
"docker", and return null when neither runtime is found; update any callers that
use this function (and the preflight/messaging logic) to branch on the returned
string so output and guidance correctly reference the detected runtime.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 30-34: The Podman branch in bin/lib/onboard.js currently returns a
plain boolean, causing downstream messaging to incorrectly report Docker; change
the function’s return contract to return "docker" | "podman" | null instead of
true/false: when runCapture("podman info", ...) succeeds return "podman", ensure
the Docker detection branch returns "docker", and return null when neither
runtime is found; update any callers that use this function (and the
preflight/messaging logic) to branch on the returned string so output and
guidance correctly reference the detected runtime.

In `@bin/lib/runner.js`:
- Around line 12-13: Update the function docstring that currently states the
priority as "Colima first, then Podman" to reflect the actual implementation
order (Colima, then Docker Desktop, then Podman); locate the comment above the
detectContainerRuntimeSocket function (or the block starting "Detect a container
runtime socket") and edit the description to list Colima, Docker Desktop, then
Podman, and mention it returns the socket path or null.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 061ade7a-a5ff-4871-881e-c7ff5bf96fe2

📥 Commits

Reviewing files that changed from the base of the PR and between 2a9afbc and 3441151.

📒 Files selected for processing (3)
  • bin/lib/onboard.js
  • bin/lib/runner.js
  • test/runner.test.js

…iority docs

- Return "docker" | "podman" | null from detectContainerRuntime() so
  preflight messaging accurately reflects which runtime was found
- Fix detectContainerSocket docstring to list Docker Desktop in the
  priority order (Colima → Docker Desktop → Podman)
- Add JSDoc to run() and runCapture() for docstring coverage
Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
bin/lib/onboard.js (1)

478-493: ⚠️ Potential issue | 🟡 Minor

Step 6 is a no-op placeholder.

setupOpenclaw prints a success message but performs no actual setup. If OpenClaw gateway launch is deferred, consider documenting when it actually happens.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 478 - 493, The setupOpenclaw function is a
no-op that prints a misleading success message; update it to either perform the
actual OpenClaw gateway launch or clearly mark the step as deferred. Locate
setupOpenclaw and either implement the real launch flow (invoke the sandbox
connect/create commands or call the helper that runs the gateway, handle errors
and await completion, and keep the existing step(6, 7, ...) usage) or change the
console message to indicate the launch is deferred and add a TODO comment
documenting where/when the gateway is actually started (and avoid printing a
false success).
scripts/telegram-bridge.js (2)

33-33: ⚠️ Potential issue | 🟡 Minor

activeSessions map is declared but never populated.

The activeSessions Map is initialized and /reset deletes from it, but no code ever adds entries. This appears to be dead code or incomplete session tracking.

Consider either implementing session history tracking or removing this unused declaration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/telegram-bridge.js` at line 33, The Map activeSessions is declared
but never populated, so either implement session tracking where messages are
processed or remove the unused Map and its reset logic: if you want session
history, add entries to activeSessions keyed by chatId inside the main message
handler (e.g., where messages are received/handled — look for the
bot.on('message') or onText handler) and update/clear history in the '/reset'
command handler; otherwise remove the activeSessions declaration and any code
that deletes from it (the '/reset' branch) to eliminate dead code.

112-127: ⚠️ Potential issue | 🟠 Major

API key visible in process list; temp file not cleaned on spawn error.

  1. The NVIDIA_API_KEY is interpolated into the SSH command string, making it visible in ps output on both local and remote hosts.
  2. If spawn emits an error event (e.g., SSH binary not found), the temp config file at confPath is never deleted.
🔒 Suggested improvements

Pass the API key via SSH's SendEnv or write it to a temp file that's sourced remotely. For the cleanup issue:

     proc.on("error", (err) => {
+      try { require("fs").unlinkSync(confPath); } catch {}
       resolve(`Error: ${err.message}`);
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/telegram-bridge.js` around lines 112 - 127, In runAgentInSandbox
avoid embedding API_KEY in the remote command (so it doesn't appear in ps) and
ensure confPath is always removed on spawn errors: instead of interpolating
NVIDIA_API_KEY into cmd, pass the secret via a secure channel (e.g., configure
SSH to SendEnv and pass API_KEY in the child process env, or write the key to a
temporary file keyPath, securely scp it to the remote side and source it there)
and remove direct "'${API_KEY}'" interpolation in cmd; also attach handlers to
proc ('error', 'exit', 'close') and wrap spawn in try/catch/finally to always
unlink confPath (and keyPath if created) on error or completion so temp files
are cleaned even when ssh binary is missing.
🧹 Nitpick comments (1)
bin/lib/onboard.js (1)

179-187: Colima socket detection duplicates logic from runner.js.

This inline socket detection mirrors the Colima paths in detectContainerSocket. Consider reusing the centralized function to avoid divergence if socket paths change.

♻️ Optional: Reuse detectContainerSocket
+const { detectContainerSocket } = require("./runner");
+
 // CoreDNS fix — always run. k3s-inside-Docker has broken DNS on all platforms.
-const home = process.env.HOME || "/tmp";
-const colimaSocket = [
-  path.join(home, ".colima/default/docker.sock"),
-  path.join(home, ".config/colima/default/docker.sock"),
-].find((s) => fs.existsSync(s));
-if (colimaSocket) {
+const detectedSocket = detectContainerSocket();
+const isColima = detectedSocket && detectedSocket.includes(".colima");
+if (isColima) {
   console.log("  Patching CoreDNS for Colima...");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 179 - 187, This duplicates the Colima socket
path detection; instead call the centralized detectContainerSocket function
(import/require detectContainerSocket from the module that defines it) and use
its result to decide whether to run the CoreDNS patch. Replace the inline path
array and fs.existsSync logic with a call to detectContainerSocket(), then if
the returned socket path is truthy and indicates Colima (e.g., contains "colima"
or matches the Colima socket patterns), run the existing run(...) invocation
unchanged; keep the run call and the ignoreError option as-is so behavior
remains identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@bin/lib/onboard.js`:
- Around line 478-493: The setupOpenclaw function is a no-op that prints a
misleading success message; update it to either perform the actual OpenClaw
gateway launch or clearly mark the step as deferred. Locate setupOpenclaw and
either implement the real launch flow (invoke the sandbox connect/create
commands or call the helper that runs the gateway, handle errors and await
completion, and keep the existing step(6, 7, ...) usage) or change the console
message to indicate the launch is deferred and add a TODO comment documenting
where/when the gateway is actually started (and avoid printing a false success).

In `@scripts/telegram-bridge.js`:
- Line 33: The Map activeSessions is declared but never populated, so either
implement session tracking where messages are processed or remove the unused Map
and its reset logic: if you want session history, add entries to activeSessions
keyed by chatId inside the main message handler (e.g., where messages are
received/handled — look for the bot.on('message') or onText handler) and
update/clear history in the '/reset' command handler; otherwise remove the
activeSessions declaration and any code that deletes from it (the '/reset'
branch) to eliminate dead code.
- Around line 112-127: In runAgentInSandbox avoid embedding API_KEY in the
remote command (so it doesn't appear in ps) and ensure confPath is always
removed on spawn errors: instead of interpolating NVIDIA_API_KEY into cmd, pass
the secret via a secure channel (e.g., configure SSH to SendEnv and pass API_KEY
in the child process env, or write the key to a temporary file keyPath, securely
scp it to the remote side and source it there) and remove direct "'${API_KEY}'"
interpolation in cmd; also attach handlers to proc ('error', 'exit', 'close')
and wrap spawn in try/catch/finally to always unlink confPath (and keyPath if
created) on error or completion so temp files are cleaned even when ssh binary
is missing.

---

Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 179-187: This duplicates the Colima socket path detection; instead
call the centralized detectContainerSocket function (import/require
detectContainerSocket from the module that defines it) and use its result to
decide whether to run the CoreDNS patch. Replace the inline path array and
fs.existsSync logic with a call to detectContainerSocket(), then if the returned
socket path is truthy and indicates Colima (e.g., contains "colima" or matches
the Colima socket patterns), run the existing run(...) invocation unchanged;
keep the run call and the ignoreError option as-is so behavior remains
identical.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f89ab487-f0dc-4b37-b757-a72529eeac72

📥 Commits

Reviewing files that changed from the base of the PR and between 3441151 and 9ca7413.

📒 Files selected for processing (5)
  • bin/lib/onboard.js
  • bin/lib/runner.js
  • bin/nemoclaw.js
  • scripts/telegram-bridge.js
  • test/preflight.test.js
✅ Files skipped from review due to trivial changes (1)
  • test/preflight.test.js

@brianwtaylor
Copy link
Copy Markdown
Contributor Author

Closing — #286 supersedes this with centralized macOS Docker socket detection via platform.js.

mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DOCKER_HOST not set Error while installation.

2 participants