Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds gateway health-check and reuse logic to onboarding, implements sandbox↔gateway reconciliation using OpenShell (capture, recover, select/start), updates CLI sandbox commands to be gateway-aware, and expands unit, integration, and e2e tests to verify reuse, recovery, and stale-registry handling. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant Onboard
participant OpenShell
participant Gateway
User->>Onboard: run `nemoclaw onboard`
Onboard->>OpenShell: capture `openshell status`
OpenShell-->>Onboard: status output
Onboard->>OpenShell: capture gateway info
OpenShell-->>Onboard: gateway metadata
alt gateway healthy & connected
Onboard->>Gateway: select existing gateway (set OPENSHELL_GATEWAY)
Gateway-->>Onboard: selected
Onboard-->>User: reuse gateway, skip destroy/start
else gateway stale or unhealthy
Onboard->>OpenShell: destroy stale gateway (if stale)
OpenShell-->>Onboard: destroyed
Onboard->>OpenShell: start gateway
OpenShell-->>Onboard: started
Onboard->>OpenShell: verify gateway health
OpenShell-->>Onboard: health result
Onboard-->>User: complete (or fail)
end
sequenceDiagram
actor User
participant CLI as NemoClaw CLI
participant Reconciler
participant OpenShell
participant Registry
User->>CLI: `sandbox status` / `sandbox connect`
CLI->>Reconciler: ensureLiveSandboxOrExit(sandbox)
Reconciler->>OpenShell: `openshell sandbox get` / `gateway` queries
alt sandbox present & gateway healthy
OpenShell-->>Reconciler: sandbox + gateway ok
Reconciler-->>CLI: healthy state (may select gateway)
else gateway transport error or identity drift
OpenShell-->>Reconciler: transport/identity error
Reconciler->>OpenShell: attempt select/start recovery (startGatewayForRecovery)
alt recovery succeeds
OpenShell-->>Reconciler: sandbox available after recovery
Reconciler-->>CLI: recovered state (message)
else recovery fails
Reconciler-->>CLI: failure with guidance
end
else sandbox missing (NotFound)
OpenShell-->>Reconciler: NotFound
Reconciler->>Registry: remove stale `sandboxes.json` entry
Registry-->>Reconciler: removed
Reconciler-->>CLI: missing state (message)
end
CLI-->>User: status/connect result and messaging
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
bin/nemoclaw.js (1)
115-137: Consider delegating to onboard'sstartGatewayto avoid divergence.
recoverNamedGatewayRuntimere-implements gateway startup logic (gateway start --name nemoclaw) without the health-check retry loop, version pinning, or CoreDNS patching present inbin/lib/onboard.js:startGateway. IfstartGatewayis updated (e.g., new startup flags, different health verification), this function won't inherit those changes.Since
startGatewayis now exported, consider importing and delegating to it for the actual gateway start operation, or at minimum extract the shared gateway-start logic into a reusable helper.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/nemoclaw.js` around lines 115 - 137, The recoverNamedGatewayRuntime function duplicates gateway startup logic; import and call the exported startGateway from bin/lib/onboard.js (or extract common logic into a shared helper) instead of directly running runOpenshell(["gateway","start",...]) so you inherit health-check retries, version pinning and CoreDNS patches; update recoverNamedGatewayRuntime to call startGateway when needed, then re-check getNamedGatewayLifecycleState() and set process.env.OPENSHELL_GATEWAY = "nemoclaw" on success, preserving the existing return shape (recovered/before/after/attempted/via).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@bin/nemoclaw.js`:
- Around line 115-137: The recoverNamedGatewayRuntime function duplicates
gateway startup logic; import and call the exported startGateway from
bin/lib/onboard.js (or extract common logic into a shared helper) instead of
directly running runOpenshell(["gateway","start",...]) so you inherit
health-check retries, version pinning and CoreDNS patches; update
recoverNamedGatewayRuntime to call startGateway when needed, then re-check
getNamedGatewayLifecycleState() and set process.env.OPENSHELL_GATEWAY =
"nemoclaw" on success, preserving the existing return shape
(recovered/before/after/attempted/via).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d1106af7-673d-44bd-9650-57b992f3829d
📒 Files selected for processing (5)
bin/lib/onboard.jsbin/nemoclaw.jstest/cli.test.jstest/e2e/test-double-onboard.shtest/onboard.test.js
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
bin/lib/onboard.js (1)
188-191: Consider clarifying the semantic reuse ofhasStaleGateway.The
isGatewayHealthyfunction correctly checks that the gateway is both connected and NemoClaw-owned. However, reusinghasStaleGatewayhere is semantically confusing—the name implies checking for staleness, but in this context it verifies ownership.Consider extracting a helper with clearer naming:
♻️ Optional refactor for clarity
+function isNemoClawGateway(gwInfoOutput) { + return typeof gwInfoOutput === "string" && gwInfoOutput.length > 0 && gwInfoOutput.includes(GATEWAY_NAME); +} + function hasStaleGateway(gwInfoOutput) { - return typeof gwInfoOutput === "string" && gwInfoOutput.length > 0 && gwInfoOutput.includes(GATEWAY_NAME); + return isNemoClawGateway(gwInfoOutput); } function isGatewayHealthy(statusOutput = "", gwInfoOutput = "") { const connected = typeof statusOutput === "string" && statusOutput.includes("Connected"); - return connected && hasStaleGateway(gwInfoOutput); + return connected && isNemoClawGateway(gwInfoOutput); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 188 - 191, isGatewayHealthy currently uses hasStaleGateway which is misleading because that name implies checking staleness rather than ownership; introduce a new helper (e.g., isGatewayOwned or isNemoClawOwned) that inspects gwInfoOutput for NemoClaw/ownership and use that in isGatewayHealthy (connected && isGatewayOwned(gwInfoOutput)); keep hasStaleGateway for actual staleness checks elsewhere so behavior doesn't change, and ensure the new helper accepts the same gwInfoOutput parameter as hasStaleGateway.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/nemoclaw.js`:
- Around line 623-627: The sandboxLogs function currently pushes the wrong flag
for streaming logs: replace the call that pushes "--tail" (inside sandboxLogs,
where args.push("--tail") is used) with "--follow" so the follow parameter
enables streaming; ensure the code uses args.push("--follow") when follow is
truthy so it matches the CLI help and the caller's expected behavior.
---
Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 188-191: isGatewayHealthy currently uses hasStaleGateway which is
misleading because that name implies checking staleness rather than ownership;
introduce a new helper (e.g., isGatewayOwned or isNemoClawOwned) that inspects
gwInfoOutput for NemoClaw/ownership and use that in isGatewayHealthy (connected
&& isGatewayOwned(gwInfoOutput)); keep hasStaleGateway for actual staleness
checks elsewhere so behavior doesn't change, and ensure the new helper accepts
the same gwInfoOutput parameter as hasStaleGateway.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e77da17a-ee2f-424b-98d8-c4a182f079ce
📒 Files selected for processing (2)
bin/lib/onboard.jsbin/nemoclaw.js
There was a problem hiding this comment.
♻️ Duplicate comments (1)
bin/nemoclaw.js (1)
635-639:⚠️ Potential issue | 🟡 MinorUse
--followinstead of--tailto enable log streaming.The function parameter is
followbut the code passes--tail. The--tailflag is typically for showing the last N lines, whereas--followenables streaming. This matches the help text on line 717 which documentslogs [--follow].🐛 Proposed fix
function sandboxLogs(sandboxName, follow) { const args = ["logs", sandboxName]; - if (follow) args.push("--tail"); + if (follow) args.push("--follow"); runOpenshell(args); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/nemoclaw.js` around lines 635 - 639, The sandboxLogs function currently pushes the wrong flag ("--tail") when the parameter is named follow; change the flag to "--follow" so log streaming works as intended: update sandboxLogs (which builds args and calls runOpenshell) to push "--follow" when follow is truthy and remove/replace "--tail".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@bin/nemoclaw.js`:
- Around line 635-639: The sandboxLogs function currently pushes the wrong flag
("--tail") when the parameter is named follow; change the flag to "--follow" so
log streaming works as intended: update sandboxLogs (which builds args and calls
runOpenshell) to push "--follow" when follow is truthy and remove/replace
"--tail".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 54410379-bfaf-4a9c-872c-11c701963a9b
📒 Files selected for processing (3)
bin/lib/onboard.jsbin/nemoclaw.jstest/cli.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- test/cli.test.js
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
bin/nemoclaw.js (2)
97-97: UnusedignoreErroroption incaptureOpenshellcalls.
captureOpenshell()(lines 73-84) doesn't use theignoreErroroption—it always captures output without throwing. The option here is a no-op and may mislead future maintainers into thinking it affects behavior.Consider removing
{ ignoreError: true }fromcaptureOpenshellcalls, or document thatcaptureOpenshellintentionally ignores errors by design.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/nemoclaw.js` at line 97, The call to captureOpenshell in the variable gatewayInfo passes a no-op option { ignoreError: true } which is not consumed by the function; remove the unused option to avoid misleading callers or update captureOpenshell to honor ignoreError. Locate the captureOpenshell references (e.g., the call that assigns gatewayInfo) and either (A) remove the second parameter from that call (and other similar calls) so it becomes captureOpenshell(["gateway", "info", "-g", "nemoclaw"]) or (B) implement handling of an ignoreError flag inside the captureOpenshell function so it conditionally suppresses exceptions—pick one approach and make all calls and the function implementation consistent. Ensure gatewayInfo and other callers compile and behavior remains unchanged after the change.
568-570: Consider usingcaptureOpenshellfor consistency.This is the only place still using the legacy
_runCapturewith a shell command string. Consider refactoring to usecaptureOpenshell(["inference", "get"])for consistency with the new array-based approach and to avoid shell parsing.♻️ Proposed refactor
const live = parseGatewayInference( - _runCapture("openshell inference get 2>/dev/null", { ignoreError: true }) + captureOpenshell(["inference", "get"], { ignoreError: true }).output );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/nemoclaw.js` around lines 568 - 570, Replace the legacy _runCapture call with the new array-based captureOpenshell to avoid shell parsing: change the argument passed to parseGatewayInference from _runCapture("openshell inference get 2>/dev/null", { ignoreError: true }) to captureOpenshell(["inference", "get"], { ignoreError: true }); keep parseGatewayInference(...) unchanged, and ensure captureOpenshell behavior preserves the previous ignoreError semantics (and suppresses or ignores stderr similarly to the removed "2>/dev/null" redirection).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/nemoclaw.js`:
- Line 577: The console.log line that prints policies uses 6 spaces of
indentation and should be aligned with surrounding 4-space indenting; locate the
line containing console.log(` Policies: ${(sb.policies || []).join(", ") ||
"none"}`) and change its indentation to 4 spaces so it matches the surrounding
block (keep the string content unchanged, only adjust leading whitespace).
- Around line 181-186: The conditional that detects the handshake error uses the
raw variable output instead of the ANSI-stripped cleanOutput, causing the regex
to miss matches when ANSI codes are present; update the check to test
cleanOutput (i.e., change the regex condition to use cleanOutput) so it matches
consistently with the other checks (see the writer calls and the surrounding
handshake/error handling logic) and ensure cleanOutput is defined before this
conditional.
---
Nitpick comments:
In `@bin/nemoclaw.js`:
- Line 97: The call to captureOpenshell in the variable gatewayInfo passes a
no-op option { ignoreError: true } which is not consumed by the function; remove
the unused option to avoid misleading callers or update captureOpenshell to
honor ignoreError. Locate the captureOpenshell references (e.g., the call that
assigns gatewayInfo) and either (A) remove the second parameter from that call
(and other similar calls) so it becomes captureOpenshell(["gateway", "info",
"-g", "nemoclaw"]) or (B) implement handling of an ignoreError flag inside the
captureOpenshell function so it conditionally suppresses exceptions—pick one
approach and make all calls and the function implementation consistent. Ensure
gatewayInfo and other callers compile and behavior remains unchanged after the
change.
- Around line 568-570: Replace the legacy _runCapture call with the new
array-based captureOpenshell to avoid shell parsing: change the argument passed
to parseGatewayInference from _runCapture("openshell inference get 2>/dev/null",
{ ignoreError: true }) to captureOpenshell(["inference", "get"], { ignoreError:
true }); keep parseGatewayInference(...) unchanged, and ensure captureOpenshell
behavior preserves the previous ignoreError semantics (and suppresses or ignores
stderr similarly to the removed "2>/dev/null" redirection).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6207b5fb-0687-40fa-8c43-e1bcd7462915
📒 Files selected for processing (2)
bin/nemoclaw.jstest/cli.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- test/cli.test.js
…into fix/gateway-lifecycle-recovery
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/nemoclaw.js`:
- Around line 187-195: Replace uses of the raw output variable with the
ANSI-stripped cleanOutput in the error-matching branches so the regex checks are
robust to ANSI escape codes: change the two conditions testing /Connection
refused|transport error/i.test(output) and /Missing gateway auth token|device
identity required/i.test(output) to use cleanOutput instead, keeping the
existing writer(...) calls and sandboxName reference unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5b98502a-f50f-48d7-9597-52d40f5794a6
📒 Files selected for processing (1)
bin/nemoclaw.js
…into fix/gateway-lifecycle-recovery
| }; | ||
| } | ||
|
|
||
| function stripAnsi(value = "") { |
There was a problem hiding this comment.
why a different stripAnsi than the one in onboard?
# Conflicts: # bin/lib/onboard.js # test/e2e-gateway-isolation.sh
…into fix/gateway-lifecycle-recovery
# Conflicts: # test/e2e-gateway-isolation.sh
|
Closing this in favor of #952. This branch had to be rewritten onto signed commits to satisfy verified-signature requirements, so the signed replacement PR now carries the same work forward. |
Summary
nemoclawgateway across repeat onboardingconnectandstatusinstead of trusting stale local registry entriesIssues
Security
openshellwrappersValidation
Brev CPU Validation
Environment:
brev-cpukj-nemoclaw-cpu-20260325-15544743cf8ebValidated on a real disposable Linux host:
openshell gateway stop+openshell gateway start --name nemoclaw, NemoClaw now surfaces a precise post-restart classification instead of a generic transport failureReadyResidual
Connection refusedstate. That remains a gateway/runtime limitation, not something this PR tries to bypass.Summary by CodeRabbit
New Features
Bug Fixes
Tests