fix(cli): clear sandbox registry when gateway is destroyed during onboard#1245
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR fixes stale local NemoClaw sandbox registry entries by clearing ~/.nemoclaw/sandboxes.json whenever onboarding destroys a stale/previous OpenShell gateway, keeping nemoclaw list and nemoclaw <name> connect consistent with OpenShell state (refs #634 / #532).
Changes:
- Added
registry.clearAll()to reset the sandbox registry to an empty state. - Invoked
registry.clearAll()in onboarding flows that destroy a stale/previous gateway. - Added unit tests covering
clearAll()behavior, persistence, and idempotency.
Reviewed changes
Copilot reviewed 1 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
bin/lib/registry.js |
Adds clearAll() API to atomically reset registry contents. |
bin/lib/onboard.js |
Clears local registry after gateway destroy during preflight and stale-gateway handling. |
test/registry.test.js |
Adds tests validating clearAll() semantics and persistence. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 2006-2009: The registry.clearAll() call is unconditional after
attempting to destroy the gateway; change the code to only clear the registry
when the destroy actually succeeded by checking the result of
runOpenshell("gateway","destroy",...) or by handling its thrown error rather
than unconditionally passing ignoreError: true. Concretely, capture the return
value or await runOpenshell(...) into a variable (call it e.g. destroyResult)
when invoking runOpenshell with GATEWAY_NAME, verify success (or absence of
error) and only call registry.clearAll() when that check passes; apply the same
guarded logic to the second occurrence (the block around the second runOpenshell
call at the later location).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f90f8f8f-2b00-4441-9bcb-5fcc48caaefa
📒 Files selected for processing (3)
bin/lib/onboard.jsbin/lib/registry.jstest/registry.test.js
There was a problem hiding this comment.
🧹 Nitpick comments (1)
bin/lib/onboard.js (1)
2013-2015: Addregistry.clearAll()tocleanupGatewayAfterLastSandbox()for consistency with issue#532.The function
cleanupGatewayAfterLastSandbox()inbin/nemoclaw.js:116destroys the gateway but does not callregistry.clearAll(), leaving the registry stale—the exact problem this PR addresses inonboard.js:2012-2015. Apply the same pattern there for consistency:runOpenshell(["gateway", "destroy", "-g", NEMOCLAW_GATEWAY_NAME], { ignoreError: true }); registry.clearAll();This ensures the local registry stays consistent with the destroyed gateway state across the codebase.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 2013 - 2015, The function cleanupGatewayAfterLastSandbox() currently destroys the gateway but doesn't clear the local registry; update it to call registry.clearAll() immediately after the gateway destroy call (i.e., after runOpenshell(["gateway", "destroy", "-g", NEMOCLAW_GATEWAY_NAME], { ignoreError: true })) so the local registry stays consistent with the destroyed gateway state, mirroring the pattern used in onboard.js where registry.clearAll() follows the gateway destroy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 2013-2015: The function cleanupGatewayAfterLastSandbox() currently
destroys the gateway but doesn't clear the local registry; update it to call
registry.clearAll() immediately after the gateway destroy call (i.e., after
runOpenshell(["gateway", "destroy", "-g", NEMOCLAW_GATEWAY_NAME], { ignoreError:
true })) so the local registry stays consistent with the destroyed gateway
state, mirroring the pattern used in onboard.js where registry.clearAll()
follows the gateway destroy.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8f418602-cd14-49b9-9c73-291556472077
📒 Files selected for processing (1)
bin/lib/onboard.js
|
Good catch — guarded both |
…oard Add registry.clearAll() to reset sandbox state after gateway destruction, keeping `nemoclaw list` consistent with the actual OpenShell state. - Add clearAll() function to registry module (guarded on successful destroy) - Call clearAll() in destroyGateway() and preflight cleanup - Add JSDoc comments to all registry functions - Add 3 test cases for clearAll behavior Fixes NVIDIA#532 Signed-off-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com>
Per CodeRabbit: clearAll() was called unconditionally even when gateway destroy failed. Now captures the result and only clears when status === 0. Applied via destroyGateway() helper. Signed-off-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com>
fa9df10 to
6196e50
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bin/lib/registry.js (1)
124-133:⚠️ Potential issue | 🟠 MajorNormalize parsed registry shape in
load()before returning.
load()currently returns any parseable JSON structure, which can crash downstream accessors (getSandbox,listSandboxes) whensandboxesis missing/non-object. Please coerce to a safe schema at the load boundary.Suggested fix
function load() { try { if (fs.existsSync(REGISTRY_FILE)) { - return JSON.parse(fs.readFileSync(REGISTRY_FILE, "utf-8")); + const parsed = JSON.parse(fs.readFileSync(REGISTRY_FILE, "utf-8")); + if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) { + const sandboxes = + parsed.sandboxes && typeof parsed.sandboxes === "object" && !Array.isArray(parsed.sandboxes) + ? parsed.sandboxes + : {}; + const defaultSandbox = + typeof parsed.defaultSandbox === "string" || parsed.defaultSandbox === null + ? parsed.defaultSandbox + : null; + return { sandboxes, defaultSandbox }; + } } } catch { /* ignored */ } return { sandboxes: {}, defaultSandbox: null }; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/registry.js` around lines 124 - 133, The load() function currently returns whatever JSON is parsed, which can break consumers like getSandbox and listSandboxes; update load() so after parsing the file it normalizes the shape: ensure the returned value is always an object with a sandboxes property that is an object (fallback to {}) and a defaultSandbox property (fallback to null), e.g. parse the JSON into a temp variable, coerce temp.sandboxes = typeof temp.sandboxes === 'object' && temp.sandboxes ? temp.sandboxes : {} and temp.defaultSandbox = temp.hasOwnProperty('defaultSandbox') ? temp.defaultSandbox : null, then return the normalized object from load() so downstream functions (getSandbox, listSandboxes) can safely assume the schema.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@bin/lib/registry.js`:
- Around line 124-133: The load() function currently returns whatever JSON is
parsed, which can break consumers like getSandbox and listSandboxes; update
load() so after parsing the file it normalizes the shape: ensure the returned
value is always an object with a sandboxes property that is an object (fallback
to {}) and a defaultSandbox property (fallback to null), e.g. parse the JSON
into a temp variable, coerce temp.sandboxes = typeof temp.sandboxes === 'object'
&& temp.sandboxes ? temp.sandboxes : {} and temp.defaultSandbox =
temp.hasOwnProperty('defaultSandbox') ? temp.defaultSandbox : null, then return
the normalized object from load() so downstream functions (getSandbox,
listSandboxes) can safely assume the schema.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e0986552-a270-4d04-aff8-bd5c1673f4ea
📒 Files selected for processing (3)
bin/lib/onboard.jsbin/lib/registry.jstest/registry.test.js
✅ Files skipped from review due to trivial changes (1)
- bin/lib/onboard.js
🚧 Files skipped from review as they are similar to previous changes (1)
- test/registry.test.js
…oard (NVIDIA#1245) ## Problem When `gateway destroy` is called during the onboard flow, the sandbox registry was not being cleaned up. This left stale entries in memory that could cause unexpected behavior on subsequent onboard runs. ## Solution - Added `registry.clearAll()` function that removes all registered sandbox entries - Called `registry.clearAll()` after `gateway destroy` in the onboard flow - Added JSDoc comments to existing registry functions for better discoverability - Added 3 tests covering `clearAll()` behavior ## Related Refs NVIDIA#634 — picks up the work from the stalled PR by @craigamcw <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved local state cleanup during gateway teardown so stale sandbox entries are reliably cleared when teardown succeeds. * **New Features** * Added the ability to fully reset the sandbox registry to an empty, consistent state. * **Tests** * Added tests verifying registry reset behavior, disk persistence, and idempotent/safe operation. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Problem
When
gateway destroyis called during the onboard flow, the sandbox registry was not being cleaned up. This left stale entries in memory that could cause unexpected behavior on subsequent onboard runs.Solution
registry.clearAll()function that removes all registered sandbox entriesregistry.clearAll()aftergateway destroyin the onboard flowclearAll()behaviorRelated
Refs #634 — picks up the work from the stalled PR by @craigamcw
Summary by CodeRabbit
Bug Fixes
New Features
Tests