Skip to content

fix(cli): clear sandbox registry when gateway is destroyed during onboard#1245

Merged
cv merged 3 commits intoNVIDIA:mainfrom
BenediktSchackenberg:fix/clear-registry-on-gateway-destroy
Apr 4, 2026
Merged

fix(cli): clear sandbox registry when gateway is destroyed during onboard#1245
cv merged 3 commits intoNVIDIA:mainfrom
BenediktSchackenberg:fix/clear-registry-on-gateway-destroy

Conversation

@BenediktSchackenberg
Copy link
Copy Markdown
Contributor

@BenediktSchackenberg BenediktSchackenberg commented Apr 1, 2026

Problem

When gateway destroy is called during the onboard flow, the sandbox registry was not being cleaned up. This left stale entries in memory that could cause unexpected behavior on subsequent onboard runs.

Solution

  • Added registry.clearAll() function that removes all registered sandbox entries
  • Called registry.clearAll() after gateway destroy in the onboard flow
  • Added JSDoc comments to existing registry functions for better discoverability
  • Added 3 tests covering clearAll() behavior

Related

Refs #634 — picks up the work from the stalled PR by @craigamcw

Summary by CodeRabbit

  • Bug Fixes

    • Improved local state cleanup during gateway teardown so stale sandbox entries are reliably cleared when teardown succeeds.
  • New Features

    • Added the ability to fully reset the sandbox registry to an empty, consistent state.
  • Tests

    • Added tests verifying registry reset behavior, disk persistence, and idempotent/safe operation.

Copilot AI review requested due to automatic review settings April 1, 2026 10:24
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a42da28f-ee1f-45bd-959c-f8d0e2d820d2

📥 Commits

Reviewing files that changed from the base of the PR and between 6196e50 and 13cafa3.

📒 Files selected for processing (1)
  • bin/lib/onboard.js
✅ Files skipped from review due to trivial changes (1)
  • bin/lib/onboard.js

📝 Walkthrough

Walkthrough

Adds registry.clearAll() to persistently reset sandbox state; bin/lib/onboard.js now captures gateway destroy exit status and calls registry.clearAll() when status === 0; JSDoc added to registry APIs; tests added to verify clearAll() behavior and on-disk persistence.

Changes

Cohort / File(s) Summary
Onboard
bin/lib/onboard.js
Captures gateway destroy result into destroyResult in destroyGateway() and preflight() flows; calls registry.clearAll() when destroyResult.status === 0 to keep local registry consistent.
Registry
bin/lib/registry.js
Adds exported clearAll() that acquires the registry lock and persists an empty registry ({ sandboxes: {}, defaultSandbox: null }); adds JSDoc to public functions and includes clearAll in module.exports.
Tests
test/registry.test.js
Adds three tests validating clearAll() removes sandboxes and default, writes cleared state to disk, and is idempotent when registry already empty.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Onboard as Onboard Script
participant OpenShell as openshell (gateway destroy)
participant Registry as Registry (clearAll)
participant FS as Filesystem
Onboard->>OpenShell: runOpenshell(["gateway","destroy",...], {ignoreError:true})
OpenShell-->>Onboard: destroyResult (status)
alt destroyResult.status == 0
Onboard->>Registry: registry.clearAll()
Registry->>FS: acquire lock & write { sandboxes: {}, defaultSandbox: null }
FS-->>Registry: write ack
Registry-->>Onboard: clearAll result
else destroy failed (non‑zero)
Onboard-->>Onboard: skip clearing registry
end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nudged the gateway, watched it go,

then swept the registry, neat and low.
Empty burrows, records bright,
a tidy hop into the night.
thump — all clean, and out of sight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: clearing the sandbox registry when the gateway is destroyed during onboard, which directly matches the primary objective of fixing stale registry entries.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes stale local NemoClaw sandbox registry entries by clearing ~/.nemoclaw/sandboxes.json whenever onboarding destroys a stale/previous OpenShell gateway, keeping nemoclaw list and nemoclaw <name> connect consistent with OpenShell state (refs #634 / #532).

Changes:

  • Added registry.clearAll() to reset the sandbox registry to an empty state.
  • Invoked registry.clearAll() in onboarding flows that destroy a stale/previous gateway.
  • Added unit tests covering clearAll() behavior, persistence, and idempotency.

Reviewed changes

Copilot reviewed 1 out of 3 changed files in this pull request and generated no comments.

File Description
bin/lib/registry.js Adds clearAll() API to atomically reset registry contents.
bin/lib/onboard.js Clears local registry after gateway destroy during preflight and stale-gateway handling.
test/registry.test.js Adds tests validating clearAll() semantics and persistence.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 2006-2009: The registry.clearAll() call is unconditional after
attempting to destroy the gateway; change the code to only clear the registry
when the destroy actually succeeded by checking the result of
runOpenshell("gateway","destroy",...) or by handling its thrown error rather
than unconditionally passing ignoreError: true. Concretely, capture the return
value or await runOpenshell(...) into a variable (call it e.g. destroyResult)
when invoking runOpenshell with GATEWAY_NAME, verify success (or absence of
error) and only call registry.clearAll() when that check passes; apply the same
guarded logic to the second occurrence (the block around the second runOpenshell
call at the later location).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f90f8f8f-2b00-4441-9bcb-5fcc48caaefa

📥 Commits

Reviewing files that changed from the base of the PR and between e06c147 and 3d078e6.

📒 Files selected for processing (3)
  • bin/lib/onboard.js
  • bin/lib/registry.js
  • test/registry.test.js

Comment thread bin/lib/onboard.js Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
bin/lib/onboard.js (1)

2013-2015: Add registry.clearAll() to cleanupGatewayAfterLastSandbox() for consistency with issue #532.

The function cleanupGatewayAfterLastSandbox() in bin/nemoclaw.js:116 destroys the gateway but does not call registry.clearAll(), leaving the registry stale—the exact problem this PR addresses in onboard.js:2012-2015. Apply the same pattern there for consistency:

runOpenshell(["gateway", "destroy", "-g", NEMOCLAW_GATEWAY_NAME], { ignoreError: true });
registry.clearAll();

This ensures the local registry stays consistent with the destroyed gateway state across the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 2013 - 2015, The function
cleanupGatewayAfterLastSandbox() currently destroys the gateway but doesn't
clear the local registry; update it to call registry.clearAll() immediately
after the gateway destroy call (i.e., after runOpenshell(["gateway", "destroy",
"-g", NEMOCLAW_GATEWAY_NAME], { ignoreError: true })) so the local registry
stays consistent with the destroyed gateway state, mirroring the pattern used in
onboard.js where registry.clearAll() follows the gateway destroy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 2013-2015: The function cleanupGatewayAfterLastSandbox() currently
destroys the gateway but doesn't clear the local registry; update it to call
registry.clearAll() immediately after the gateway destroy call (i.e., after
runOpenshell(["gateway", "destroy", "-g", NEMOCLAW_GATEWAY_NAME], { ignoreError:
true })) so the local registry stays consistent with the destroyed gateway
state, mirroring the pattern used in onboard.js where registry.clearAll()
follows the gateway destroy.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f418602-cd14-49b9-9c73-291556472077

📥 Commits

Reviewing files that changed from the base of the PR and between 3d078e6 and 204e11c.

📒 Files selected for processing (1)
  • bin/lib/onboard.js

@BenediktSchackenberg
Copy link
Copy Markdown
Contributor Author

Good catch — guarded both clearAll() calls on destroyResult.status === 0. Fixed in latest commit.

…oard

Add registry.clearAll() to reset sandbox state after gateway destruction,
keeping `nemoclaw list` consistent with the actual OpenShell state.

- Add clearAll() function to registry module (guarded on successful destroy)
- Call clearAll() in destroyGateway() and preflight cleanup
- Add JSDoc comments to all registry functions
- Add 3 test cases for clearAll behavior

Fixes NVIDIA#532

Signed-off-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com>
Per CodeRabbit: clearAll() was called unconditionally even when
gateway destroy failed. Now captures the result and only clears
when status === 0. Applied via destroyGateway() helper.

Signed-off-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com>
@BenediktSchackenberg BenediktSchackenberg force-pushed the fix/clear-registry-on-gateway-destroy branch from fa9df10 to 6196e50 Compare April 3, 2026 07:42
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bin/lib/registry.js (1)

124-133: ⚠️ Potential issue | 🟠 Major

Normalize parsed registry shape in load() before returning.

load() currently returns any parseable JSON structure, which can crash downstream accessors (getSandbox, listSandboxes) when sandboxes is missing/non-object. Please coerce to a safe schema at the load boundary.

Suggested fix
 function load() {
   try {
     if (fs.existsSync(REGISTRY_FILE)) {
-      return JSON.parse(fs.readFileSync(REGISTRY_FILE, "utf-8"));
+      const parsed = JSON.parse(fs.readFileSync(REGISTRY_FILE, "utf-8"));
+      if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
+        const sandboxes =
+          parsed.sandboxes && typeof parsed.sandboxes === "object" && !Array.isArray(parsed.sandboxes)
+            ? parsed.sandboxes
+            : {};
+        const defaultSandbox =
+          typeof parsed.defaultSandbox === "string" || parsed.defaultSandbox === null
+            ? parsed.defaultSandbox
+            : null;
+        return { sandboxes, defaultSandbox };
+      }
     }
   } catch {
     /* ignored */
   }
   return { sandboxes: {}, defaultSandbox: null };
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/registry.js` around lines 124 - 133, The load() function currently
returns whatever JSON is parsed, which can break consumers like getSandbox and
listSandboxes; update load() so after parsing the file it normalizes the shape:
ensure the returned value is always an object with a sandboxes property that is
an object (fallback to {}) and a defaultSandbox property (fallback to null),
e.g. parse the JSON into a temp variable, coerce temp.sandboxes = typeof
temp.sandboxes === 'object' && temp.sandboxes ? temp.sandboxes : {} and
temp.defaultSandbox = temp.hasOwnProperty('defaultSandbox') ?
temp.defaultSandbox : null, then return the normalized object from load() so
downstream functions (getSandbox, listSandboxes) can safely assume the schema.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@bin/lib/registry.js`:
- Around line 124-133: The load() function currently returns whatever JSON is
parsed, which can break consumers like getSandbox and listSandboxes; update
load() so after parsing the file it normalizes the shape: ensure the returned
value is always an object with a sandboxes property that is an object (fallback
to {}) and a defaultSandbox property (fallback to null), e.g. parse the JSON
into a temp variable, coerce temp.sandboxes = typeof temp.sandboxes === 'object'
&& temp.sandboxes ? temp.sandboxes : {} and temp.defaultSandbox =
temp.hasOwnProperty('defaultSandbox') ? temp.defaultSandbox : null, then return
the normalized object from load() so downstream functions (getSandbox,
listSandboxes) can safely assume the schema.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e0986552-a270-4d04-aff8-bd5c1673f4ea

📥 Commits

Reviewing files that changed from the base of the PR and between e410c0d and 6196e50.

📒 Files selected for processing (3)
  • bin/lib/onboard.js
  • bin/lib/registry.js
  • test/registry.test.js
✅ Files skipped from review due to trivial changes (1)
  • bin/lib/onboard.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/registry.test.js

@cv cv merged commit c3ee651 into NVIDIA:main Apr 4, 2026
6 of 8 checks passed
@BenediktSchackenberg BenediktSchackenberg deleted the fix/clear-registry-on-gateway-destroy branch April 6, 2026 18:38
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
…oard (NVIDIA#1245)

## Problem
When `gateway destroy` is called during the onboard flow, the sandbox
registry was not being cleaned up. This left stale entries in memory
that could cause unexpected behavior on subsequent onboard runs.

## Solution
- Added `registry.clearAll()` function that removes all registered
sandbox entries
- Called `registry.clearAll()` after `gateway destroy` in the onboard
flow
- Added JSDoc comments to existing registry functions for better
discoverability
- Added 3 tests covering `clearAll()` behavior

## Related
Refs NVIDIA#634 — picks up the work from the stalled PR by @craigamcw

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Improved local state cleanup during gateway teardown so stale sandbox
entries are reliably cleared when teardown succeeds.

* **New Features**
* Added the ability to fully reset the sandbox registry to an empty,
consistent state.

* **Tests**
* Added tests verifying registry reset behavior, disk persistence, and
idempotent/safe operation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI).

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants