fix: gate sandbox registration on creation readiness and guard WSL name truncation #229

Merged
kjw3 merged 11 commits into NVIDIA:main from
brianwtaylor:fix/sandbox-creation-readiness-gate
Mar 19, 2026
Conversation

@brianwtaylor
Contributor

@brianwtaylor brianwtaylor commented Mar 17, 2026

Summary

  • Remove the `awk` pipe from the sandbox create command that silently swallowed non-zero exit codes
  • Add a readiness polling loop that waits for the sandbox to reach `Ready` state in k3s before registering locally
  • Clean up orphaned sandboxes on readiness timeout so retries with the same name work cleanly
  • Move build-context cleanup before the exit-code check so temp files are always cleaned up
  • Guard against truncated sandbox names on WSL with RFC 1123 validation (including 63-char limit) in `applyPreset()`
  • Extract `isSandboxReady()` as a shared function so tests validate the production parser
  • Clean up stale NemoClaw gateway and port forward before the preflight port check so re-running `nemoclaw onboard` works without manual intervention

Fixes #21
Fixes #22
Fixes #140
Fixes #152
Closes #397

Root cause

`createSandbox()` piped `openshell sandbox create` through `awk` to deduplicate log lines. In bash, unless `pipefail` is set, the exit status of a pipeline is the status of its last command, here `awk`, which exits 0 as long as the filter itself succeeds. If sandbox creation failed, `run()` saw exit 0 and went on to register a phantom sandbox.
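
To make the masking concrete, here is a minimal sketch rather than the code from `onboard.js`. It assumes a POSIX `sh` is on the PATH, uses `(exit 3)` as a stand-in for a failing `openshell sandbox create`, and uses `cat` as a stand-in for the `awk` filter. The helper name `exitStatus` is hypothetical.

```javascript
const { spawnSync } = require("node:child_process");

// Run a command line through the shell and return its exit status.
function exitStatus(commandLine) {
  return spawnSync("sh", ["-c", commandLine], { encoding: "utf8" }).status;
}

// Stand-in for a failing create command: a subshell that exits with status 3.
const direct = exitStatus("(exit 3)");

// The same failing command piped through a filter that succeeds.
// Without pipefail, the pipeline reports the status of the last command.
const piped = exitStatus("(exit 3) | cat");

console.log({ direct, piped });
```

Run directly, the failure is visible as status 3. Behind the pipe it is reported as 0, which is why removing the pipe lets the real exit code reach the caller.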

WSL sandbox name truncation (#21)

On WSL, hyphenated sandbox names like `my-assistant` can be truncated during shell argument parsing (e.g. to `m`), causing `openshell policy set --wait` to receive a garbage name. Added RFC 1123 validation in `applyPreset()` as defense-in-depth and quoted sandbox names in error output.
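
A rough sketch of the kind of check described here, using the regex quoted in the review thread plus the 63-character bound. The helper name `sandboxNameError` and the message wording are hypothetical, not the text used in `applyPreset()`.

```javascript
// RFC 1123 label shape: lowercase alphanumerics and hyphens,
// starting and ending with an alphanumeric character.
const RFC1123_LABEL = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/;
const MAX_LABEL_LENGTH = 63;

// Hypothetical helper: returns an error message, or null when the name is acceptable.
function sandboxNameError(name) {
  if (typeof name !== "string" || name.length === 0) {
    return "sandbox name is empty";
  }
  if (name.length > MAX_LABEL_LENGTH) {
    return `sandbox name '${name}' exceeds ${MAX_LABEL_LENGTH} characters`;
  }
  if (!RFC1123_LABEL.test(name)) {
    return `sandbox name '${name}' is not a valid RFC 1123 label`;
  }
  return null;
}

console.log(sandboxNameError("my-assistant")); // valid name
console.log(sandboxNameError("-my-assistant")); // leading hyphen is rejected
```

Note that a one-character remnant such as `m` is itself a valid RFC 1123 label, so a shape check like this catches empty or malformed names but cannot by itself prove that a name was not truncated.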

Stale port conflicts on re-onboard (#397)

A previous `nemoclaw onboard` session leaves the OpenShell gateway container (port 8080) and port forward (18789) running after exit. The next `onboard` invocation fails at the preflight port check before reaching the existing cleanup code. Moved detection and teardown of a NemoClaw-owned gateway into `preflight()`, before the port availability check.
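
The ordering change can be sketched as follows. This is an illustration under assumptions, not the body of `preflight()`: every collaborator (`findStaleGateway`, `teardownGateway`, `isPortFree`) is a hypothetical injected function so the sequence can be exercised without Docker or OpenShell.

```javascript
// Sketch of "clean up first, then check ports". All helpers are injected
// and hypothetical; only the order of operations is the point.
function preflight({ findStaleGateway, teardownGateway, isPortFree, ports }) {
  // 1. Remove leftovers from an earlier run, if a gateway we own is found.
  const stale = findStaleGateway();
  if (stale) teardownGateway(stale);

  // 2. Only then verify that the required ports are available.
  const busy = ports.filter((port) => !isPortFree(port));
  return { ok: busy.length === 0, busy };
}

// Simulated stale session: ports stay busy until the gateway is torn down.
let gatewayRunning = true;
const result = preflight({
  findStaleGateway: () => (gatewayRunning ? "nemoclaw" : null),
  teardownGateway: () => { gatewayRunning = false; },
  isPortFree: () => !gatewayRunning,
  ports: [8080, 18789],
});
console.log(result);
```

If the port check ran first, the simulated run above would report both ports as busy and stop before any cleanup, which is the failure mode described in #397.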

Test plan

Automated Tests

```
node --test test/onboard-readiness.test.js
```

23 tests across 3 suites:

  • 11 sandbox readiness parsing (ANSI stripping, exact match, multi-sandbox, tab-separated)
  • 6 WSL regression (hyphenated names, truncation detection, command quoting, RFC 1123 validation)
  • 6 stale gateway detection (real output, ANSI output, null/empty/error, non-NemoClaw gateway)
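
The readiness parsing cases listed above can be illustrated with a small sketch. It is not the exported `isSandboxReady()` from `bin/lib/onboard.js`. It assumes that `openshell sandbox list` prints one sandbox per line with the name in the first column and a `Ready` status token in a later column, and the name `isSandboxReadySketch` is hypothetical.

```javascript
// Matches common ANSI colour/style escape sequences such as "\x1b[32m".
const ANSI_PATTERN = /\x1b\[[0-9;]*[A-Za-z]/g;

// Sketch only: assumes one sandbox per line, name in the first column,
// and an exact "Ready" token somewhere in the remaining columns.
function isSandboxReadySketch(listOutput, sandboxName) {
  if (typeof listOutput !== "string") return false;
  return listOutput
    .replace(ANSI_PATTERN, "")
    .split("\n")
    .some((line) => {
      // Splitting on any whitespace handles both space and tab separators.
      const cols = line.trim().split(/\s+/);
      // Exact comparisons avoid "my" matching "my-assistant"
      // and "Ready" matching "NotReady".
      return cols[0] === sandboxName && cols.slice(1).includes("Ready");
    });
}

console.log(isSandboxReadySketch("my-assistant   Ready", "my-assistant"));
console.log(isSandboxReadySketch("my-assistant   NotReady", "my-assistant"));
```

Comparing whole columns rather than using substring checks is what rules out both false positives called out in this PR.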

Hardware Validation

| Environment | Full Suite | PR Tests |
| --- | --- | --- |
| DGX Spark (aarch64, Ubuntu 24.04, Node 22) | 162/167 pass (5 pre-existing) | 23/23 pass |
| Windows WSL2 (x86_64, Ubuntu 22.04, Docker Desktop, Node 22) | 165/167 pass (2 pre-existing) | 23/23 pass |

@coderabbitai
Contributor

coderabbitai bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Replaces piped/awk sandbox-create detection with direct openshell exit-status checks, adds ANSI-stripped readiness polling (≈60s), removes orphaned sandboxes on timeout, moves build-context cleanup earlier, defers registry registration/policy apply until sandbox is Ready, and adds validation and unit tests for name/readiness handling.
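
As a rough illustration of the gated flow summarized above (create, poll, register only on Ready, delete on timeout), here is a sketch with all side effects injected. The function name `onboardSandbox`, the `deps` shape, and the return values are hypothetical. The 30 attempts with a 2-second sleep are an assumption chosen to approximate the ~60s window mentioned in this PR.

```javascript
// Sketch of the readiness gate. Every dependency is injected and hypothetical.
function onboardSandbox(name, deps, attempts = 30) {
  const { create, listSandboxes, isReady, remove, register, sleep } = deps;

  // A non-zero exit status from creation is surfaced immediately.
  if (create(name) !== 0) return { ok: false, reason: "create-failed" };

  for (let i = 0; i < attempts; i += 1) {
    if (isReady(listSandboxes(), name)) {
      register(name); // registration happens only after Ready is observed
      return { ok: true };
    }
    sleep(2); // seconds between polls
  }

  // Timed out: delete the orphan and report whether the delete succeeded,
  // so the caller can print a manual cleanup command when it did not.
  const removed = remove(name) === 0;
  return { ok: false, reason: "timeout", removed };
}

// Example with fakes: the sandbox becomes Ready on the third poll.
let polls = 0;
const events = [];
const outcome = onboardSandbox("demo", {
  create: () => 0,
  listSandboxes: () => { polls += 1; return polls >= 3 ? "demo Ready" : "demo NotReady"; },
  isReady: (output, sandbox) => output === `${sandbox} Ready`,
  remove: () => { events.push("remove"); return 0; },
  register: (sandbox) => events.push(`register:${sandbox}`),
  sleep: () => {},
});
console.log(outcome, events);
```

Injecting the collaborators keeps the ordering testable, which is the property the review asked for: registration must never be reachable on the timeout path.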

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Onboarding flow: `bin/lib/onboard.js` | Use direct openshell sandbox create with captured exit status; remove ignoreError masking; move build-context cleanup earlier; poll openshell sandbox list (strip ANSI) for exact-name Ready up to ~60s; delete orphaned sandbox on timeout; register sandbox and apply policies only after readiness; surface failures with guidance. |
| Readiness & regression tests: `test/onboard-readiness.test.js` | Add unit tests for isSandboxReady parsing (ANSI stripping, exact-first-column match, tabs, multiple entries) and regression tests protecting against WSL/truncated-name issues and policy-command quoting. |
| Preset validation: `bin/lib/policies.js` | Validate sandboxName in applyPreset (non-empty, ≤63 chars, RFC1123-style lowercase alnum/hyphen) and throw descriptive error on invalid/truncated names before applying presets. |

Sequence Diagram(s)

sequenceDiagram
    participant CLI as NemoClaw CLI
    participant OS as OpenShell CLI
    participant REG as Local Registry
    participant FS as Build Context

    CLI->>OS: openshell sandbox create ... (capture exit status)
    alt exit 0
        CLI->>FS: remove build context
        CLI->>OS: poll "openshell sandbox list" (strip ANSI) up to 60s
        alt sandbox Ready (exact name, not "NotReady")
            CLI->>REG: register sandbox
            CLI->>OS: apply policies / port-forward
        else timeout / not Ready
            CLI->>OS: openshell sandbox delete <name> || true
            CLI-->>CLI: log diagnostics and exit non-zero
        end
    else non-zero exit
        CLI-->>CLI: surface error, log diagnostics, exit non-zero
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

"I hopped through ANSI trails and chased the stray,
Matched names exactly, chased the ghosts away.
I cleaned the build, then waited for the light—
When Ready gleamed, I set the policies right.
🐇 Ready, steady, bound!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title accurately summarizes the main changes: gating sandbox registration on readiness and adding name validation guards for WSL truncation. |
| Linked Issues check | ✅ Passed | All objectives from #21, #22, #140, #152 are met: readiness polling [#140, #152], failure surfacing [#22], orphaned sandbox cleanup [#22], RFC1123 validation [#21], and regression test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes directly address linked issue objectives: readiness gate, failure handling, cleanup, name validation, and test coverage. No unrelated or tangential modifications detected. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@brianwtaylor brianwtaylor force-pushed the fix/sandbox-creation-readiness-gate branch from 588e54b to 04b0d98 on March 18, 2026 20:36
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/onboard-readiness.test.js (1)

7-16: Test the production readiness logic, not a copied helper.

isSandboxReady() is reimplemented locally, so this suite can stay green even if bin/lib/onboard.js drifts or if registry.registerSandbox() is accidentally moved above the wait loop again. Extract the parser from onboarding and import it here; then add one mocked create/onboard test that asserts registration happens only after Ready and never on timeout.

Also applies to: 18-80

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard-readiness.test.js` around lines 7 - 16, Replace the local
isSandboxReady() copy with the canonical parser exported from
bin/lib/onboard.js: export the readiness-parsing function (e.g.,
parseSandboxReadiness or isSandboxReady) from onboard.js, import that function
into test/onboard-readiness.test.js and remove the duplicated implementation;
then add a mocked integration test that uses the real create/onboard flow with a
fake registry (spy/stub registry.registerSandbox) to assert registerSandbox is
called only after the parser observes "Ready" in output and is never called when
the parser times out (simulate output with and without "Ready" and assert
registerSandbox call counts/ordering accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 352-357: When the ready check fails for sandboxName (the branch
where ready is false and you call process.exit(1)), ensure we clean up the
orphaned remote sandbox so future runs that call registry.getSandbox() won't be
blocked: either invoke the OpenShell deletion flow for sandboxName before
exiting (the same mechanism used elsewhere to remove sandboxes) or at minimum
emit a concrete cleanup command in the error output that users can copy (e.g.,
"openshell sandbox delete <sandboxName>"); perform the delete call and only exit
after it completes or log the explicit delete command along with the existing
log lines so the orphan is removed or easily cleaned up.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8473049c-d6e3-48d6-9737-799f92480a0d

📥 Commits

Reviewing files that changed from the base of the PR and between 2b5febe and 04b0d98.

📒 Files selected for processing (2)
  • bin/lib/onboard.js
  • test/onboard-readiness.test.js

Comment thread bin/lib/onboard.js
@brianwtaylor brianwtaylor changed the title from "fix: gate sandbox registration on creation readiness" to "fix: gate sandbox registration on creation readiness and guard WSL name truncation" Mar 18, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
bin/lib/onboard.js (1)

349-349: Minor: Consider using the existing sleep() helper.

Line 349 inlines require("child_process").spawnSync("sleep", ["2"]) while a sleep() helper already exists at line 75. Using sleep(2) would improve consistency.

♻️ Suggested change
-    require("child_process").spawnSync("sleep", ["2"]);
+    sleep(2);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` at line 349, Replace the inline blocking call
require("child_process").spawnSync("sleep", ["2"]) with the existing sleep
helper (sleep(2)) to maintain consistency; locate the inline call in onboard.js
(the spawnSync usage) and swap it to call the sleep function defined earlier
(the sleep() helper at line ~75) and remove the unnecessary require usage so the
module uses the shared helper instead of a new child_process invocation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e3933dd4-7f69-47f7-9405-9f33bb7ec3cb

📥 Commits

Reviewing files that changed from the base of the PR and between 04b0d98 and 0d01452.

📒 Files selected for processing (3)
  • bin/lib/onboard.js
  • bin/lib/policies.js
  • test/onboard-readiness.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/onboard-readiness.test.js

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/policies.js`:
- Around line 87-94: The guard for sandboxName currently checks shape with the
regex but not length, so add an explicit max-length check (RFC 1123 limit of 63
characters) before/alongside the regex validation: check sandboxName.length <=
63 and throw the same Error (or an updated message) if length exceeds 63;
reference the sandboxName variable and the existing regex validation block to
locate where to add this deterministic length check and update the error text to
mention the 63-character limit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dafb3791-daad-4247-b9bf-146ca6b4def2

📥 Commits

Reviewing files that changed from the base of the PR and between 0d01452 and 3d4bbb0.

📒 Files selected for processing (2)
  • bin/lib/policies.js
  • test/onboard-readiness.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/onboard-readiness.test.js

Comment thread bin/lib/policies.js
@wscurran wscurran added the "Platform: Windows/WSL" label (Support for Windows Subsystem for Linux) Mar 18, 2026
…ures

The sandbox create command was piped through awk to deduplicate log
lines. In bash, the exit status of a pipeline is the status of the last
command (awk, always 0), so creation failures were silently swallowed.
NemoClaw then registered a phantom sandbox in ~/.nemoclaw/sandboxes.json
that caused "sandbox not found" on every subsequent connect/status call.

This is the root cause of the WSL2 + Docker Desktop failures reported
in #140 and #152 — sandbox creation fails due to Docker networking
issues, but onboarding completes as if it succeeded.

Three changes:
1. Remove the awk pipe so the real exit code flows through to run()
2. Poll openshell sandbox list for Ready state before registering
   (matches the gateway health check pattern at lines 121-132)
3. Move build-context cleanup before the exit-code check so temp
   files are always cleaned up, even on failure

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>

includes("Ready") falsely matched "NotReady" because "Ready" is a
substring. Use a word-boundary regex with a NotReady exclusion so
sandboxes stuck in error states are not registered as healthy.

Also remove the off-by-one break at i=29 so the loop sleeps the
full 60s before timing out.

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>

Use column-based matching (split on whitespace, check cols[0])
instead of substring includes(). Prevents false positives when
one sandbox name is a prefix of another (e.g. "my" matching
"my-assistant").

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
On WSL, hyphenated sandbox names like "my-assistant" can be truncated
to "m" during shell argument parsing, causing "sandbox not found"
failures when applying policy presets.

- Add RFC 1123 validation in applyPreset() to catch truncated names
  early with a clear error message
- Quote sandbox name in error output (was unquoted on line 356)
- Add 6 WSL-specific regression tests covering hyphenated names,
  multi-hyphen names, truncation detection, and command quoting

When the sandbox is created but never reaches Ready within 60s,
delete it before exiting so the next onboard retry with the same
name doesn't fail on "sandbox already exists".
@brianwtaylor brianwtaylor force-pushed the fix/sandbox-creation-readiness-gate branch from 682a94f to 6d617be on March 19, 2026 01:12
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
bin/lib/onboard.js (2)

380-387: ⚠️ Potential issue | 🟡 Minor

RFC 1123 validation is missing the label-length bound.

The current check allows overly long names; RFC 1123 labels should be 1–63 chars. Catch this early to avoid later openshell failure.

Proposed fix
-  if (!/^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/.test(sandboxName)) {
+  const isValidSandboxName =
+    sandboxName.length >= 1 &&
+    sandboxName.length <= 63 &&
+    /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/.test(sandboxName);
+  if (!isValidSandboxName) {
     console.error(`  Invalid sandbox name: '${sandboxName}'`);
-    console.error("  Names must be lowercase, contain only letters, numbers, and hyphens,");
+    console.error("  Names must be 1-63 chars, lowercase, contain only letters, numbers, and hyphens,");
     console.error("  and must start and end with a letter or number.");
     process.exit(1);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 380 - 387, The RFC1123 validation for
sandboxName currently only checks characters but not label length; update the
validation in the sandboxName check (the if that tests
/^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/) to enforce labels are 1–63 characters long by
either augmenting the regex to include the {1,63} bound or by adding an explicit
length check (ensure sandboxName.length >=1 && sandboxName.length <=63) before
calling process.exit(1), and keep the same error messages for consistency.

433-443: ⚠️ Potential issue | 🟠 Major

Environment values are shell-interpolated without quoting.

Lines 433-436 and Line 443 currently pass raw env values into a shell command. If values contain spaces or shell metacharacters, command behavior can break or be injected.

Proposed fix
   const chatUiUrl = process.env.CHAT_UI_URL || 'http://127.0.0.1:18789';
   const envArgs = [`CHAT_UI_URL=${chatUiUrl}`];
   if (process.env.NVIDIA_API_KEY) {
     envArgs.push(`NVIDIA_API_KEY=${process.env.NVIDIA_API_KEY}`);
   }
+  const quotedEnvArgs = envArgs.map((entry) => shellQuote(entry));
@@
   const createResult = run(
-    `openshell sandbox create ${createArgs.join(" ")} -- env ${envArgs.join(" ")} nemoclaw-start 2>&1`,
+    `openshell sandbox create ${createArgs.join(" ")} -- env ${quotedEnvArgs.join(" ")} nemoclaw-start 2>&1`,
     { ignoreError: true }
   );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 433 - 443, The env values (chatUiUrl and
process.env.NVIDIA_API_KEY) are being interpolated directly into the shell
command via envArgs and createArgs passed to run(), which allows spaces or shell
metacharacters to break or inject commands; update the code to properly
shell-escape or quote each environment value before joining into envArgs (or,
better, switch to invoking run/openshell with an argument array instead of a
single interpolated command) so that envArgs contains safely-escaped entries
(handle existing quotes by escaping single quotes or using a proven sh-escaping
helper) and then call run() with the escaped/array form so createResult receives
a safe command string/argv list.
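
The proposed fix above calls a `shellQuote` helper without showing it. One common way to write such a helper for a POSIX shell is sketched below. This is an assumption about what the helper could look like, not code from the repository, and the example key value is made up.

```javascript
// Hypothetical helper: wrap a value in single quotes for a POSIX shell,
// rewriting each embedded single quote as '\'' (close quote, escaped quote, reopen).
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Illustrative values only; the key below is a made-up placeholder.
const chatUiUrl = "http://127.0.0.1:18789";
const apiKey = "placeholder key; with $pecial 'chars'";

const quotedEnvArgs = [
  `CHAT_UI_URL=${chatUiUrl}`,
  `NVIDIA_API_KEY=${apiKey}`,
].map(shellQuote);

console.log(quotedEnvArgs.join(" "));
```

Quoting each `NAME=value` entry as a whole keeps spaces and metacharacters inside a single shell word. Passing an argument array to `spawnSync` instead of building a command string avoids the quoting problem entirely, as the review also suggests.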
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 483-487: The timeout path currently always prints that the
orphaned sandbox was removed despite suppressing errors in the run call; modify
the handling around the run(`openshell sandbox delete "${sandboxName}" ...`, {
ignoreError: true }) invocation to capture its success/failure (e.g., check its
exit status or thrown error) and branch the messages accordingly: if delete
succeeded print the existing "has been removed" message, otherwise print that
automatic deletion failed and include a suggested manual cleanup command (e.g.,
`openshell sandbox delete "${sandboxName}"`) and where to find logs. Update
references to run and sandboxName in onboard.js accordingly so the output
accurately reflects the deletion result.

In `@test/onboard-readiness.test.js`:
- Around line 12-18: The test contains a duplicated parser (isSandboxReady) that
can drift from the production onboarding parser; remove this local copy and
instead import and use the shared readiness parser exported from the onboarding
module (the same parser used in bin/lib/onboard.js). Update the onboarding
module to export its readiness-parsing function (the function used to parse
readiness lines in onboard.js) if not already exported, then replace the test's
isSandboxReady implementation with an import of that exported helper and call it
in the test so production and test parsing stay in sync.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4ba383ff-b3ab-4241-87e3-8c68ec125569

📥 Commits

Reviewing files that changed from the base of the PR and between 682a94f and 6d617be.

📒 Files selected for processing (3)
  • bin/lib/onboard.js
  • bin/lib/policies.js
  • test/onboard-readiness.test.js
✅ Files skipped from review due to trivial changes (1)
  • bin/lib/policies.js

Comment thread bin/lib/onboard.js Outdated
Comment thread test/onboard-readiness.test.js Outdated
…saging

- Extract readiness parser from inline code to exported isSandboxReady()
  so tests validate the production code, not a duplicated copy
- Branch cleanup messaging on delete result — report manual cleanup
  command if the orphan delete fails

A previous onboard session may leave the OpenShell gateway container
and port forward running, causing port 8080/18789 conflicts on the
next invocation. Detect a NemoClaw-owned gateway before the port
availability check and tear it down automatically.

Closes #397
@brianwtaylor brianwtaylor force-pushed the fix/sandbox-creation-readiness-gate branch from 2dd8ab0 to 6a1da48 on March 19, 2026 08:51
Runs `nemoclaw onboard` three times without cleanup between runs,
verifying that each subsequent onboard recovers automatically from
stale gateway, port forward, and registry entries left behind.

Regression test for #397.
@brianwtaylor brianwtaylor force-pushed the fix/sandbox-creation-readiness-gate branch from 17e574d to c43c66a on March 19, 2026 10:14
@kjw3 kjw3 self-assigned this Mar 19, 2026
@kjw3 kjw3 merged commit 3dafa3a into NVIDIA:main Mar 19, 2026
3 checks passed
Ryuketsukami pushed a commit to Ryuketsukami/NemoClaw that referenced this pull request Mar 24, 2026
…me truncation (NVIDIA#229)

* fix: gate sandbox registration on readiness and surface creation failures

The sandbox create command was piped through awk to deduplicate log
lines. In bash, the exit status of a pipeline is the status of the last
command (awk, always 0), so creation failures were silently swallowed.
NemoClaw then registered a phantom sandbox in ~/.nemoclaw/sandboxes.json
that caused "sandbox not found" on every subsequent connect/status call.

This is the root cause of the WSL2 + Docker Desktop failures reported
in NVIDIA#140 and NVIDIA#152 — sandbox creation fails due to Docker networking
issues, but onboarding completes as if it succeeded.

Three changes:
1. Remove the awk pipe so the real exit code flows through to run()
2. Poll openshell sandbox list for Ready state before registering
   (matches the gateway health check pattern at lines 121-132)
3. Move build-context cleanup before the exit-code check so temp
   files are always cleaned up, even on failure

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>

* fix: use word-boundary match for Ready status and fix timeout

includes("Ready") falsely matched "NotReady" because "Ready" is a
substring. Use a word-boundary regex with a NotReady exclusion so
sandboxes stuck in error states are not registered as healthy.

Also remove the off-by-one break at i=29 so the loop sleeps the
full 60s before timing out.

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>

* fix: exact-match sandbox name in readiness check

Use column-based matching (split on whitespace, check cols[0])
instead of substring includes(). Prevents false positives when
one sandbox name is a prefix of another (e.g. "my" matching
"my-assistant").

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>

* test: add readiness gate parsing tests for sandbox creation

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>

* fix: guard against truncated sandbox names on WSL (fixes NVIDIA#21)

On WSL, hyphenated sandbox names like "my-assistant" can be truncated
to "m" during shell argument parsing, causing "sandbox not found"
failures when applying policy presets.

- Add RFC 1123 validation in applyPreset() to catch truncated names
  early with a clear error message
- Quote sandbox name in error output (was unquoted on line 356)
- Add 6 WSL-specific regression tests covering hyphenated names,
  multi-hyphen names, truncation detection, and command quoting

* fix: clean up orphaned sandbox on readiness timeout

When the sandbox is created but never reaches Ready within 60s,
delete it before exiting so the next onboard retry with the same
name doesn't fail on "sandbox already exists".

* chore: remove issue references from code comments

* fix: enforce RFC 1123 63-char limit in sandbox name validation

* fix: extract isSandboxReady as shared function and branch cleanup messaging

- Extract readiness parser from inline code to exported isSandboxReady()
  so tests validate the production code, not a duplicated copy
- Branch cleanup messaging on delete result — report manual cleanup
  command if the orphan delete fails

* fix: clean up stale gateway and port forward before preflight check

A previous onboard session may leave the OpenShell gateway container
and port forward running, causing port 8080/18789 conflicts on the
next invocation. Detect a NemoClaw-owned gateway before the port
availability check and tear it down automatically.

Closes NVIDIA#397

* test: add double-onboard e2e test for stale state recovery

Runs `nemoclaw onboard` three times without cleanup between runs,
verifying that each subsequent onboard recovers automatically from
stale gateway, port forward, and registry entries left behind.

Regression test for NVIDIA#397.

---------

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
jessesanford pushed a commit to jessesanford/NemoClaw that referenced this pull request Mar 24, 2026
mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026