Skip to content

fix: reduce auto-pair timeout and add skip/configure env vars#145

Closed
brianwtaylor wants to merge 6 commits intoNVIDIA:mainfrom
brianwtaylor:fix/auto-pair-timeout-hardening
Closed

fix: reduce auto-pair timeout and add skip/configure env vars#145
brianwtaylor wants to merge 6 commits intoNVIDIA:mainfrom
brianwtaylor:fix/auto-pair-timeout-hardening

Conversation

@brianwtaylor
Copy link
Copy Markdown
Contributor

@brianwtaylor brianwtaylor commented Mar 17, 2026

Summary

  • Reduce default auto-pair watcher timeout from 600s to 120s — pairing typically converges in <10s, 10 minutes is unnecessarily long
  • Add NEMOCLAW_AUTO_PAIR_TIMEOUT env var to configure the timeout for environments that need longer convergence windows
  • Add NEMOCLAW_SKIP_AUTO_PAIR env var to disable the watcher entirely in automated/CI deployments
  • Add per-subprocess timeout to prevent hung openclaw devices calls from defeating the watcher deadline
  • Echo configured timeout to the terminal before the watcher launches in the background

Test plan

Automated Tests

npm test

Manual Verification

# Default behavior (120s timeout)
nemoclaw onboard
# Expect log: auto-pair watcher launching (timeout=120s)

# Custom timeout
NEMOCLAW_AUTO_PAIR_TIMEOUT=300 nemoclaw onboard

# Skip auto-pair
NEMOCLAW_SKIP_AUTO_PAIR=1 nemoclaw onboard
# Expect log: auto-pair watcher skipped

@brianwtaylor brianwtaylor force-pushed the fix/auto-pair-timeout-hardening branch 2 times, most recently from a007b7c to 39b1fe3 Compare March 18, 2026 00:08
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds env-driven control to the auto-pair watcher in scripts/nemoclaw-start.sh: NEMOCLAW_SKIP_AUTO_PAIR can skip startup and NEMOCLAW_AUTO_PAIR_TIMEOUT sets a validated timeout (default 120s). The inline Python watcher uses the timeout for a dynamic DEADLINE, timeout-aware subprocess runs, and startup logging.

Changes

Cohort / File(s) Summary
Auto-pair watcher script
scripts/nemoclaw-start.sh
Add NEMOCLAW_SKIP_AUTO_PAIR to optionally skip the watcher; add NEMOCLAW_AUTO_PAIR_TIMEOUT with validation and default (120s); replace fixed 600s DEADLINE with dynamic timeout-derived DEADLINE; add inline Python run(...) helper to run subprocesses with timeout=remaining and return 124 on timeout; emit watcher startup log with effective timeout; preserve approval-counter and final timeout/failure messaging; print dashboard URLs including token when present.

Sequence Diagram(s)

sequenceDiagram
    participant Shell as Shell script
    participant Watcher as Inline Python watcher
    participant Subproc as Subprocess (kubectl/cli)
    Shell->>Watcher: start (unless NEMOCLAW_SKIP_AUTO_PAIR set)
    Note right of Watcher: compute DEADLINE from NEMOCLAW_AUTO_PAIR_TIMEOUT
    Watcher->>Subproc: run(cmd, timeout=remaining)
    Subproc-->>Watcher: exit code / output (or timeout -> code 124)
    Watcher->>Shell: log status (startup, attempts, timeout/final message)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I sniff the env, the timeout tuned,

A watcher hops, its timer cued,
Skip the chase or let it play,
Subprocess races mark the day,
— 🥕 from the rabbit.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly addresses the main changes: reducing auto-pair timeout from 600s to 120s and adding environment variables (NEMOCLAW_SKIP_AUTO_PAIR and NEMOCLAW_AUTO_PAIR_TIMEOUT) to control the watcher behavior.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 138-141: The run function currently calls subprocess.run without a
timeout, so long-running commands in the watcher loop (e.g., the "openclaw
devices list" and "approve" invocations) can block past the overall watcher
deadline; update run to accept a timeout argument (e.g., def run(*args,
timeout=None)) and pass that into subprocess.run(timeout=timeout,
capture_output=True, text=True); in the watcher loop compute the remaining time
from the watcher deadline and pass it as the timeout when calling run for those
commands, and catch subprocess.TimeoutExpired inside run to return a non-zero
returncode and a stderr message indicating the timeout so the watcher can handle
the failure and respect its deadline.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 13bc8690-9e64-4a46-9cb4-b19f4ee1e819

📥 Commits

Reviewing files that changed from the base of the PR and between 39b1fe3 and 64796be.

📒 Files selected for processing (1)
  • scripts/nemoclaw-start.sh

@brianwtaylor brianwtaylor force-pushed the fix/auto-pair-timeout-hardening branch from 64796be to 3fd5f2c Compare March 18, 2026 19:44
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/nemoclaw-start.sh (1)

84-110: ⚠️ Potential issue | 🟠 Major

Avoid logging raw auth tokens in printed dashboard URLs.

When token exists, Lines 105–110 output tokenized URLs directly. This can leak bearer material to terminal logs/CI logs. Prefer redacting by default and requiring explicit opt-in to print tokenized links.

Proposed fix
-  if [ -n "$token" ]; then
-    local_url="${local_url}#token=${token}"
-    remote_url="${remote_url}#token=${token}"
-  fi
+  if [ -n "$token" ]; then
+    if [ "${NEMOCLAW_PRINT_TOKENIZED_URLS:-0}" = "1" ]; then
+      local_url="${local_url}#token=${token}"
+      remote_url="${remote_url}#token=${token}"
+    else
+      echo "[gateway] auth token detected; tokenized URLs suppressed (set NEMOCLAW_PRINT_TOKENIZED_URLS=1 to show)"
+    fi
+  fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 84 - 110, The print_dashboard_urls
function currently appends the raw auth token to local_url and remote_url
(variables token, local_url, remote_url) and prints them, which can leak
secrets; change this to redact the token by default (e.g., append
"#token=REDACTED" or omit the fragment) and add an explicit opt-in check
(environment variable like SHOW_TOKENS or a flag) before appending and printing
the real tokenized links; update the echo lines to print the redacted URLs by
default and only print the full tokenized URLs when the opt-in variable is
truthy.
♻️ Duplicate comments (1)
scripts/nemoclaw-start.sh (1)

138-145: ⚠️ Potential issue | 🟠 Major

Subprocess calls can hang past the watcher deadline.

At Line 140, subprocess.run(...) has no timeout. The loop deadline is checked only between iterations, so openclaw devices list/approve can block indefinitely and defeat the timeout behavior.

Proposed fix
 def run(*args):
     """Run a subprocess and return (returncode, stdout, stderr)."""
-    proc = subprocess.run(args, capture_output=True, text=True)
-    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()
+    remaining = max(1, DEADLINE - time.time())
+    try:
+        proc = subprocess.run(args, capture_output=True, text=True, timeout=remaining)
+        return proc.returncode, proc.stdout.strip(), proc.stderr.strip()
+    except subprocess.TimeoutExpired:
+        return 124, '', f'subprocess timed out after {int(remaining)}s'
#!/bin/bash
set -euo pipefail

file="$(fd -t f 'nemoclaw-start\.sh$' | head -n1)"
echo "Inspecting: $file"

echo "--- run helper and subprocess.run usage ---"
rg -n -C2 -P 'def run\(|subprocess\.run\(' "$file"

echo "--- watcher call sites ---"
rg -n -C2 -P "run\('openclaw', 'devices', '(list|approve)'" "$file"

echo "--- source window ---"
nl -ba "$file" | sed -n '136,172p'

Also applies to: 164-170

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 138 - 145, The helper run function
currently calls subprocess.run without a timeout, which allows commands like the
openclaw invocations to hang past the loop DEADLINE; update the run function to
accept an optional timeout parameter (or compute remaining time from DEADLINE)
and pass it to subprocess.run(timeout=...), raising or returning a non-zero rc
on timeout, and then call run for the 'openclaw devices list' and 'openclaw
devices approve' sites with the remaining time or a safe per-call timeout so the
loop's deadline is honored; reference the run function, subprocess.run call, the
DEADLINE variable, and the locations invoking run('openclaw', 'devices', 'list',
'--json') and run('openclaw', 'devices', 'approve', ...) when making the
changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 84-110: The print_dashboard_urls function currently appends the
raw auth token to local_url and remote_url (variables token, local_url,
remote_url) and prints them, which can leak secrets; change this to redact the
token by default (e.g., append "#token=REDACTED" or omit the fragment) and add
an explicit opt-in check (environment variable like SHOW_TOKENS or a flag)
before appending and printing the real tokenized links; update the echo lines to
print the redacted URLs by default and only print the full tokenized URLs when
the opt-in variable is truthy.

---

Duplicate comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 138-145: The helper run function currently calls subprocess.run
without a timeout, which allows commands like the openclaw invocations to hang
past the loop DEADLINE; update the run function to accept an optional timeout
parameter (or compute remaining time from DEADLINE) and pass it to
subprocess.run(timeout=...), raising or returning a non-zero rc on timeout, and
then call run for the 'openclaw devices list' and 'openclaw devices approve'
sites with the remaining time or a safe per-call timeout so the loop's deadline
is honored; reference the run function, subprocess.run call, the DEADLINE
variable, and the locations invoking run('openclaw', 'devices', 'list',
'--json') and run('openclaw', 'devices', 'approve', ...) when making the
changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6cd1c4b3-bd5b-4df8-ac30-633952a137a0

📥 Commits

Reviewing files that changed from the base of the PR and between 64796be and 3fd5f2c.

📒 Files selected for processing (1)
  • scripts/nemoclaw-start.sh

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/nemoclaw-start.sh (1)

104-110: ⚠️ Potential issue | 🟠 Major

Don't print tokenized dashboard URLs.

Appending #token=${token} here leaks the gateway auth token into terminal/CI logs. That widens the same insecure auth surface this PR is trying to constrain. Log the base URLs only, or make tokenized links an explicit opt-in.

🔐 Minimal fix
   remote_url="${chat_ui_base}/"
   if [ -n "$token" ]; then
-    local_url="${local_url}#token=${token}"
-    remote_url="${remote_url}#token=${token}"
+    echo "[gateway] Auth token present; omitting tokenized URLs from logs"
   fi

   echo "[gateway] Local UI: ${local_url}"
   echo "[gateway] Remote UI: ${remote_url}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 104 - 110, The script currently
appends "#token=${token}" to local_url and remote_url and then echoes them,
leaking the gateway auth token; change it so the echoed URLs are the base
local_url and remote_url (do not include ${token}) and, if you still need
tokenized links internally, keep tokenized values in separate variables (e.g.,
token_local_url/token_remote_url) and only echo them when an explicit opt-in
flag (e.g., SHOW_TOKEN_LINKS) is set; update the code paths that reference
local_url/remote_url to use the un-tokenized variables for logging while
preserving any internal use of ${token} where necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 121-136: The printed timeout value (from reading
NEMOCLAW_AUTO_PAIR_TIMEOUT and stored in timeout) is hidden because the watcher
is started via nohup redirecting stdout/stderr; move or duplicate the observable
message into the shell before launching the background Python block (i.e., emit
the normalized timeout from the shell environment prior to the nohup line) so
users see the resolved timeout on the CLI; reference the nohup invocation and
the Python print(f'[auto-pair] watcher launched (timeout={timeout}s)') as the
location to change and use NEMOCLAW_AUTO_PAIR_TIMEOUT to compute/display the
same normalized timeout in the shell.

---

Outside diff comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 104-110: The script currently appends "#token=${token}" to
local_url and remote_url and then echoes them, leaking the gateway auth token;
change it so the echoed URLs are the base local_url and remote_url (do not
include ${token}) and, if you still need tokenized links internally, keep
tokenized values in separate variables (e.g., token_local_url/token_remote_url)
and only echo them when an explicit opt-in flag (e.g., SHOW_TOKEN_LINKS) is set;
update the code paths that reference local_url/remote_url to use the
un-tokenized variables for logging while preserving any internal use of ${token}
where necessary.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ca0cadf5-c717-447c-b7af-1047e76f1482

📥 Commits

Reviewing files that changed from the base of the PR and between 3fd5f2c and 3948f84.

📒 Files selected for processing (1)
  • scripts/nemoclaw-start.sh

Reduce default auto-pair watcher timeout from 600s to 120s (pairing
typically converges in <10s). Add NEMOCLAW_AUTO_PAIR_TIMEOUT env var
for custom timeout and NEMOCLAW_SKIP_AUTO_PAIR to disable the watcher
entirely in automated deployments.

Signed-off-by: Brian Taylor <brian@briantaylor.xyz>
Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
Add shell-style doc comments to all four bash functions
(fix_openclaw_config, write_auth_profile, print_dashboard_urls,
start_auto_pair) to meet the 80% docstring coverage requirement.

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
Without a per-call timeout, a hung openclaw subprocess could block
indefinitely and defeat the configurable DEADLINE. Pass the remaining
time budget to subprocess.run() and return rc=124 on expiry.
- Treat timeout <= 0 as invalid (fall back to 120s default) since
  timeout=0 would cause the watcher to exit immediately with no work.
- Lowercase NEMOCLAW_SKIP_AUTO_PAIR before matching so True, YES,
  tRuE etc. all work as expected.
@brianwtaylor brianwtaylor force-pushed the fix/auto-pair-timeout-hardening branch from 3948f84 to 5cd4806 Compare March 19, 2026 01:12
@brianwtaylor brianwtaylor closed this by deleting the head repository Mar 20, 2026
@wscurran wscurran added the enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. label Mar 20, 2026
@wscurran
Copy link
Copy Markdown
Contributor

Thanks for reducing the default auto-pair watcher timeout and adding environment variables to configure the timeout, this should make the pairing process more efficient for users.

@brianwtaylor
Copy link
Copy Markdown
Contributor Author

needed to refork.

mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026
…tion (NVIDIA#145)

* fix(server): add field-level size limits to sandbox and provider creation

Closes NVIDIA#24

Add validate_sandbox_spec and provider field validation with named
constants. Configure explicit 1MB tonic max_decoding_message_size.
Inference routes excluded per NVIDIA#133 rearchitecture.

* chore: remove issue number references from code comments

---------

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement: feature Use this label to identify requests for new capabilities in NemoClaw.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants