fix: reduce auto-pair timeout and add skip/configure env vars#145
fix: reduce auto-pair timeout and add skip/configure env vars#145brianwtaylor wants to merge 6 commits intoNVIDIA:mainfrom
Conversation
a007b7c to
39b1fe3
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds env-driven control to the auto-pair watcher in Changes
Sequence Diagram(s)sequenceDiagram
participant Shell as Shell script
participant Watcher as Inline Python watcher
participant Subproc as Subprocess (kubectl/cli)
Shell->>Watcher: start (unless NEMOCLAW_SKIP_AUTO_PAIR set)
Note right of Watcher: compute DEADLINE from NEMOCLAW_AUTO_PAIR_TIMEOUT
Watcher->>Subproc: run(cmd, timeout=remaining)
Subproc-->>Watcher: exit code / output (or timeout -> code 124)
Watcher->>Shell: log status (startup, attempts, timeout/final message)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 138-141: The run function currently calls subprocess.run without a
timeout, so long-running commands in the watcher loop (e.g., the "openclaw
devices list" and "approve" invocations) can block past the overall watcher
deadline; update run to accept a timeout argument (e.g., def run(*args,
timeout=None)) and pass that into subprocess.run(timeout=timeout,
capture_output=True, text=True); in the watcher loop compute the remaining time
from the watcher deadline and pass it as the timeout when calling run for those
commands, and catch subprocess.TimeoutExpired inside run to return a non-zero
returncode and a stderr message indicating the timeout so the watcher can handle
the failure and respect its deadline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 13bc8690-9e64-4a46-9cb4-b19f4ee1e819
📒 Files selected for processing (1)
scripts/nemoclaw-start.sh
64796be to
3fd5f2c
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/nemoclaw-start.sh (1)
84-110:⚠️ Potential issue | 🟠 MajorAvoid logging raw auth tokens in printed dashboard URLs.
When token exists, Lines 105–110 output tokenized URLs directly. This can leak bearer material to terminal logs/CI logs. Prefer redacting by default and requiring explicit opt-in to print tokenized links.
Proposed fix
- if [ -n "$token" ]; then - local_url="${local_url}#token=${token}" - remote_url="${remote_url}#token=${token}" - fi + if [ -n "$token" ]; then + if [ "${NEMOCLAW_PRINT_TOKENIZED_URLS:-0}" = "1" ]; then + local_url="${local_url}#token=${token}" + remote_url="${remote_url}#token=${token}" + else + echo "[gateway] auth token detected; tokenized URLs suppressed (set NEMOCLAW_PRINT_TOKENIZED_URLS=1 to show)" + fi + fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` around lines 84 - 110, The print_dashboard_urls function currently appends the raw auth token to local_url and remote_url (variables token, local_url, remote_url) and prints them, which can leak secrets; change this to redact the token by default (e.g., append "#token=REDACTED" or omit the fragment) and add an explicit opt-in check (environment variable like SHOW_TOKENS or a flag) before appending and printing the real tokenized links; update the echo lines to print the redacted URLs by default and only print the full tokenized URLs when the opt-in variable is truthy.
♻️ Duplicate comments (1)
scripts/nemoclaw-start.sh (1)
138-145:⚠️ Potential issue | 🟠 MajorSubprocess calls can hang past the watcher deadline.
At Line 140,
subprocess.run(...)has no timeout. The loop deadline is checked only between iterations, soopenclaw devices list/approvecan block indefinitely and defeat the timeout behavior.Proposed fix
def run(*args): """Run a subprocess and return (returncode, stdout, stderr).""" - proc = subprocess.run(args, capture_output=True, text=True) - return proc.returncode, proc.stdout.strip(), proc.stderr.strip() + remaining = max(1, DEADLINE - time.time()) + try: + proc = subprocess.run(args, capture_output=True, text=True, timeout=remaining) + return proc.returncode, proc.stdout.strip(), proc.stderr.strip() + except subprocess.TimeoutExpired: + return 124, '', f'subprocess timed out after {int(remaining)}s'#!/bin/bash set -euo pipefail file="$(fd -t f 'nemoclaw-start\.sh$' | head -n1)" echo "Inspecting: $file" echo "--- run helper and subprocess.run usage ---" rg -n -C2 -P 'def run\(|subprocess\.run\(' "$file" echo "--- watcher call sites ---" rg -n -C2 -P "run\('openclaw', 'devices', '(list|approve)'" "$file" echo "--- source window ---" nl -ba "$file" | sed -n '136,172p'Also applies to: 164-170
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` around lines 138 - 145, The helper run function currently calls subprocess.run without a timeout, which allows commands like the openclaw invocations to hang past the loop DEADLINE; update the run function to accept an optional timeout parameter (or compute remaining time from DEADLINE) and pass it to subprocess.run(timeout=...), raising or returning a non-zero rc on timeout, and then call run for the 'openclaw devices list' and 'openclaw devices approve' sites with the remaining time or a safe per-call timeout so the loop's deadline is honored; reference the run function, subprocess.run call, the DEADLINE variable, and the locations invoking run('openclaw', 'devices', 'list', '--json') and run('openclaw', 'devices', 'approve', ...) when making the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 84-110: The print_dashboard_urls function currently appends the
raw auth token to local_url and remote_url (variables token, local_url,
remote_url) and prints them, which can leak secrets; change this to redact the
token by default (e.g., append "#token=REDACTED" or omit the fragment) and add
an explicit opt-in check (environment variable like SHOW_TOKENS or a flag)
before appending and printing the real tokenized links; update the echo lines to
print the redacted URLs by default and only print the full tokenized URLs when
the opt-in variable is truthy.
---
Duplicate comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 138-145: The helper run function currently calls subprocess.run
without a timeout, which allows commands like the openclaw invocations to hang
past the loop DEADLINE; update the run function to accept an optional timeout
parameter (or compute remaining time from DEADLINE) and pass it to
subprocess.run(timeout=...), raising or returning a non-zero rc on timeout, and
then call run for the 'openclaw devices list' and 'openclaw devices approve'
sites with the remaining time or a safe per-call timeout so the loop's deadline
is honored; reference the run function, subprocess.run call, the DEADLINE
variable, and the locations invoking run('openclaw', 'devices', 'list',
'--json') and run('openclaw', 'devices', 'approve', ...) when making the
changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6cd1c4b3-bd5b-4df8-ac30-633952a137a0
📒 Files selected for processing (1)
scripts/nemoclaw-start.sh
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/nemoclaw-start.sh (1)
104-110:⚠️ Potential issue | 🟠 MajorDon't print tokenized dashboard URLs.
Appending
#token=${token}here leaks the gateway auth token into terminal/CI logs. That widens the same insecure auth surface this PR is trying to constrain. Log the base URLs only, or make tokenized links an explicit opt-in.🔐 Minimal fix
remote_url="${chat_ui_base}/" if [ -n "$token" ]; then - local_url="${local_url}#token=${token}" - remote_url="${remote_url}#token=${token}" + echo "[gateway] Auth token present; omitting tokenized URLs from logs" fi echo "[gateway] Local UI: ${local_url}" echo "[gateway] Remote UI: ${remote_url}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` around lines 104 - 110, The script currently appends "#token=${token}" to local_url and remote_url and then echoes them, leaking the gateway auth token; change it so the echoed URLs are the base local_url and remote_url (do not include ${token}) and, if you still need tokenized links internally, keep tokenized values in separate variables (e.g., token_local_url/token_remote_url) and only echo them when an explicit opt-in flag (e.g., SHOW_TOKEN_LINKS) is set; update the code paths that reference local_url/remote_url to use the un-tokenized variables for logging while preserving any internal use of ${token} where necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 121-136: The printed timeout value (from reading
NEMOCLAW_AUTO_PAIR_TIMEOUT and stored in timeout) is hidden because the watcher
is started via nohup redirecting stdout/stderr; move or duplicate the observable
message into the shell before launching the background Python block (i.e., emit
the normalized timeout from the shell environment prior to the nohup line) so
users see the resolved timeout on the CLI; reference the nohup invocation and
the Python print(f'[auto-pair] watcher launched (timeout={timeout}s)') as the
location to change and use NEMOCLAW_AUTO_PAIR_TIMEOUT to compute/display the
same normalized timeout in the shell.
---
Outside diff comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 104-110: The script currently appends "#token=${token}" to
local_url and remote_url and then echoes them, leaking the gateway auth token;
change it so the echoed URLs are the base local_url and remote_url (do not
include ${token}) and, if you still need tokenized links internally, keep
tokenized values in separate variables (e.g., token_local_url/token_remote_url)
and only echo them when an explicit opt-in flag (e.g., SHOW_TOKEN_LINKS) is set;
update the code paths that reference local_url/remote_url to use the
un-tokenized variables for logging while preserving any internal use of ${token}
where necessary.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ca0cadf5-c717-447c-b7af-1047e76f1482
📒 Files selected for processing (1)
scripts/nemoclaw-start.sh
Reduce default auto-pair watcher timeout from 600s to 120s (pairing typically converges in <10s). Add NEMOCLAW_AUTO_PAIR_TIMEOUT env var for custom timeout and NEMOCLAW_SKIP_AUTO_PAIR to disable the watcher entirely in automated deployments. Signed-off-by: Brian Taylor <brian@briantaylor.xyz> Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
Add shell-style doc comments to all four bash functions (fix_openclaw_config, write_auth_profile, print_dashboard_urls, start_auto_pair) to meet the 80% docstring coverage requirement. Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
Without a per-call timeout, a hung openclaw subprocess could block indefinitely and defeat the configurable DEADLINE. Pass the remaining time budget to subprocess.run() and return rc=124 on expiry.
- Treat timeout <= 0 as invalid (fall back to 120s default) since timeout=0 would cause the watcher to exit immediately with no work. - Lowercase NEMOCLAW_SKIP_AUTO_PAIR before matching so True, YES, tRuE etc. all work as expected.
3948f84 to
5cd4806
Compare
|
Thanks for reducing the default auto-pair watcher timeout and adding environment variables to configure the timeout, this should make the pairing process more efficient for users. |
|
needed to refork. |
…tion (NVIDIA#145) * fix(server): add field-level size limits to sandbox and provider creation Closes NVIDIA#24 Add validate_sandbox_spec and provider field validation with named constants. Configure explicit 1MB tonic max_decoding_message_size. Inference routes excluded per NVIDIA#133 rearchitecture. * chore: remove issue number references from code comments --------- Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
Summary
NEMOCLAW_AUTO_PAIR_TIMEOUTenv var to configure the timeout for environments that need longer convergence windowsNEMOCLAW_SKIP_AUTO_PAIRenv var to disable the watcher entirely in automated/CI deploymentsopenclaw devicescalls from defeating the watcher deadlineTest plan
Automated Tests
Manual Verification