feat(cli): add active session detection before destructive sandbox operations#2117
feat(cli): add active session detection before destructive sandbox operations#2117
Conversation
…erations Adds a shared sandbox-session-state.ts module that detects active SSH connections to OpenShell sandboxes via pgrep and openshell forward list. Consumers: - sandboxDestroy(): warns before destroying sandbox with active sessions - sandboxRebuild(): warns before rebuilding sandbox with active sessions - sandboxConnect(): notes if already connected in another terminal - sandboxStatus(): shows Connected: yes/no with session count - nemoclaw list: shows ● indicator next to connected sandboxes - cleanupGatewayAfterLastSandbox(): notes active port forwards The module follows the gateway-state.ts pattern: pure typed classifiers separated from the I/O layer with dependency injection for testability. All guards are non-blocking (try/catch with graceful fallback). Closes #992
📝 WalkthroughWalkthroughAdds sandbox SSH-session detection: parses OpenShell forward lists and system SSH processes, classifies active sessions, and integrates detection into listing, status, connect, destroy, and rebuild flows to show indicators and warn before destructive operations. Changes
Sequence DiagramsequenceDiagram
participant User as User/CLI
participant CLI as nemoclaw
participant SessionDetection as Session Detection
participant OpenShell as OpenShell CLI
participant System as System (ps/pgrep)
User->>CLI: Run command (list/status/connect/destroy/rebuild)
CLI->>SessionDetection: getActiveSandboxSessions(sandboxName) / probe once for list
SessionDetection->>OpenShell: spawnSync("openshell forward list")
OpenShell-->>SessionDetection: forward list output
SessionDetection->>System: spawnSync("ps -axo pid,command") or pgrep
System-->>SessionDetection: SSH processes output
SessionDetection->>SessionDetection: parseForwardList()<br/>parseSshProcesses()<br/>classifySessionState()
SessionDetection-->>CLI: { detected, sessions, sessionCount }
alt Active Sessions Detected
CLI->>User: Show indicator/warning and require confirmation for destructive ops
else No Active Sessions
CLI->>User: Proceed normally
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/sandbox-session-state.ts`:
- Around line 113-117: The current check uses cmdline.includes(sshHost) which
matches substrings and can false-positively tie "openshell-my-sandbox-extended"
to "my-sandbox"; change the match to check argv tokens instead: parse the
command line string (cmdline) into argument tokens and test whether any token
=== sshHost (e.g., tokens.some(t => t === sshHost)) before pushing the session
into sessions (referencing cmdline, sshHost, sessions, sandboxName, pid). Ensure
splitting handles typical whitespace so only exact argument matches are
accepted.
- Around line 201-214: The code currently sets detected: true whenever
forwardOutput exists even if getSshProcesses() returned null; change the logic
so detection only succeeds when the SSH probe result (pgrepOutput from
getSshProcesses()) is available. Specifically, update the block around
forwardOutput and pgrepOutput to return {detected: false, sessions: []} if
pgrepOutput === null (even if forwardOutput is non-null), and only call
parseSshProcesses(pgrepOutput, sandboxName) and return detected: true when
pgrepOutput is non-null and parsing yields sessions.
In `@src/nemoclaw.ts`:
- Around line 1171-1187: The getActiveSessionCount callback currently shells out
via getActiveSandboxSessions for every sandbox row causing repeated probes;
instead, when sessionDeps is truthy (after resolveOpenshell/createSessionDeps),
run the probe once and cache its results (e.g., compute a map from sandbox name
to detected/session count by calling getActiveSandboxSessions over the full
inventory or by invoking the underlying openshell/pgrep probe once) and have
getActiveSessionCount simply return values from that cache; update the code
around resolveOpenshell, createSessionDeps, sessionDeps, and the
getActiveSessionCount passed into listSandboxesCommand to build and reuse the
cached probe results (and keep the existing try/catch behavior when reading the
cache).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: fc67d604-0fc4-4f60-9a95-a3cae24f989c
📒 Files selected for processing (4)
src/lib/inventory-commands.tssrc/lib/sandbox-session-state.test.tssrc/lib/sandbox-session-state.tssrc/nemoclaw.ts
- Remove unused getForwardList() call from getActiveSandboxSessions — only pgrep/ps is needed for SSH session detection (warning #1) - Consolidate double-prompt in sandboxDestroy into single enriched confirmation prompt (warning #2) - Remove noisy cleanupGatewayAfterLastSandbox forward check that would always fire due to dashboard forward (warning #3) - Use word-boundary regex in parseSshProcesses to prevent false positives when sandbox names share prefixes (warning #4) - Export SessionClassification as named interface (suggestion #1) - Use cross-platform ps -axo instead of Linux-only pgrep -a for macOS compatibility (suggestion #2) - Add forwardCount to SessionClassification for future consumers - Add tests for word-boundary matching edge cases
Address CodeRabbit review: getActiveSessionCount was calling getActiveSandboxSessions (which spawns ps) once per sandbox row. Now the SSH process probe runs once and results are reused for all sandboxes via parseSshProcesses against cached output.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/lib/sandbox-session-state.ts (1)
104-121:⚠️ Potential issue | 🟠 MajorHandle
user@openshell-<sandbox>targets in SSH process matching.The current host matcher misses valid SSH invocations like
ssh user@openshell-my-sandbox, so active sessions can be undercounted and destructive warnings may be skipped.Suggested fix
- const sshHost = `openshell-${sandboxName}`; - // Match sshHost as a complete word — preceded by whitespace/start and followed - // by whitespace/end. This prevents `openshell-dev` from matching inside - // `openshell-dev-staging`. - const hostPattern = new RegExp(`(?:^|\\s)${escapeRegExp(sshHost)}(?:\\s|$)`); + const sshHost = `openshell-${sandboxName}`; const sessions: SandboxSession[] = []; const lines = pgrepOutput.split("\n").filter(Boolean); @@ - const cmdline = pidMatch[2]; - - if (hostPattern.test(cmdline)) { + const cmdline = pidMatch[2]; + const argv = cmdline.trim().split(/\s+/); + const matchesHost = argv.some( + (token) => token === sshHost || token.endsWith(`@${sshHost}`), + ); + if (matchesHost) { sessions.push({ sandboxName, pid, sshHost }); } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/sandbox-session-state.ts` around lines 104 - 121, The hostPattern used to detect SSH processes misses targets of the form user@openshell-<sandbox>; update the regex construction around sshHost (where hostPattern and escapeRegExp are created) to allow an optional user@ prefix (and optionally SSH URI prefixes like ssh://) when testing cmdline, so that lines like "ssh user@openshell-my-sandbox" are matched and sessions.push({ sandboxName, pid, sshHost }) runs for those processes; adjust the RegExp used in hostPattern accordingly and keep the rest of the pid parsing (pidMatch, pid, cmdline) intact.
🧹 Nitpick comments (1)
src/lib/sandbox-session-state.test.ts (1)
70-131: Add a regression test forssh user@openshell-<sandbox>format.Please add a parser case for
user@hosttargets so session detection stays correct for common SSH syntax and future refactors.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/sandbox-session-state.test.ts` around lines 70 - 131, The tests lack a case for SSH targets using the user@host form (e.g. "ssh user@openshell-<sandbox>"); add a unit test in sandbox-session-state.test.ts calling parseSshProcesses with output containing a line like "PID ssh user@openshell-my-sandbox" and asserting it returns a session with pid and sshHost "openshell-my-sandbox". If parseSshProcesses’ current regex doesn’t handle the "user@" prefix, update its matching logic (in the parseSshProcesses function) to allow an optional "user@" before the host (e.g. match optional \S+@ and then the openshell-<sandbox> host or split on '@' to extract the host) so the parser correctly detects user@host targets.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/sandbox-session-state.ts`:
- Around line 204-215: The comment about detecting active SSH sessions
references `pgrep -a ssh` but the implementation uses `ps -axo pid,command`;
update the doc comment in the `getSshProcesses`/session detection block (the
high-level function that "Detect active SSH sessions for a named sandbox") to
accurately describe that we parse `ps -axo pid,command` output (and the matching
logic against the sandbox SSH host) instead of `pgrep`, so future readers and
maintainers see the correct system command and parsing behavior.
---
Duplicate comments:
In `@src/lib/sandbox-session-state.ts`:
- Around line 104-121: The hostPattern used to detect SSH processes misses
targets of the form user@openshell-<sandbox>; update the regex construction
around sshHost (where hostPattern and escapeRegExp are created) to allow an
optional user@ prefix (and optionally SSH URI prefixes like ssh://) when testing
cmdline, so that lines like "ssh user@openshell-my-sandbox" are matched and
sessions.push({ sandboxName, pid, sshHost }) runs for those processes; adjust
the RegExp used in hostPattern accordingly and keep the rest of the pid parsing
(pidMatch, pid, cmdline) intact.
---
Nitpick comments:
In `@src/lib/sandbox-session-state.test.ts`:
- Around line 70-131: The tests lack a case for SSH targets using the user@host
form (e.g. "ssh user@openshell-<sandbox>"); add a unit test in
sandbox-session-state.test.ts calling parseSshProcesses with output containing a
line like "PID ssh user@openshell-my-sandbox" and asserting it returns a session
with pid and sshHost "openshell-my-sandbox". If parseSshProcesses’ current regex
doesn’t handle the "user@" prefix, update its matching logic (in the
parseSshProcesses function) to allow an optional "user@" before the host (e.g.
match optional \S+@ and then the openshell-<sandbox> host or split on '@' to
extract the host) so the parser correctly detects user@host targets.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 0c430f29-add2-4eb7-975b-c73ce22f5c2d
📒 Files selected for processing (3)
src/lib/sandbox-session-state.test.tssrc/lib/sandbox-session-state.tssrc/nemoclaw.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/nemoclaw.ts
| /** Run `pgrep -a ssh` and return stdout. Null if unavailable. */ | ||
| getSshProcesses: () => string | null; | ||
| } | ||
|
|
||
| /** | ||
| * Detect active SSH sessions for a named sandbox. | ||
| * | ||
| * This is the high-level entry point used by consumers (destroy, rebuild, etc.). | ||
| * It invokes system commands through the deps interface for testability. | ||
| * | ||
| * Detection relies on `pgrep -a ssh` to find SSH processes targeting the | ||
| * sandbox's SSH host. The `getForwardList` dep is not used here (forward |
There was a problem hiding this comment.
Update stale probe documentation (pgrep vs ps).
Comments still say pgrep -a ssh, but the implementation uses ps -axo pid,command. This mismatch can mislead future maintenance/debugging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/sandbox-session-state.ts` around lines 204 - 215, The comment about
detecting active SSH sessions references `pgrep -a ssh` but the implementation
uses `ps -axo pid,command`; update the doc comment in the
`getSshProcesses`/session detection block (the high-level function that "Detect
active SSH sessions for a named sandbox") to accurately describe that we parse
`ps -axo pid,command` output (and the matching logic against the sandbox SSH
host) instead of `pgrep`, so future readers and maintainers see the correct
system command and parsing behavior.
## Summary Catches up the user-facing reference and troubleshooting docs with the CLI and policy behavior changes that landed in v0.0.21. Drafted via the `nemoclaw-contributor-update-docs` skill against commits in `v0.0.20..v0.0.21`, filtered through `docs/.docs-skip`. ## Changes - **`docs/reference/commands.md`** - `nemoclaw list`: session indicator (●) for connected sandboxes (#2117). - `nemoclaw <name> connect`: active-session note; auto-recovery from SSH identity drift after a host reboot (#2117, #2064). - `nemoclaw <name> status`: three-state Inference line (`healthy` / `unreachable` / `not probed`) covering both local and remote providers; new `Connected` line (#2002, #2117). - `nemoclaw <name> destroy` and `rebuild`: active-session warning with second confirm; rebuild reapplies policy presets to the recreated sandbox (#2117, #2026). - `nemoclaw <name> policy-add` and `policy-remove`: positional preset argument and non-interactive flow via `--yes`/`--force`/`NEMOCLAW_NON_INTERACTIVE=1` (#2070). - `nemoclaw <name> policy-list`: registry-vs-gateway desync detection (#2089). - **`docs/reference/troubleshooting.md`** - `Reconnect after a host reboot`: now reflects automatic stale `known_hosts` pruning on `connect` (#2064). - `Running multiple sandboxes simultaneously`: onboard's forward-port collision guard (#2086). - New section: `openclaw config set` or `unset` is blocked inside the sandbox (#2081). - **`docs/network-policy/customize-network-policy.md`**: non-interactive `policy-add`/`policy-remove` form; preset preservation across rebuild (#2070, #2026). - **`docs/inference/use-local-inference.md`**: NIM section now covers the NGC API key prompt with masked input and `docker login nvcr.io --password-stdin` behavior (#2043). - **Generated skills regenerated** to pick up the source changes (`.agents/skills/nemoclaw-user-reference/references/{commands,troubleshooting}.md`, plus minor heading-flow deltas elsewhere). The pre-commit `Regenerate agent skills from docs` hook ran and confirmed source ↔ generated parity. Commits skipped per `docs/.docs-skip` or no doc impact: `bbbaa0fb` (skip-features), `7cb482cb` (skip-features), `8dee23fd` (skip-terms), plus the usual CI / test / refactor / install-plumbing churn. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes for the modified files (the one failing test, `test/cli.test.ts > unknown command exits 1`, also fails on `origin/main` and is unrelated to these markdown-only changes) - [ ] `npm test` passes — skipped; same pre-existing CLI-dispatch test failure unrelated to docs - [ ] Tests added or updated for new or changed behavior — n/a, doc-only - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) — not run locally - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) — n/a, no new pages ## AI Disclosure - [x] AI-assisted — tool: Claude Code --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Multi-session SSH connections with concurrent session support. * Three-state inference health reporting (healthy/unreachable/not probed) across all providers. * Automatic SSH host key rotation detection and recovery. * Non-interactive policy preset management via positional arguments. * Session indicators in sandbox list view. * **Bug Fixes** * Protected destructive operations with active-session warnings. * Policy presets now preserved during sandbox rebuilds. * **Documentation** * NGC registry authentication requirements for container images. * Multi-sandbox onboarding and reconnection guidance. * Troubleshooting updates for port conflicts and SSH issues. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds a shared
sandbox-session-state.tsmodule that detects active SSH connections to OpenShell sandboxes before destructive operations proceed. This addresses the core issue in #992: no destructive operation in the CLI checks for active SSH sessions before proceeding, leading to abrupt Broken pipe errors with no graceful warning.Architectural Context
This is part of the broader onboard.ts decomposition effort and JS→TS migration arc (see #2113 for the type-safety direction). The fix creates reusable infrastructure that improves destroy, rebuild, status, list, and connect.
The module follows the
gateway-state.tspattern: pure typed classifiers that parse CLI output are separated from the I/O layer with dependency injection for testability.Changes
src/lib/sandbox-session-state.ts— ESM TypeScript, no@ts-nocheck, explicit types, dependency-injected I/OparseForwardList()— parsesopenshell forward listoutputparseSshProcesses()— parsespgrep -a sshoutputclassifySessionState()— combines both sourcesgetActiveSandboxSessions()— high-level entry pointcreateSystemDeps()— factory for real system depssrc/nemoclaw.ts— wires session detection into 6 consumers:sandboxDestroy():sandboxRebuild():cleanupGatewayAfterLastSandbox(): Note about active port forwardssandboxConnect(): "Note: existing session detected (another terminal)"sandboxStatus(): Shows "Connected: yes (N sessions)" / "Connected: no"nemoclaw list: ● indicator next to connected sandboxessrc/lib/inventory-commands.ts— adds optionalgetActiveSessionCounttoListSandboxesCommandDepsType of Change
Test plan
src/lib/sandbox-session-state.test.tscovering all pure classifiers and integration functionnpx vitest run --project clipasses (existing tests unaffected)npx tsc -p tsconfig.src.json --noEmitcleanVerification
npm testpasses (pre-existing failure in install-preflight.test.ts unrelated)AI Disclosure
Closes #992
Summary by CodeRabbit
New Features
Tests