Skip to content

feat(cli): add active session detection before destructive sandbox operations#2117

Merged
ericksoa merged 6 commits intomainfrom
fix/992-active-session-guard
Apr 21, 2026
Merged

feat(cli): add active session detection before destructive sandbox operations#2117
ericksoa merged 6 commits intomainfrom
fix/992-active-session-guard

Conversation

@jyaunches
Copy link
Copy Markdown
Contributor

@jyaunches jyaunches commented Apr 20, 2026

Summary

Adds a shared sandbox-session-state.ts module that detects active SSH connections to OpenShell sandboxes before destructive operations proceed. This addresses the core issue in #992: no destructive operation in the CLI checks for active SSH sessions before proceeding, leading to abrupt Broken pipe errors with no graceful warning.

Architectural Context

This is part of the broader onboard.ts decomposition effort and JS→TS migration arc (see #2113 for the type-safety direction). The fix creates reusable infrastructure that improves destroy, rebuild, status, list, and connect.

The module follows the gateway-state.ts pattern: pure typed classifiers that parse CLI output are separated from the I/O layer with dependency injection for testability.

Changes

  • New module: src/lib/sandbox-session-state.ts — ESM TypeScript, no @ts-nocheck, explicit types, dependency-injected I/O
    • parseForwardList() — parses openshell forward list output
    • parseSshProcesses() — parses pgrep -a ssh output
    • classifySessionState() — combines both sources
    • getActiveSandboxSessions() — high-level entry point
    • createSystemDeps() — factory for real system deps
  • src/nemoclaw.ts — wires session detection into 6 consumers:
    • sandboxDestroy(): ⚠️ "Active SSH session detected. Destroy anyway?"
    • sandboxRebuild(): ⚠️ Same warning pattern
    • cleanupGatewayAfterLastSandbox(): Note about active port forwards
    • sandboxConnect(): "Note: existing session detected (another terminal)"
    • sandboxStatus(): Shows "Connected: yes (N sessions)" / "Connected: no"
    • nemoclaw list: ● indicator next to connected sandboxes
  • src/lib/inventory-commands.ts — adds optional getActiveSessionCount to ListSandboxesCommandDeps

Type of Change

  • Code change (feature, bug fix, or refactor)

Test plan

  • 27 unit tests in src/lib/sandbox-session-state.test.ts covering all pure classifiers and integration function
  • npx vitest run --project cli passes (existing tests unaffected)
  • npx tsc -p tsconfig.src.json --noEmit clean
  • All guards wrapped in try/catch — if detection fails, commands behave identically to before (zero regression risk)

Verification

  • npm test passes (pre-existing failure in install-preflight.test.ts unrelated)
  • No secrets, API keys, or credentials committed
  • Tests added or updated for new or changed behavior ✅ (27 new tests)

AI Disclosure

  • AI-assisted — tool: pi coding agent

Closes #992

Summary by CodeRabbit

  • New Features

    • Sandbox list shows a connection indicator when active SSH sessions are detected.
    • Status command reports active session counts per sandbox.
    • Connect alerts if a sandbox already has active sessions.
    • Destroy and rebuild prompt warn about active sessions and require confirmation when present.
  • Tests

    • Added comprehensive tests for session detection, parsing, and classification.

…erations

Adds a shared sandbox-session-state.ts module that detects active SSH
connections to OpenShell sandboxes via pgrep and openshell forward list.

Consumers:
- sandboxDestroy(): warns before destroying sandbox with active sessions
- sandboxRebuild(): warns before rebuilding sandbox with active sessions
- sandboxConnect(): notes if already connected in another terminal
- sandboxStatus(): shows Connected: yes/no with session count
- nemoclaw list: shows ● indicator next to connected sandboxes
- cleanupGatewayAfterLastSandbox(): notes active port forwards

The module follows the gateway-state.ts pattern: pure typed classifiers
separated from the I/O layer with dependency injection for testability.
All guards are non-blocking (try/catch with graceful fallback).

Closes #992
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough

Walkthrough

Adds sandbox SSH-session detection: parses OpenShell forward lists and system SSH processes, classifies active sessions, and integrates detection into listing, status, connect, destroy, and rebuild flows to show indicators and warn before destructive operations.

Changes

Cohort / File(s) Summary
Session detection module & tests
src/lib/sandbox-session-state.ts, src/lib/sandbox-session-state.test.ts
New module that parses openshell forward list and ps/pgrep SSH output into ForwardEntry and SandboxSession, classifies session state, exposes getActiveSandboxSessions and createSystemDeps; comprehensive unit tests added.
Inventory/listing integration
src/lib/inventory-commands.ts
Extended ListSandboxesCommandDeps with optional getActiveSessionCount and updated listing output to append a connection marker () when per-sandbox active session count > 0.
CLI command integration
src/nemoclaw.ts
Wired session detection into listSandboxes, sandboxConnect, sandboxStatus, sandboxDestroy, and sandboxRebuild: caches probe output, derives per-sandbox counts, shows warnings/indicators, and makes detection non-fatal on errors.

Sequence Diagram

sequenceDiagram
    participant User as User/CLI
    participant CLI as nemoclaw
    participant SessionDetection as Session Detection
    participant OpenShell as OpenShell CLI
    participant System as System (ps/pgrep)

    User->>CLI: Run command (list/status/connect/destroy/rebuild)
    CLI->>SessionDetection: getActiveSandboxSessions(sandboxName) / probe once for list
    SessionDetection->>OpenShell: spawnSync("openshell forward list")
    OpenShell-->>SessionDetection: forward list output
    SessionDetection->>System: spawnSync("ps -axo pid,command") or pgrep
    System-->>SessionDetection: SSH processes output
    SessionDetection->>SessionDetection: parseForwardList()<br/>parseSshProcesses()<br/>classifySessionState()
    SessionDetection-->>CLI: { detected, sessions, sessionCount }
    alt Active Sessions Detected
        CLI->>User: Show indicator/warning and require confirmation for destructive ops
    else No Active Sessions
        CLI->>User: Proceed normally
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I sniff the logs and lists with care,
I count the hops and tails of air.
Forwards, SSH, I tally right—
A tiny dot to show the light.
Beware before you smash the lair!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the primary change: adding active session detection before destructive sandbox operations, which aligns with the main objectives and code changes.
Linked Issues check ✅ Passed The PR fully addresses issue #992 by implementing session detection logic to warn users before destructive operations (destroy/rebuild) and provide session information in status commands.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the linked issue objectives: session-state detection module, inventory-command extension, and nemoclaw.ts consumer integration for warnings and indicators.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/992-active-session-guard

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/sandbox-session-state.ts`:
- Around line 113-117: The current check uses cmdline.includes(sshHost) which
matches substrings and can false-positively tie "openshell-my-sandbox-extended"
to "my-sandbox"; change the match to check argv tokens instead: parse the
command line string (cmdline) into argument tokens and test whether any token
=== sshHost (e.g., tokens.some(t => t === sshHost)) before pushing the session
into sessions (referencing cmdline, sshHost, sessions, sandboxName, pid). Ensure
splitting handles typical whitespace so only exact argument matches are
accepted.
- Around line 201-214: The code currently sets detected: true whenever
forwardOutput exists even if getSshProcesses() returned null; change the logic
so detection only succeeds when the SSH probe result (pgrepOutput from
getSshProcesses()) is available. Specifically, update the block around
forwardOutput and pgrepOutput to return {detected: false, sessions: []} if
pgrepOutput === null (even if forwardOutput is non-null), and only call
parseSshProcesses(pgrepOutput, sandboxName) and return detected: true when
pgrepOutput is non-null and parsing yields sessions.

In `@src/nemoclaw.ts`:
- Around line 1171-1187: The getActiveSessionCount callback currently shells out
via getActiveSandboxSessions for every sandbox row causing repeated probes;
instead, when sessionDeps is truthy (after resolveOpenshell/createSessionDeps),
run the probe once and cache its results (e.g., compute a map from sandbox name
to detected/session count by calling getActiveSandboxSessions over the full
inventory or by invoking the underlying openshell/pgrep probe once) and have
getActiveSessionCount simply return values from that cache; update the code
around resolveOpenshell, createSessionDeps, sessionDeps, and the
getActiveSessionCount passed into listSandboxesCommand to build and reuse the
cached probe results (and keep the existing try/catch behavior when reading the
cache).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: fc67d604-0fc4-4f60-9a95-a3cae24f989c

📥 Commits

Reviewing files that changed from the base of the PR and between a7dca9c and e8cc275.

📒 Files selected for processing (4)
  • src/lib/inventory-commands.ts
  • src/lib/sandbox-session-state.test.ts
  • src/lib/sandbox-session-state.ts
  • src/nemoclaw.ts

Comment thread src/lib/sandbox-session-state.ts Outdated
Comment thread src/lib/sandbox-session-state.ts Outdated
Comment thread src/nemoclaw.ts
Test User added 2 commits April 20, 2026 17:37
- Remove unused getForwardList() call from getActiveSandboxSessions —
  only pgrep/ps is needed for SSH session detection (warning #1)
- Consolidate double-prompt in sandboxDestroy into single enriched
  confirmation prompt (warning #2)
- Remove noisy cleanupGatewayAfterLastSandbox forward check that would
  always fire due to dashboard forward (warning #3)
- Use word-boundary regex in parseSshProcesses to prevent false positives
  when sandbox names share prefixes (warning #4)
- Export SessionClassification as named interface (suggestion #1)
- Use cross-platform ps -axo instead of Linux-only pgrep -a for macOS
  compatibility (suggestion #2)
- Add forwardCount to SessionClassification for future consumers
- Add tests for word-boundary matching edge cases
Address CodeRabbit review: getActiveSessionCount was calling
getActiveSandboxSessions (which spawns ps) once per sandbox row.
Now the SSH process probe runs once and results are reused for
all sandboxes via parseSshProcesses against cached output.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/lib/sandbox-session-state.ts (1)

104-121: ⚠️ Potential issue | 🟠 Major

Handle user@openshell-<sandbox> targets in SSH process matching.

The current host matcher misses valid SSH invocations like ssh user@openshell-my-sandbox, so active sessions can be undercounted and destructive warnings may be skipped.

Suggested fix
-  const sshHost = `openshell-${sandboxName}`;
-  // Match sshHost as a complete word — preceded by whitespace/start and followed
-  // by whitespace/end. This prevents `openshell-dev` from matching inside
-  // `openshell-dev-staging`.
-  const hostPattern = new RegExp(`(?:^|\\s)${escapeRegExp(sshHost)}(?:\\s|$)`);
+  const sshHost = `openshell-${sandboxName}`;
   const sessions: SandboxSession[] = [];
   const lines = pgrepOutput.split("\n").filter(Boolean);
@@
-    const cmdline = pidMatch[2];
-
-    if (hostPattern.test(cmdline)) {
+    const cmdline = pidMatch[2];
+    const argv = cmdline.trim().split(/\s+/);
+    const matchesHost = argv.some(
+      (token) => token === sshHost || token.endsWith(`@${sshHost}`),
+    );
+    if (matchesHost) {
       sessions.push({ sandboxName, pid, sshHost });
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/sandbox-session-state.ts` around lines 104 - 121, The hostPattern
used to detect SSH processes misses targets of the form
user@openshell-<sandbox>; update the regex construction around sshHost (where
hostPattern and escapeRegExp are created) to allow an optional user@ prefix (and
optionally SSH URI prefixes like ssh://) when testing cmdline, so that lines
like "ssh user@openshell-my-sandbox" are matched and sessions.push({
sandboxName, pid, sshHost }) runs for those processes; adjust the RegExp used in
hostPattern accordingly and keep the rest of the pid parsing (pidMatch, pid,
cmdline) intact.
🧹 Nitpick comments (1)
src/lib/sandbox-session-state.test.ts (1)

70-131: Add a regression test for ssh user@openshell-<sandbox> format.

Please add a parser case for user@host targets so session detection stays correct for common SSH syntax and future refactors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/sandbox-session-state.test.ts` around lines 70 - 131, The tests lack
a case for SSH targets using the user@host form (e.g. "ssh
user@openshell-<sandbox>"); add a unit test in sandbox-session-state.test.ts
calling parseSshProcesses with output containing a line like "PID ssh
user@openshell-my-sandbox" and asserting it returns a session with pid and
sshHost "openshell-my-sandbox". If parseSshProcesses’ current regex doesn’t
handle the "user@" prefix, update its matching logic (in the parseSshProcesses
function) to allow an optional "user@" before the host (e.g. match optional \S+@
and then the openshell-<sandbox> host or split on '@' to extract the host) so
the parser correctly detects user@host targets.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/sandbox-session-state.ts`:
- Around line 204-215: The comment about detecting active SSH sessions
references `pgrep -a ssh` but the implementation uses `ps -axo pid,command`;
update the doc comment in the `getSshProcesses`/session detection block (the
high-level function that "Detect active SSH sessions for a named sandbox") to
accurately describe that we parse `ps -axo pid,command` output (and the matching
logic against the sandbox SSH host) instead of `pgrep`, so future readers and
maintainers see the correct system command and parsing behavior.

---

Duplicate comments:
In `@src/lib/sandbox-session-state.ts`:
- Around line 104-121: The hostPattern used to detect SSH processes misses
targets of the form user@openshell-<sandbox>; update the regex construction
around sshHost (where hostPattern and escapeRegExp are created) to allow an
optional user@ prefix (and optionally SSH URI prefixes like ssh://) when testing
cmdline, so that lines like "ssh user@openshell-my-sandbox" are matched and
sessions.push({ sandboxName, pid, sshHost }) runs for those processes; adjust
the RegExp used in hostPattern accordingly and keep the rest of the pid parsing
(pidMatch, pid, cmdline) intact.

---

Nitpick comments:
In `@src/lib/sandbox-session-state.test.ts`:
- Around line 70-131: The tests lack a case for SSH targets using the user@host
form (e.g. "ssh user@openshell-<sandbox>"); add a unit test in
sandbox-session-state.test.ts calling parseSshProcesses with output containing a
line like "PID ssh user@openshell-my-sandbox" and asserting it returns a session
with pid and sshHost "openshell-my-sandbox". If parseSshProcesses’ current regex
doesn’t handle the "user@" prefix, update its matching logic (in the
parseSshProcesses function) to allow an optional "user@" before the host (e.g.
match optional \S+@ and then the openshell-<sandbox> host or split on '@' to
extract the host) so the parser correctly detects user@host targets.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 0c430f29-add2-4eb7-975b-c73ce22f5c2d

📥 Commits

Reviewing files that changed from the base of the PR and between e8cc275 and e8978ce.

📒 Files selected for processing (3)
  • src/lib/sandbox-session-state.test.ts
  • src/lib/sandbox-session-state.ts
  • src/nemoclaw.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/nemoclaw.ts

Comment on lines +204 to +215
/** Run `pgrep -a ssh` and return stdout. Null if unavailable. */
getSshProcesses: () => string | null;
}

/**
* Detect active SSH sessions for a named sandbox.
*
* This is the high-level entry point used by consumers (destroy, rebuild, etc.).
* It invokes system commands through the deps interface for testability.
*
* Detection relies on `pgrep -a ssh` to find SSH processes targeting the
* sandbox's SSH host. The `getForwardList` dep is not used here (forward
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update stale probe documentation (pgrep vs ps).

Comments still say pgrep -a ssh, but the implementation uses ps -axo pid,command. This mismatch can mislead future maintenance/debugging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/sandbox-session-state.ts` around lines 204 - 215, The comment about
detecting active SSH sessions references `pgrep -a ssh` but the implementation
uses `ps -axo pid,command`; update the doc comment in the
`getSshProcesses`/session detection block (the high-level function that "Detect
active SSH sessions for a named sandbox") to accurately describe that we parse
`ps -axo pid,command` output (and the matching logic against the sandbox SSH
host) instead of `pgrep`, so future readers and maintainers see the correct
system command and parsing behavior.

@wscurran wscurran added enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). OpenShell Support for OpenShell, a safe, private runtime for autonomous AI agents labels Apr 20, 2026
@ericksoa ericksoa merged commit 7732f5b into main Apr 21, 2026
11 checks passed
ericksoa pushed a commit that referenced this pull request Apr 21, 2026
## Summary

Catches up the user-facing reference and troubleshooting docs with the
CLI and policy behavior changes that landed in v0.0.21. Drafted via the
`nemoclaw-contributor-update-docs` skill against commits in
`v0.0.20..v0.0.21`, filtered through `docs/.docs-skip`.

## Changes

- **`docs/reference/commands.md`**
- `nemoclaw list`: session indicator (●) for connected sandboxes
(#2117).
- `nemoclaw <name> connect`: active-session note; auto-recovery from SSH
identity drift after a host reboot (#2117, #2064).
- `nemoclaw <name> status`: three-state Inference line (`healthy` /
`unreachable` / `not probed`) covering both local and remote providers;
new `Connected` line (#2002, #2117).
- `nemoclaw <name> destroy` and `rebuild`: active-session warning with
second confirm; rebuild reapplies policy presets to the recreated
sandbox (#2117, #2026).
- `nemoclaw <name> policy-add` and `policy-remove`: positional preset
argument and non-interactive flow via
`--yes`/`--force`/`NEMOCLAW_NON_INTERACTIVE=1` (#2070).
- `nemoclaw <name> policy-list`: registry-vs-gateway desync detection
(#2089).
- **`docs/reference/troubleshooting.md`**
- `Reconnect after a host reboot`: now reflects automatic stale
`known_hosts` pruning on `connect` (#2064).
- `Running multiple sandboxes simultaneously`: onboard's forward-port
collision guard (#2086).
- New section: `openclaw config set` or `unset` is blocked inside the
sandbox (#2081).
- **`docs/network-policy/customize-network-policy.md`**: non-interactive
`policy-add`/`policy-remove` form; preset preservation across rebuild
(#2070, #2026).
- **`docs/inference/use-local-inference.md`**: NIM section now covers
the NGC API key prompt with masked input and `docker login nvcr.io
--password-stdin` behavior (#2043).
- **Generated skills regenerated** to pick up the source changes
(`.agents/skills/nemoclaw-user-reference/references/{commands,troubleshooting}.md`,
plus minor heading-flow deltas elsewhere). The pre-commit `Regenerate
agent skills from docs` hook ran and confirmed source ↔ generated
parity.

Commits skipped per `docs/.docs-skip` or no doc impact: `bbbaa0fb`
(skip-features), `7cb482cb` (skip-features), `8dee23fd` (skip-terms),
plus the usual CI / test / refactor / install-plumbing churn.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes for the modified files (the one
failing test, `test/cli.test.ts > unknown command exits 1`, also fails
on `origin/main` and is unrelated to these markdown-only changes)
- [ ] `npm test` passes — skipped; same pre-existing CLI-dispatch test
failure unrelated to docs
- [ ] Tests added or updated for new or changed behavior — n/a, doc-only
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only) — not run
locally
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)
— n/a, no new pages

## AI Disclosure

- [x] AI-assisted — tool: Claude Code

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Multi-session SSH connections with concurrent session support.
* Three-state inference health reporting (healthy/unreachable/not
probed) across all providers.
  * Automatic SSH host key rotation detection and recovery.
  * Non-interactive policy preset management via positional arguments.
  * Session indicators in sandbox list view.

* **Bug Fixes**
  * Protected destructive operations with active-session warnings.
  * Policy presets now preserved during sandbox rebuilds.

* **Documentation**
  * NGC registry authentication requirements for container images.
  * Multi-sandbox onboarding and reconnection guidance.
  * Troubleshooting updates for port conflicts and SSH issues.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). OpenShell Support for OpenShell, a safe, private runtime for autonomous AI agents

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[MacOS] Gateway destroy kills sandbox session with Broken pipe instead of graceful error

4 participants