Skip to content

security: sandbox shell-tool working_directory to jupyter_root#290

Merged
mbektas merged 3 commits into
plmbr:mainfrom
pjdoland:fix/sandbox-shell-tool-cwd
May 18, 2026
Merged

security: sandbox shell-tool working_directory to jupyter_root#290
mbektas merged 3 commits into
plmbr:mainfrom
pjdoland:fix/sandbox-shell-tool-cwd

Conversation

@pjdoland
Copy link
Copy Markdown
Collaborator

Summary

The two shell-execution tools surfaced to the LLM agent (`run_command_in_embedded_terminal` and `run_command_in_jupyter_terminal`) accepted a `working_directory` argument and passed it through unsanitized. An LLM tool call with `working_directory='/etc'`, `'../../..'`, or a workspace symlink to an outside directory would land a real shell outside the Jupyter workspace. The sibling `execute_command` tool already routed `working_directory` through the existing `_get_safe_path()` gate; these two were the only outliers.

This PR closes that gap.

Solution

Both tools now apply the same gate:

  1. `work_dir = _get_safe_path(working_directory)` resolves the path and raises if it escapes `jupyter_root_dir`.
  2. Reject with the same error string `execute_command` already emits when the directory doesn't exist or isn't a directory.
  3. Forward the sandboxed absolute path to `subprocess.Popen(cwd=...)` or to the JupyterLab UI command that opens a terminal.

Error-string wording and shape match the sibling for consistency in the chat UX. No public API changes; the tool schema docstrings already steered the LLM toward relative paths.

Testing

Thirteen new pytest cases in `tests/test_builtin_toolset_cwd_sandbox.py`. For each rejection, the test asserts that `subprocess.Popen` (or the UI-command spy) was never called, so a true regression flunks the test rather than silently allowing the escape.

Coverage:

  • Absolute path outside the workspace
  • Relative `../../..` traversal
  • Symlink inside the workspace pointing outside (the most realistic bypass, since `Path.resolve()` chases symlinks)
  • Combined-prefix traversal (`valid-subdir/../../..`) to pin that `resolve()` collapses `..` before the containment check
  • Null byte in path
  • Nonexistent directory
  • File instead of directory
  • Positive case: relative subdirectory
  • Positive case: `.` meaning the root
  • Same set repeated for the Jupyter-terminal sibling

Suite-wide checks pass: `pytest tests/` (734 passing, +13 new), `jlpm tsc --noEmit`, `jlpm lint:check`, `jlpm jest` (154 passing).

Verified the new tests fail without the fix by reverting `built_in_toolsets.py` and re-running; all six relevant rejection tests flunk as expected, then pass once the fix is reapplied.

Risks / follow-ups

  • Behavior change: a user workflow that legitimately relied on an absolute path outside the workspace will now see a tool error and the LLM will re-issue with a relative path. The tool description in the schema ("relative to jupyter_root_dir, default is root") already advertised the relative-path contract, so this aligns runtime behavior with the documented contract.
  • Symlinks: workspace-internal symlinks pointing outside (e.g. a user-created shortcut to `~/code`) are also rejected. This matches the existing behavior of `execute_command` so the two tools stay consistent.
  • The `_get_safe_path` error message echoes the resolved `jupyter_root_dir` absolute path back to the LLM. This is consistent with the sibling and is acceptable for a per-user tool; not worth altering here.

The two shell-execution tools (run_command_in_embedded_terminal and
run_command_in_jupyter_terminal) accepted a working_directory argument
from the LLM and passed it through unsanitized:

- run_command_in_embedded_terminal forwarded it as subprocess.Popen(cwd=...)
- run_command_in_jupyter_terminal forwarded it via the JupyterLab UI
  command that opens a real terminal

An LLM-supplied path of '/etc', '../../..', or a workspace symlink to an
outside directory could land a real shell outside the Jupyter workspace.
The sibling execute_command tool already routed working_directory through
_get_safe_path(); these two were the outliers.

Both tools now run the same gate: resolve the path, fail if it escapes
jupyter_root_dir, fail if it doesn't exist, fail if it isn't a directory.
The sandboxed absolute path is forwarded to Popen / the UI command. Error
strings match what execute_command already returns so the chat UX is
consistent.

Tests cover absolute-path escape, relative traversal, symlink escape,
combined-prefix traversal (`valid-subdir/../../..`), null byte input,
file-instead-of-directory, nonexistent path, and the positive cases for
both tools. Each rejection assertion checks that Popen / the UI command
spy was never called, so a true regression flunks the test.
@pjdoland pjdoland added the bug Something isn't working label May 18, 2026
The positive-case tests asserted popen_spy.call_count == 1. CI Python 3.12
counted 2 calls (an environment-dependent MagicMock detail of how the
post-Popen path iterates process.stdout / process.stderr); locally on 3.14
the same flow counts 1. The security property the tests pin is 'Popen was
called with the sandboxed cwd, not the raw input,' which is fully captured
by call_args_list[0]; the count is incidental.

Switched both positive-path assertions to popen_spy.called plus the first
call_args. Verified the tests still catch a regression of the production
fix (reverting built_in_toolsets.py reproduces 13 failures).
async def run_command_in_jupyter_terminal(command: str, working_directory: str = ".", **args) -> str:
"""Run a shell command in a Jupyter terminal within working_directory. This can be used to run long running processes like web applications. Returns the output of the command.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, yes. The Claude-mode tool at claude.py:1071 has the same gap. It hands working_directory to the frontend notebook-intelligence:run-command-in-terminal command (src/index.ts:1185), which passes the value straight to terminal:create-new as cwd. Nothing on either side resolves it against jupyter_root_dir, so an LLM call with working_directory='/etc', a ../../.. traversal, or a workspace symlink pointing outside opens a terminal there and runs the command.

I'll send a follow-up PR after this one lands that:

  1. Lifts _get_safe_path out of built_in_toolsets.py and into util.py next to get_jupyter_root_dir(), so the Claude-mode @tool registrations can share the same gate without an import cycle.
  2. Applies the gate inside run_command_in_jupyter_terminal in claude.py.
  3. Mirrors the pytest coverage from this PR for the Claude-mode path (absolute escape, .. traversal, symlink, null byte, nonexistent dir, file-not-dir, plus the positive cases).

While I'm in there I'll also take a look at open-file-in-jupyter-ui (claude.py:1089), which forwards file_path to docmanager:open. Lower-severity since it's a read path that goes through JupyterLab's contents service, but worth confirming the service rejects out-of-root paths before assuming it's safe.

Happy to bundle the audit findings into the same follow-up PR, or split them, whichever you prefer.

CI patches subprocess.Popen globally; on Linux/CI the bundled Claude
Agent SDK background thread happens to spawn its claude binary during
the patch window. The spy was counting those calls along with the shell
tool's own, so assert_not_called() saw count=2 (one SDK, zero from the
function under test).

Filter by the spy's call shape: the shell tool always passes
shlex.split('echo hi') which is a list with exactly ['echo', 'hi'];
the SDK passes a tuple of CLI args. _shell_tool_calls() returns only
calls matching the shell-tool shape, so the security assertions are
robust to concurrent unrelated Popen calls.

Verified regression detection by reverting built_in_toolsets.py and
re-running: all 13 tests fail.
@mbektas mbektas merged commit c516175 into plmbr:main May 18, 2026
4 checks passed
mbektas pushed a commit that referenced this pull request May 19, 2026
PR #290 closed the working_directory traversal on the two builtin
shell tools but left the two Claude-mode siblings open:

- `run-command-in-jupyter-terminal` at claude.py:1071 forwards
  `working_directory` straight to the JupyterLab UI command
  `notebook-intelligence:run-command-in-terminal`, which passes it
  to `terminal:create-new` as `cwd`. An LLM tool call with
  `working_directory='/etc'`, `'../../..'`, or a workspace symlink
  pointing outside opens a terminal there and runs the command.

- `open-file-in-jupyter-ui` at claude.py:1089 forwards `file_path`
  to `docmanager:open`. JupyterLab's contents service today rejects
  out-of-root absolute paths, but relying on that framework
  behavior leaves the same shape of bug latent if the contents
  service ever loosens; defense in depth applies the same gate
  here.

Both tools now route the LLM-supplied path through
`util.safe_jupyter_path`, the same gate the builtin tools use.
The forwarded payload carries the resolved absolute path so any
intermediate that does its own cwd-relative resolution can't
double-resolve into a different target. Error wording matches the
builtin siblings ("outside allowed directory", "does not exist",
"not a directory") so the chat UX is consistent.
mbektas pushed a commit that referenced this pull request May 19, 2026
16 cases mirroring tests/test_builtin_toolset_cwd_sandbox.py from
PR #290 but driving the Claude-mode tool handlers directly.

run_command_in_jupyter_terminal: absolute escape, relative
traversal, workspace symlink pointing outside, traversal via a
valid-subdir prefix (`valid/../../..`, pins that resolve()
collapses `..` before the containment check), embedded NUL, the
two cwd-must-exist branches (nonexistent dir, file-not-dir), and
positive cases for relative subdir, `.`, and `""` meaning the
root. Every rejection asserts that the response spy's
`run_ui_command` was never invoked, so a regression flunks the
test rather than silently allowing the escape.

open_file_in_jupyter_ui: absolute escape, relative traversal,
workspace symlink, embedded NUL, and two positive cases —
including a not-yet-existing file, since docmanager:open is also
used for paths the user is about to create and the security gate
deliberately doesn't require existence.

Verified the new tests fail without the fix by reverting
claude.py and re-running; the representative rejection cases
flunk, then pass once the fix is reapplied.
pjdoland added a commit to pjdoland/notebook-intelligence that referenced this pull request May 22, 2026
Promotes the [Unreleased] CHANGELOG snapshot to [5.0.0] - 2026-05-22
and expands it to cover everything merged into upstream/main after
PR plmbr#287's docs refresh. Bumps package.json to 5.0.0.

CHANGELOG additions cover the post-plmbr#287 surface:

- Settings tabs: plugin marketplace picker (plmbr#284), plugin marketplace
  details + Update button (plmbr#303), per-workspace MCP disable (plmbr#286),
  JSON-paste path in Add MCP server (plmbr#285).
- Launchers: hide-with-policy (plmbr#288), brand icons for Codex / opencode
  (plmbr#325, plmbr#333), per-launch directory picker (plmbr#332).
- Chat sidebar and agentic UX: workspace @-mention in Claude mode
  (plmbr#327), reload-open-files-on-disk (plmbr#330), steered system prompt
  away from over-eager notebook creation (plmbr#336).
- Skills: multi-manifest support (plmbr#321), tracks-upstream for user-
  imported skills (plmbr#322), HTTP kill switch for the reconciler (plmbr#291).
- Accessibility: full sub-section covering plmbr#305-plmbr#320.
- Security: shell-tool sandbox (plmbr#290), Claude UI-bridge sandbox (plmbr#323),
  0o600 on encrypted token (plmbr#293), env-secret scrubbing (plmbr#295), MCP
  config shape validation (plmbr#299), XSS allowlist (plmbr#296), Copilot WS
  auth + origin (plmbr#301), GHE host detection (plmbr#292), fastmcp -> mcp SDK
  swap (plmbr#324).
- Fixed: session listing unification (plmbr#310), session preview unwrap
  (plmbr#331), down-area runtime throw (plmbr#330 follow-up), WS message-handler
  leak (plmbr#294).
- Removed: fastmcp dependency, history.jsonl session gate.

Adds a Migration note covering the five behavior changes operators
should review before upgrading from 4.x: fastmcp swap, path
sandboxes, history.jsonl gate removal, workspace @-mention pointer
shape, and the Copilot WebSocket auth/origin tightening.

Two reviewer rounds (six personas each) applied:
- Round 1 caught security overclaims (plmbr#293, plmbr#299, plmbr#323), the
  plmbr#284/plmbr#303 mis-attribution, missing migration note, 3 em dashes,
  and the stale `fastmcp==2.x.*` recommendation in the admin guide.
- Round 2 caught the missing plmbr#301 migration bullet, missing version-
  matrix 5.0.x row, missing README TOC entry, and a couple of style
  nits (sub-heading overpromise, orphan bullet).

Skipped (deferred to future PRs):
- README first-run tour mention.
- Admin guide HTTP kill-switch row in Failure-modes table.
- Terminal drag-drop trust-model precision update after plmbr#327.
- Cipher description nit in plmbr#293 (Fernet AES-128-CBC+HMAC, not
  AES-GCM).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants