security: sandbox shell-tool working_directory to jupyter_root#290
Conversation
The two shell-execution tools (run_command_in_embedded_terminal and run_command_in_jupyter_terminal) accepted a working_directory argument from the LLM and passed it through unsanitized: - run_command_in_embedded_terminal forwarded it as subprocess.Popen(cwd=...) - run_command_in_jupyter_terminal forwarded it via the JupyterLab UI command that opens a real terminal An LLM-supplied path of '/etc', '../../..', or a workspace symlink to an outside directory could land a real shell outside the Jupyter workspace. The sibling execute_command tool already routed working_directory through _get_safe_path(); these two were the outliers. Both tools now run the same gate: resolve the path, fail if it escapes jupyter_root_dir, fail if it doesn't exist, fail if it isn't a directory. The sandboxed absolute path is forwarded to Popen / the UI command. Error strings match what execute_command already returns so the chat UX is consistent. Tests cover absolute-path escape, relative traversal, symlink escape, combined-prefix traversal (`valid-subdir/../../..`), null byte input, file-instead-of-directory, nonexistent path, and the positive cases for both tools. Each rejection assertion checks that Popen / the UI command spy was never called, so a true regression flunks the test.
The positive-case tests asserted popen_spy.call_count == 1. CI Python 3.12 counted 2 calls (an environment-dependent MagicMock detail of how the post-Popen path iterates process.stdout / process.stderr); locally on 3.14 the same flow counts 1. The security property the tests pin is 'Popen was called with the sandboxed cwd, not the raw input,' which is fully captured by call_args_list[0]; the count is incidental. Switched both positive-path assertions to popen_spy.called plus the first call_args. Verified the tests still catch a regression of the production fix (reverting built_in_toolsets.py reproduces 13 failures).
| async def run_command_in_jupyter_terminal(command: str, working_directory: str = ".", **args) -> str: | ||
| """Run a shell command in a Jupyter terminal within working_directory. This can be used to run long running processes like web applications. Returns the output of the command. | ||
|
|
There was a problem hiding this comment.
thanks @pjdoland . do we need this also for the same tool in claude.py https://github.com/notebook-intelligence/notebook-intelligence/blob/main/notebook_intelligence/claude.py#L1071
There was a problem hiding this comment.
Good catch, yes. The Claude-mode tool at claude.py:1071 has the same gap. It hands working_directory to the frontend notebook-intelligence:run-command-in-terminal command (src/index.ts:1185), which passes the value straight to terminal:create-new as cwd. Nothing on either side resolves it against jupyter_root_dir, so an LLM call with working_directory='/etc', a ../../.. traversal, or a workspace symlink pointing outside opens a terminal there and runs the command.
I'll send a follow-up PR after this one lands that:
- Lifts
_get_safe_pathout ofbuilt_in_toolsets.pyand intoutil.pynext toget_jupyter_root_dir(), so the Claude-mode@toolregistrations can share the same gate without an import cycle. - Applies the gate inside
run_command_in_jupyter_terminalinclaude.py. - Mirrors the pytest coverage from this PR for the Claude-mode path (absolute escape,
..traversal, symlink, null byte, nonexistent dir, file-not-dir, plus the positive cases).
While I'm in there I'll also take a look at open-file-in-jupyter-ui (claude.py:1089), which forwards file_path to docmanager:open. Lower-severity since it's a read path that goes through JupyterLab's contents service, but worth confirming the service rejects out-of-root paths before assuming it's safe.
Happy to bundle the audit findings into the same follow-up PR, or split them, whichever you prefer.
CI patches subprocess.Popen globally; on Linux/CI the bundled Claude
Agent SDK background thread happens to spawn its claude binary during
the patch window. The spy was counting those calls along with the shell
tool's own, so assert_not_called() saw count=2 (one SDK, zero from the
function under test).
Filter by the spy's call shape: the shell tool always passes
shlex.split('echo hi') which is a list with exactly ['echo', 'hi'];
the SDK passes a tuple of CLI args. _shell_tool_calls() returns only
calls matching the shell-tool shape, so the security assertions are
robust to concurrent unrelated Popen calls.
Verified regression detection by reverting built_in_toolsets.py and
re-running: all 13 tests fail.
PR #290 closed the working_directory traversal on the two builtin shell tools but left the two Claude-mode siblings open: - `run-command-in-jupyter-terminal` at claude.py:1071 forwards `working_directory` straight to the JupyterLab UI command `notebook-intelligence:run-command-in-terminal`, which passes it to `terminal:create-new` as `cwd`. An LLM tool call with `working_directory='/etc'`, `'../../..'`, or a workspace symlink pointing outside opens a terminal there and runs the command. - `open-file-in-jupyter-ui` at claude.py:1089 forwards `file_path` to `docmanager:open`. JupyterLab's contents service today rejects out-of-root absolute paths, but relying on that framework behavior leaves the same shape of bug latent if the contents service ever loosens; defense in depth applies the same gate here. Both tools now route the LLM-supplied path through `util.safe_jupyter_path`, the same gate the builtin tools use. The forwarded payload carries the resolved absolute path so any intermediate that does its own cwd-relative resolution can't double-resolve into a different target. Error wording matches the builtin siblings ("outside allowed directory", "does not exist", "not a directory") so the chat UX is consistent.
16 cases mirroring tests/test_builtin_toolset_cwd_sandbox.py from PR #290 but driving the Claude-mode tool handlers directly. run_command_in_jupyter_terminal: absolute escape, relative traversal, workspace symlink pointing outside, traversal via a valid-subdir prefix (`valid/../../..`, pins that resolve() collapses `..` before the containment check), embedded NUL, the two cwd-must-exist branches (nonexistent dir, file-not-dir), and positive cases for relative subdir, `.`, and `""` meaning the root. Every rejection asserts that the response spy's `run_ui_command` was never invoked, so a regression flunks the test rather than silently allowing the escape. open_file_in_jupyter_ui: absolute escape, relative traversal, workspace symlink, embedded NUL, and two positive cases — including a not-yet-existing file, since docmanager:open is also used for paths the user is about to create and the security gate deliberately doesn't require existence. Verified the new tests fail without the fix by reverting claude.py and re-running; the representative rejection cases flunk, then pass once the fix is reapplied.
Promotes the [Unreleased] CHANGELOG snapshot to [5.0.0] - 2026-05-22 and expands it to cover everything merged into upstream/main after PR plmbr#287's docs refresh. Bumps package.json to 5.0.0. CHANGELOG additions cover the post-plmbr#287 surface: - Settings tabs: plugin marketplace picker (plmbr#284), plugin marketplace details + Update button (plmbr#303), per-workspace MCP disable (plmbr#286), JSON-paste path in Add MCP server (plmbr#285). - Launchers: hide-with-policy (plmbr#288), brand icons for Codex / opencode (plmbr#325, plmbr#333), per-launch directory picker (plmbr#332). - Chat sidebar and agentic UX: workspace @-mention in Claude mode (plmbr#327), reload-open-files-on-disk (plmbr#330), steered system prompt away from over-eager notebook creation (plmbr#336). - Skills: multi-manifest support (plmbr#321), tracks-upstream for user- imported skills (plmbr#322), HTTP kill switch for the reconciler (plmbr#291). - Accessibility: full sub-section covering plmbr#305-plmbr#320. - Security: shell-tool sandbox (plmbr#290), Claude UI-bridge sandbox (plmbr#323), 0o600 on encrypted token (plmbr#293), env-secret scrubbing (plmbr#295), MCP config shape validation (plmbr#299), XSS allowlist (plmbr#296), Copilot WS auth + origin (plmbr#301), GHE host detection (plmbr#292), fastmcp -> mcp SDK swap (plmbr#324). - Fixed: session listing unification (plmbr#310), session preview unwrap (plmbr#331), down-area runtime throw (plmbr#330 follow-up), WS message-handler leak (plmbr#294). - Removed: fastmcp dependency, history.jsonl session gate. Adds a Migration note covering the five behavior changes operators should review before upgrading from 4.x: fastmcp swap, path sandboxes, history.jsonl gate removal, workspace @-mention pointer shape, and the Copilot WebSocket auth/origin tightening. Two reviewer rounds (six personas each) applied: - Round 1 caught security overclaims (plmbr#293, plmbr#299, plmbr#323), the plmbr#284/plmbr#303 mis-attribution, missing migration note, 3 em dashes, and the stale `fastmcp==2.x.*` recommendation in the admin guide. - Round 2 caught the missing plmbr#301 migration bullet, missing version- matrix 5.0.x row, missing README TOC entry, and a couple of style nits (sub-heading overpromise, orphan bullet). Skipped (deferred to future PRs): - README first-run tour mention. - Admin guide HTTP kill-switch row in Failure-modes table. - Terminal drag-drop trust-model precision update after plmbr#327. - Cipher description nit in plmbr#293 (Fernet AES-128-CBC+HMAC, not AES-GCM).
Summary
The two shell-execution tools surfaced to the LLM agent (`run_command_in_embedded_terminal` and `run_command_in_jupyter_terminal`) accepted a `working_directory` argument and passed it through unsanitized. An LLM tool call with `working_directory='/etc'`, `'../../..'`, or a workspace symlink to an outside directory would land a real shell outside the Jupyter workspace. The sibling `execute_command` tool already routed `working_directory` through the existing `_get_safe_path()` gate; these two were the only outliers.
This PR closes that gap.
Solution
Both tools now apply the same gate:
Error-string wording and shape match the sibling for consistency in the chat UX. No public API changes; the tool schema docstrings already steered the LLM toward relative paths.
Testing
Thirteen new pytest cases in `tests/test_builtin_toolset_cwd_sandbox.py`. For each rejection, the test asserts that `subprocess.Popen` (or the UI-command spy) was never called, so a true regression flunks the test rather than silently allowing the escape.
Coverage:
Suite-wide checks pass: `pytest tests/` (734 passing, +13 new), `jlpm tsc --noEmit`, `jlpm lint:check`, `jlpm jest` (154 passing).
Verified the new tests fail without the fix by reverting `built_in_toolsets.py` and re-running; all six relevant rejection tests flunk as expected, then pass once the fix is reapplied.
Risks / follow-ups