security: sandbox shell-tool working_directory to jupyter_root by pjdoland · Pull Request #290 · plmbr/notebook-intelligence

pjdoland · 2026-05-18T01:25:26Z

Summary

The two shell-execution tools surfaced to the LLM agent (`run_command_in_embedded_terminal` and `run_command_in_jupyter_terminal`) accepted a `working_directory` argument and passed it through unsanitized. An LLM tool call with `working_directory='/etc'`, `'../../..'`, or a workspace symlink to an outside directory would land a real shell outside the Jupyter workspace. The sibling `execute_command` tool already routed `working_directory` through the existing `_get_safe_path()` gate; these two were the only outliers.

This PR closes that gap.

Solution

Both tools now apply the same gate:

`work_dir = _get_safe_path(working_directory)` resolves the path and raises if it escapes `jupyter_root_dir`.
Reject with the same error string `execute_command` already emits when the directory doesn't exist or isn't a directory.
Forward the sandboxed absolute path to `subprocess.Popen(cwd=...)` or to the JupyterLab UI command that opens a terminal.

Error-string wording and shape match the sibling for consistency in the chat UX. No public API changes; the tool schema docstrings already steered the LLM toward relative paths.

Testing

Thirteen new pytest cases in `tests/test_builtin_toolset_cwd_sandbox.py`. For each rejection, the test asserts that `subprocess.Popen` (or the UI-command spy) was never called, so a true regression flunks the test rather than silently allowing the escape.

Coverage:

Absolute path outside the workspace
Relative `../../..` traversal
Symlink inside the workspace pointing outside (the most realistic bypass, since `Path.resolve()` chases symlinks)
Combined-prefix traversal (`valid-subdir/../../..`) to pin that `resolve()` collapses `..` before the containment check
Null byte in path
Nonexistent directory
File instead of directory
Positive case: relative subdirectory
Positive case: `.` meaning the root
Same set repeated for the Jupyter-terminal sibling

Suite-wide checks pass: `pytest tests/` (734 passing, +13 new), `jlpm tsc --noEmit`, `jlpm lint:check`, `jlpm jest` (154 passing).

Verified the new tests fail without the fix by reverting `built_in_toolsets.py` and re-running; all six relevant rejection tests flunk as expected, then pass once the fix is reapplied.

Risks / follow-ups

Behavior change: a user workflow that legitimately relied on an absolute path outside the workspace will now see a tool error and the LLM will re-issue with a relative path. The tool description in the schema ("relative to jupyter_root_dir, default is root") already advertised the relative-path contract, so this aligns runtime behavior with the documented contract.
Symlinks: workspace-internal symlinks pointing outside (e.g. a user-created shortcut to `~/code`) are also rejected. This matches the existing behavior of `execute_command` so the two tools stay consistent.
The `_get_safe_path` error message echoes the resolved `jupyter_root_dir` absolute path back to the LLM. This is consistent with the sibling and is acceptable for a per-user tool; not worth altering here.

The two shell-execution tools (run_command_in_embedded_terminal and run_command_in_jupyter_terminal) accepted a working_directory argument from the LLM and passed it through unsanitized: - run_command_in_embedded_terminal forwarded it as subprocess.Popen(cwd=...) - run_command_in_jupyter_terminal forwarded it via the JupyterLab UI command that opens a real terminal An LLM-supplied path of '/etc', '../../..', or a workspace symlink to an outside directory could land a real shell outside the Jupyter workspace. The sibling execute_command tool already routed working_directory through _get_safe_path(); these two were the outliers. Both tools now run the same gate: resolve the path, fail if it escapes jupyter_root_dir, fail if it doesn't exist, fail if it isn't a directory. The sandboxed absolute path is forwarded to Popen / the UI command. Error strings match what execute_command already returns so the chat UX is consistent. Tests cover absolute-path escape, relative traversal, symlink escape, combined-prefix traversal (`valid-subdir/../../..`), null byte input, file-instead-of-directory, nonexistent path, and the positive cases for both tools. Each rejection assertion checks that Popen / the UI command spy was never called, so a true regression flunks the test.

The positive-case tests asserted popen_spy.call_count == 1. CI Python 3.12 counted 2 calls (an environment-dependent MagicMock detail of how the post-Popen path iterates process.stdout / process.stderr); locally on 3.14 the same flow counts 1. The security property the tests pin is 'Popen was called with the sandboxed cwd, not the raw input,' which is fully captured by call_args_list[0]; the count is incidental. Switched both positive-path assertions to popen_spy.called plus the first call_args. Verified the tests still catch a regression of the production fix (reverting built_in_toolsets.py reproduces 13 failures).

mbektas · 2026-05-18T01:37:04Z

 async def run_command_in_jupyter_terminal(command: str, working_directory: str = ".", **args) -> str:
    """Run a shell command in a Jupyter terminal within working_directory. This can be used to run long running processes like web applications. Returns the output of the command.
-    
+


thanks @pjdoland . do we need this also for the same tool in claude.py https://github.com/notebook-intelligence/notebook-intelligence/blob/main/notebook_intelligence/claude.py#L1071

Good catch, yes. The Claude-mode tool at claude.py:1071 has the same gap. It hands working_directory to the frontend notebook-intelligence:run-command-in-terminal command (src/index.ts:1185), which passes the value straight to terminal:create-new as cwd. Nothing on either side resolves it against jupyter_root_dir, so an LLM call with working_directory='/etc', a ../../.. traversal, or a workspace symlink pointing outside opens a terminal there and runs the command.

I'll send a follow-up PR after this one lands that:

Lifts _get_safe_path out of built_in_toolsets.py and into util.py next to get_jupyter_root_dir(), so the Claude-mode @tool registrations can share the same gate without an import cycle.

Applies the gate inside run_command_in_jupyter_terminal in claude.py.

Mirrors the pytest coverage from this PR for the Claude-mode path (absolute escape, .. traversal, symlink, null byte, nonexistent dir, file-not-dir, plus the positive cases).

While I'm in there I'll also take a look at open-file-in-jupyter-ui (claude.py:1089), which forwards file_path to docmanager:open. Lower-severity since it's a read path that goes through JupyterLab's contents service, but worth confirming the service rejects out-of-root paths before assuming it's safe.

Happy to bundle the audit findings into the same follow-up PR, or split them, whichever you prefer.

CI patches subprocess.Popen globally; on Linux/CI the bundled Claude Agent SDK background thread happens to spawn its claude binary during the patch window. The spy was counting those calls along with the shell tool's own, so assert_not_called() saw count=2 (one SDK, zero from the function under test). Filter by the spy's call shape: the shell tool always passes shlex.split('echo hi') which is a list with exactly ['echo', 'hi']; the SDK passes a tuple of CLI args. _shell_tool_calls() returns only calls matching the shell-tool shape, so the security assertions are robust to concurrent unrelated Popen calls. Verified regression detection by reverting built_in_toolsets.py and re-running: all 13 tests fail.

PR #290 closed the working_directory traversal on the two builtin shell tools but left the two Claude-mode siblings open: - `run-command-in-jupyter-terminal` at claude.py:1071 forwards `working_directory` straight to the JupyterLab UI command `notebook-intelligence:run-command-in-terminal`, which passes it to `terminal:create-new` as `cwd`. An LLM tool call with `working_directory='/etc'`, `'../../..'`, or a workspace symlink pointing outside opens a terminal there and runs the command. - `open-file-in-jupyter-ui` at claude.py:1089 forwards `file_path` to `docmanager:open`. JupyterLab's contents service today rejects out-of-root absolute paths, but relying on that framework behavior leaves the same shape of bug latent if the contents service ever loosens; defense in depth applies the same gate here. Both tools now route the LLM-supplied path through `util.safe_jupyter_path`, the same gate the builtin tools use. The forwarded payload carries the resolved absolute path so any intermediate that does its own cwd-relative resolution can't double-resolve into a different target. Error wording matches the builtin siblings ("outside allowed directory", "does not exist", "not a directory") so the chat UX is consistent.

16 cases mirroring tests/test_builtin_toolset_cwd_sandbox.py from PR #290 but driving the Claude-mode tool handlers directly. run_command_in_jupyter_terminal: absolute escape, relative traversal, workspace symlink pointing outside, traversal via a valid-subdir prefix (`valid/../../..`, pins that resolve() collapses `..` before the containment check), embedded NUL, the two cwd-must-exist branches (nonexistent dir, file-not-dir), and positive cases for relative subdir, `.`, and `""` meaning the root. Every rejection asserts that the response spy's `run_ui_command` was never invoked, so a regression flunks the test rather than silently allowing the escape. open_file_in_jupyter_ui: absolute escape, relative traversal, workspace symlink, embedded NUL, and two positive cases — including a not-yet-existing file, since docmanager:open is also used for paths the user is about to create and the security gate deliberately doesn't require existence. Verified the new tests fail without the fix by reverting claude.py and re-running; the representative rejection cases flunk, then pass once the fix is reapplied.

Promotes the [Unreleased] CHANGELOG snapshot to [5.0.0] - 2026-05-22 and expands it to cover everything merged into upstream/main after PR plmbr#287's docs refresh. Bumps package.json to 5.0.0. CHANGELOG additions cover the post-plmbr#287 surface: - Settings tabs: plugin marketplace picker (plmbr#284), plugin marketplace details + Update button (plmbr#303), per-workspace MCP disable (plmbr#286), JSON-paste path in Add MCP server (plmbr#285). - Launchers: hide-with-policy (plmbr#288), brand icons for Codex / opencode (plmbr#325, plmbr#333), per-launch directory picker (plmbr#332). - Chat sidebar and agentic UX: workspace @-mention in Claude mode (plmbr#327), reload-open-files-on-disk (plmbr#330), steered system prompt away from over-eager notebook creation (plmbr#336). - Skills: multi-manifest support (plmbr#321), tracks-upstream for user- imported skills (plmbr#322), HTTP kill switch for the reconciler (plmbr#291). - Accessibility: full sub-section covering plmbr#305-plmbr#320. - Security: shell-tool sandbox (plmbr#290), Claude UI-bridge sandbox (plmbr#323), 0o600 on encrypted token (plmbr#293), env-secret scrubbing (plmbr#295), MCP config shape validation (plmbr#299), XSS allowlist (plmbr#296), Copilot WS auth + origin (plmbr#301), GHE host detection (plmbr#292), fastmcp -> mcp SDK swap (plmbr#324). - Fixed: session listing unification (plmbr#310), session preview unwrap (plmbr#331), down-area runtime throw (plmbr#330 follow-up), WS message-handler leak (plmbr#294). - Removed: fastmcp dependency, history.jsonl session gate. Adds a Migration note covering the five behavior changes operators should review before upgrading from 4.x: fastmcp swap, path sandboxes, history.jsonl gate removal, workspace @-mention pointer shape, and the Copilot WebSocket auth/origin tightening. Two reviewer rounds (six personas each) applied: - Round 1 caught security overclaims (plmbr#293, plmbr#299, plmbr#323), the plmbr#284/plmbr#303 mis-attribution, missing migration note, 3 em dashes, and the stale `fastmcp==2.x.*` recommendation in the admin guide. - Round 2 caught the missing plmbr#301 migration bullet, missing version- matrix 5.0.x row, missing README TOC entry, and a couple of style nits (sub-heading overpromise, orphan bullet). Skipped (deferred to future PRs): - README first-run tour mention. - Admin guide HTTP kill-switch row in Failure-modes table. - Terminal drag-drop trust-model precision update after plmbr#327. - Cipher description nit in plmbr#293 (Fernet AES-128-CBC+HMAC, not AES-GCM).

pjdoland added the bug Something isn't working label May 18, 2026

mbektas reviewed May 18, 2026

View reviewed changes

mbektas merged commit c516175 into plmbr:main May 18, 2026
4 checks passed

pjdoland mentioned this pull request May 19, 2026

security(claude): sandbox UI-bridge tool paths to jupyter_root #323

Merged

pjdoland mentioned this pull request May 22, 2026

chore(release): prep 5.0.0 docs and version bump #338

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security: sandbox shell-tool working_directory to jupyter_root#290

security: sandbox shell-tool working_directory to jupyter_root#290
mbektas merged 3 commits into
plmbr:mainfrom
pjdoland:fix/sandbox-shell-tool-cwd

pjdoland commented May 18, 2026

Uh oh!

mbektas May 18, 2026

Uh oh!

pjdoland May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		async def run_command_in_jupyter_terminal(command: str, working_directory: str = ".", **args) -> str:
		"""Run a shell command in a Jupyter terminal within working_directory. This can be used to run long running processes like web applications. Returns the output of the command.

Conversation

pjdoland commented May 18, 2026

Summary

Solution

Testing

Risks / follow-ups

Uh oh!

mbektas May 18, 2026

Choose a reason for hiding this comment

Uh oh!

pjdoland May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants