fix(claude): steer system prompt away from over-eager notebook creation#336
Conversation
The Claude system prompt biased the agent toward creating a notebook
in response to question-style prompts. The canonical repro from the
issue: attach package.json, ask "load this json file and summarize
for me", and the agent creates a notebook to hold the summary instead
of replying in chat.
The fix is in the system prompt itself (per the issue thread's
consensus — UI toggle and out-of-box skills were both considered and
rejected as more surprising than a prompt refinement). Three pieces
of steering work together:
- Soften the anchor concept: the prompt no longer opens with
"JupyterLab which is an IDE for Jupyter notebooks", which
tautologically primes notebook-thinking. Now it names notebooks,
file editors, and terminals as first-class surfaces.
- Add an explicit chat-default paragraph naming the question
patterns ("summarize this", "explain this", "debug this", "show
me what this does", etc.) and the create-case patterns ("create a
notebook that...", "write me a script to...", "save this as a
file", "show me a notebook that..."). Calls out the attach-a-file-
and-ask manifestation that plmbr#335 reported.
- Tighten the JUPYTER_UI_TOOLS_SYSTEM_PROMPT conditional: "If you
create a notebook or run it, save it after creating or running it"
was structurally permissive (read as standing permission to
create-and-save) and contained a vestigial run-save tail. Now:
"If the user has asked you to create a notebook, save it
afterward." Reinforces the chat-default rather than softly
contradicting it.
Refactored the prompt assembly out of `ClaudeCodeChatParticipant`
into a module-level `build_claude_system_prompt(jupyter_ui_tools_enabled,
jupyter_root_dir)` so the test suite can pin the load-bearing
phrases without instantiating the chat participant or touching
JupyterLab runtime state. The order of the chat-default paragraph
before the UI-tools block is load-bearing (recency bias) and pinned
by `test_chat_default_guidance_precedes_jupyter_ui_tools_block`.
Eight tests pin the contract: the imperative core, the
attached-file carveout, the explicit-request examples (four
variants), UI-tools inclusion / exclusion branches, the
chat-default surviving without UI tools, the chat-default appearing
before the UI-tools block, and root-dir interpolation.
Followups (intentionally deferred):
- Same notebook-creation bias may live in `prompts.CHAT_SYSTEM_PROMPT`
(used by the generic + GitHub Copilot participants). Worth an
empirical spot-check.
- Claude Agent SDK accepts both `system_prompt=str` (replace
CC's default) and `{"type": "preset", "preset": "claude_code",
"append": str}` (augment CC's default). NBI passes a string, so
Claude Code's default system prompt is replaced. Switching to the
preset/append form would let CC's own anti-over-eager guidance
reinforce ours. Pre-existing behavior, larger refactor.
Closes plmbr#335
|
Followed up with a Playwright check against the issue's canonical repro after I realized the unit tests only pin the prompt's text, not the model's downstream behavior. Setup: fresh build of this branch on a clean JL session; workspace contained the same scenario as the issue (a Repro: opened the NBI sidebar, attached Result: Claude read the file and replied in chat with a prose summary ("Project: sample-project (v1.2.3) ... It's a React 18 project written in TypeScript..."). No new notebook tab was created, and the main-area tab list stayed unchanged through 45 seconds of observation. So the fix actually behaves the way the docstring promises, not just contains the right words. |
Promotes the [Unreleased] CHANGELOG snapshot to [5.0.0] - 2026-05-22 and expands it to cover everything merged into upstream/main after PR plmbr#287's docs refresh. Bumps package.json to 5.0.0. CHANGELOG additions cover the post-plmbr#287 surface: - Settings tabs: plugin marketplace picker (plmbr#284), plugin marketplace details + Update button (plmbr#303), per-workspace MCP disable (plmbr#286), JSON-paste path in Add MCP server (plmbr#285). - Launchers: hide-with-policy (plmbr#288), brand icons for Codex / opencode (plmbr#325, plmbr#333), per-launch directory picker (plmbr#332). - Chat sidebar and agentic UX: workspace @-mention in Claude mode (plmbr#327), reload-open-files-on-disk (plmbr#330), steered system prompt away from over-eager notebook creation (plmbr#336). - Skills: multi-manifest support (plmbr#321), tracks-upstream for user- imported skills (plmbr#322), HTTP kill switch for the reconciler (plmbr#291). - Accessibility: full sub-section covering plmbr#305-plmbr#320. - Security: shell-tool sandbox (plmbr#290), Claude UI-bridge sandbox (plmbr#323), 0o600 on encrypted token (plmbr#293), env-secret scrubbing (plmbr#295), MCP config shape validation (plmbr#299), XSS allowlist (plmbr#296), Copilot WS auth + origin (plmbr#301), GHE host detection (plmbr#292), fastmcp -> mcp SDK swap (plmbr#324). - Fixed: session listing unification (plmbr#310), session preview unwrap (plmbr#331), down-area runtime throw (plmbr#330 follow-up), WS message-handler leak (plmbr#294). - Removed: fastmcp dependency, history.jsonl session gate. Adds a Migration note covering the five behavior changes operators should review before upgrading from 4.x: fastmcp swap, path sandboxes, history.jsonl gate removal, workspace @-mention pointer shape, and the Copilot WebSocket auth/origin tightening. Two reviewer rounds (six personas each) applied: - Round 1 caught security overclaims (plmbr#293, plmbr#299, plmbr#323), the plmbr#284/plmbr#303 mis-attribution, missing migration note, 3 em dashes, and the stale `fastmcp==2.x.*` recommendation in the admin guide. - Round 2 caught the missing plmbr#301 migration bullet, missing version- matrix 5.0.x row, missing README TOC entry, and a couple of style nits (sub-heading overpromise, orphan bullet). Skipped (deferred to future PRs): - README first-run tour mention. - Admin guide HTTP kill-switch row in Failure-modes table. - Terminal drag-drop trust-model precision update after plmbr#327. - Cipher description nit in plmbr#293 (Fernet AES-128-CBC+HMAC, not AES-GCM).
Summary
Closes #335. In Claude mode the agent was creating a notebook in response to question-style prompts. Canonical repro from the issue: attach
package.json, ask "load this json file and summarize for me", and the agent created a notebook to hold the summary instead of replying in chat. The fix refines the system prompt, per the issue thread's consensus that a UI toggle or out-of-box skills would be more surprising than a prompt edit.Solution
Three pieces of steering work together in
_create_system_prompt:"summarize this","explain this","what does this output mean","how would I do X","debug this","show me what this does") and the create-case patterns ("create a notebook that...","write me a script to...","save this as a file","show me a notebook that..."). Calls out the attach-a-file-and-ask manifestation that Chat UI prompts tend to create a new notebook #335 reported.JUPYTER_UI_TOOLS_SYSTEM_PROMPTconditional: was "If you create a notebook or run it, save it after creating or running it" (structurally permissive, vestigial run-save tail). Now: "If the user has asked you to create a notebook, save it afterward." Reinforces the chat-default rather than softly contradicting it.Refactored the prompt assembly out of
ClaudeCodeChatParticipant._create_system_promptinto a module-levelbuild_claude_system_prompt(jupyter_ui_tools_enabled, jupyter_root_dir)so the test suite can pin the load-bearing phrases without instantiating the chat participant or touching JupyterLab runtime state. Order is load-bearing (recency bias): the chat-default paragraph must appear before the UI-tools block; pinned by an explicit ordering test.Testing
Eight new tests in
tests/test_claude_system_prompt.py:"show me a notebook that..."which disambiguates the "show me" verb).jupyter_ui_tools_enabled=True, exclusion whenFalse.Substring assertions pin specific phrases so a future "tighten the prompt" rewrite is forced to update the test, surfacing the intent question.
Full suite:
pytest tests/ --ignore=tests/test_claude_client.py(1063 passed),jlpm tsc --noEmit(clean),jlpm lint:check(clean),jlpm jest(259 passed). No Playwright check ran — this is a Python-only prompt edit with no JL runtime touchpoints, and the unit tests pin the contract directly.Risks / follow-ups
notebook_intelligence/prompts.CHAT_SYSTEM_PROMPT(used by the generic + GitHub Copilot participants). Worth an empirical spot-check; port the chat-default paragraph if it repros.system_prompt=str(replaces Claude Code's default) and{"type": "preset", "preset": "claude_code", "append": str}(augments CC's default). NBI passes a string, so CC's own system prompt is replaced. Switching to the preset/append form would let CC's anti-over-eager guidance reinforce ours instead of getting clobbered. Pre-existing behavior, not introduced here; larger refactor.Issue linkage
Closes #335.