fix(claude): steer system prompt away from over-eager notebook creation by pjdoland · Pull Request #336 · plmbr/notebook-intelligence

pjdoland · 2026-05-22T01:16:02Z

Summary

Closes #335. In Claude mode the agent was creating a notebook in response to question-style prompts. Canonical repro from the issue: attach package.json, ask "load this json file and summarize for me", and the agent created a notebook to hold the summary instead of replying in chat. The fix refines the system prompt, per the issue thread's consensus that a UI toggle or out-of-box skills would be more surprising than a prompt edit.

Solution

Three pieces of steering work together in _create_system_prompt:

Soften the anchor concept: the prompt no longer opens with "JupyterLab which is an IDE for Jupyter notebooks" (tautologically primes notebook-thinking). It now names notebooks, plain file editors, and terminals as first-class surfaces.
Add an explicit chat-default paragraph naming the common question patterns ("summarize this", "explain this", "what does this output mean", "how would I do X", "debug this", "show me what this does") and the create-case patterns ("create a notebook that...", "write me a script to...", "save this as a file", "show me a notebook that..."). Calls out the attach-a-file-and-ask manifestation that Chat UI prompts tend to create a new notebook #335 reported.
Tighten the JUPYTER_UI_TOOLS_SYSTEM_PROMPT conditional: was "If you create a notebook or run it, save it after creating or running it" (structurally permissive, vestigial run-save tail). Now: "If the user has asked you to create a notebook, save it afterward." Reinforces the chat-default rather than softly contradicting it.

Refactored the prompt assembly out of ClaudeCodeChatParticipant._create_system_prompt into a module-level build_claude_system_prompt(jupyter_ui_tools_enabled, jupyter_root_dir) so the test suite can pin the load-bearing phrases without instantiating the chat participant or touching JupyterLab runtime state. Order is load-bearing (recency bias): the chat-default paragraph must appear before the UI-tools block; pinned by an explicit ordering test.

Testing

Eight new tests in tests/test_claude_system_prompt.py:

The imperative core: "Default to answering questions directly in your chat reply".
The attached-file carveout: "When the user attaches a file..." / "do not produce a new notebook to hold the answer".
The explicit-request examples (four variants, including "show me a notebook that..." which disambiguates the "show me" verb).
UI-tools inclusion when jupyter_ui_tools_enabled=True, exclusion when False.
The chat-default guidance survives without UI tools (the bias is independent of the NBI MCP toolset).
The chat-default paragraph appears BEFORE the UI-tools block (defends against future edits that reorder sections).
Root-dir interpolation.

Substring assertions pin specific phrases so a future "tighten the prompt" rewrite is forced to update the test, surfacing the intent question.

Full suite: pytest tests/ --ignore=tests/test_claude_client.py (1063 passed), jlpm tsc --noEmit (clean), jlpm lint:check (clean), jlpm jest (259 passed). No Playwright check ran — this is a Python-only prompt edit with no JL runtime touchpoints, and the unit tests pin the contract directly.

Risks / follow-ups

Same bias may live in notebook_intelligence/prompts.CHAT_SYSTEM_PROMPT (used by the generic + GitHub Copilot participants). Worth an empirical spot-check; port the chat-default paragraph if it repros.
Claude Agent SDK accepts both system_prompt=str (replaces Claude Code's default) and {"type": "preset", "preset": "claude_code", "append": str} (augments CC's default). NBI passes a string, so CC's own system prompt is replaced. Switching to the preset/append form would let CC's anti-over-eager guidance reinforce ours instead of getting clobbered. Pre-existing behavior, not introduced here; larger refactor.
Prompt brittleness: the substring assertions will fail on cosmetic rewrites. That's intentional (forces a review of whether the intent is preserved) but worth noting for the next maintainer who touches the prompt.

Issue linkage

Closes #335.

The Claude system prompt biased the agent toward creating a notebook in response to question-style prompts. The canonical repro from the issue: attach package.json, ask "load this json file and summarize for me", and the agent creates a notebook to hold the summary instead of replying in chat. The fix is in the system prompt itself (per the issue thread's consensus — UI toggle and out-of-box skills were both considered and rejected as more surprising than a prompt refinement). Three pieces of steering work together: - Soften the anchor concept: the prompt no longer opens with "JupyterLab which is an IDE for Jupyter notebooks", which tautologically primes notebook-thinking. Now it names notebooks, file editors, and terminals as first-class surfaces. - Add an explicit chat-default paragraph naming the question patterns ("summarize this", "explain this", "debug this", "show me what this does", etc.) and the create-case patterns ("create a notebook that...", "write me a script to...", "save this as a file", "show me a notebook that..."). Calls out the attach-a-file- and-ask manifestation that plmbr#335 reported. - Tighten the JUPYTER_UI_TOOLS_SYSTEM_PROMPT conditional: "If you create a notebook or run it, save it after creating or running it" was structurally permissive (read as standing permission to create-and-save) and contained a vestigial run-save tail. Now: "If the user has asked you to create a notebook, save it afterward." Reinforces the chat-default rather than softly contradicting it. Refactored the prompt assembly out of `ClaudeCodeChatParticipant` into a module-level `build_claude_system_prompt(jupyter_ui_tools_enabled, jupyter_root_dir)` so the test suite can pin the load-bearing phrases without instantiating the chat participant or touching JupyterLab runtime state. The order of the chat-default paragraph before the UI-tools block is load-bearing (recency bias) and pinned by `test_chat_default_guidance_precedes_jupyter_ui_tools_block`. Eight tests pin the contract: the imperative core, the attached-file carveout, the explicit-request examples (four variants), UI-tools inclusion / exclusion branches, the chat-default surviving without UI tools, the chat-default appearing before the UI-tools block, and root-dir interpolation. Followups (intentionally deferred): - Same notebook-creation bias may live in `prompts.CHAT_SYSTEM_PROMPT` (used by the generic + GitHub Copilot participants). Worth an empirical spot-check. - Claude Agent SDK accepts both `system_prompt=str` (replace CC's default) and `{"type": "preset", "preset": "claude_code", "append": str}` (augment CC's default). NBI passes a string, so Claude Code's default system prompt is replaced. Switching to the preset/append form would let CC's own anti-over-eager guidance reinforce ours. Pre-existing behavior, larger refactor. Closes plmbr#335

pjdoland · 2026-05-22T01:29:52Z

Followed up with a Playwright check against the issue's canonical repro after I realized the unit tests only pin the prompt's text, not the model's downstream behavior.

Setup: fresh build of this branch on a clean JL session; workspace contained the same scenario as the issue (a package.json plus a notebook open in the editor).

Repro: opened the NBI sidebar, attached package.json via the workspace file picker, typed "load this json file and summarize for me", sent.

Result: Claude read the file and replied in chat with a prose summary ("Project: sample-project (v1.2.3) ... It's a React 18 project written in TypeScript..."). No new notebook tab was created, and the main-area tab list stayed unchanged through 45 seconds of observation.

So the fix actually behaves the way the docstring promises, not just contains the right words.

mbektas

LGTM, thanks!

Promotes the [Unreleased] CHANGELOG snapshot to [5.0.0] - 2026-05-22 and expands it to cover everything merged into upstream/main after PR plmbr#287's docs refresh. Bumps package.json to 5.0.0. CHANGELOG additions cover the post-plmbr#287 surface: - Settings tabs: plugin marketplace picker (plmbr#284), plugin marketplace details + Update button (plmbr#303), per-workspace MCP disable (plmbr#286), JSON-paste path in Add MCP server (plmbr#285). - Launchers: hide-with-policy (plmbr#288), brand icons for Codex / opencode (plmbr#325, plmbr#333), per-launch directory picker (plmbr#332). - Chat sidebar and agentic UX: workspace @-mention in Claude mode (plmbr#327), reload-open-files-on-disk (plmbr#330), steered system prompt away from over-eager notebook creation (plmbr#336). - Skills: multi-manifest support (plmbr#321), tracks-upstream for user- imported skills (plmbr#322), HTTP kill switch for the reconciler (plmbr#291). - Accessibility: full sub-section covering plmbr#305-plmbr#320. - Security: shell-tool sandbox (plmbr#290), Claude UI-bridge sandbox (plmbr#323), 0o600 on encrypted token (plmbr#293), env-secret scrubbing (plmbr#295), MCP config shape validation (plmbr#299), XSS allowlist (plmbr#296), Copilot WS auth + origin (plmbr#301), GHE host detection (plmbr#292), fastmcp -> mcp SDK swap (plmbr#324). - Fixed: session listing unification (plmbr#310), session preview unwrap (plmbr#331), down-area runtime throw (plmbr#330 follow-up), WS message-handler leak (plmbr#294). - Removed: fastmcp dependency, history.jsonl session gate. Adds a Migration note covering the five behavior changes operators should review before upgrading from 4.x: fastmcp swap, path sandboxes, history.jsonl gate removal, workspace @-mention pointer shape, and the Copilot WebSocket auth/origin tightening. Two reviewer rounds (six personas each) applied: - Round 1 caught security overclaims (plmbr#293, plmbr#299, plmbr#323), the plmbr#284/plmbr#303 mis-attribution, missing migration note, 3 em dashes, and the stale `fastmcp==2.x.*` recommendation in the admin guide. - Round 2 caught the missing plmbr#301 migration bullet, missing version- matrix 5.0.x row, missing README TOC entry, and a couple of style nits (sub-heading overpromise, orphan bullet). Skipped (deferred to future PRs): - README first-run tour mention. - Admin guide HTTP kill-switch row in Failure-modes table. - Terminal drag-drop trust-model precision update after plmbr#327. - Cipher description nit in plmbr#293 (Fernet AES-128-CBC+HMAC, not AES-GCM).

pjdoland added the bug Something isn't working label May 22, 2026

mbektas approved these changes May 22, 2026

View reviewed changes

mbektas merged commit af4398c into plmbr:main May 22, 2026
4 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(claude): steer system prompt away from over-eager notebook creation#336

fix(claude): steer system prompt away from over-eager notebook creation#336
mbektas merged 1 commit into
plmbr:mainfrom
pjdoland:fix/335-claude-notebook-creation-bias

pjdoland commented May 22, 2026

Uh oh!

pjdoland commented May 22, 2026

Uh oh!

mbektas left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pjdoland commented May 22, 2026

Summary

Solution

Testing

Risks / follow-ups

Issue linkage

Uh oh!

pjdoland commented May 22, 2026

Uh oh!

mbektas left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants