Skip to content

fix(extension): free WS message-callback handlers when requests finish#294

Merged
mbektas merged 2 commits into
plmbr:mainfrom
pjdoland:fix/ws-callback-handlers-leak
May 18, 2026
Merged

fix(extension): free WS message-callback handlers when requests finish#294
mbektas merged 2 commits into
plmbr:mainfrom
pjdoland:fix/ws-callback-handlers-leak

Conversation

@pjdoland
Copy link
Copy Markdown
Collaborator

Summary

`WebsocketCopilotHandler._messageCallbackHandlers` was a dict keyed by request `messageId`, populated when chat / generate-code / inline-completion requests started but never pruned. Every long chat session leaked one response emitter + cancel token per turn for the lifetime of the websocket connection. Entries still in flight at disconnect pinned their state until the worker happened to return on its own.

This was flagged in the security audit because the leaked entries hold a reference to the response emitter, which retains chat-history content; in a long session that's PII / prompt content sitting in memory after it should have been collectable.

Solution

Worker threads now run through a new `_run_request_thread` wrapper that pops the messageId from `_messageCallbackHandlers` in a `finally` so the cleanup fires whether the coroutine returns or raises. All three thread-spawn sites (chat, generate-code, inline-completion) route through it. `on_close` also clears the dict so a long-running request left in flight at disconnect releases its emitter and cancel token immediately rather than at coroutine completion.

Testing

Six new pytest cases in `tests/test_ws_callback_handler_leak.py`:

  • normal-return cleanup
  • raise-and-still-cleanup
  • unknown-id-is-safe (idempotent pop)
  • multiple concurrent requests each clean up only their own entry
  • `on_close` drops all in-flight entries
  • `on_close` is idempotent

Verified regression detection by reverting `extension.py` to `upstream/main` and re-running: 5 of 6 tests fail.

`pytest tests/` (738 passing, +6 new), `jlpm tsc --noEmit`, `jlpm lint:check`, `jlpm jest` (154 passing). All four green.

WebsocketCopilotHandler._messageCallbackHandlers was populated for every
chat / generate-code / inline-completion request (one entry per
messageId, holding a response emitter and a cancel token) and never
pruned. A long chat session leaked one entry per turn for the lifetime
of the websocket connection, and entries still in flight at disconnect
pinned their state until the worker thread happened to return.

Worker threads now run through `_run_request_thread`, which pops the
messageId from the dict in a `finally` so the cleanup fires whether the
coroutine returns or raises. `on_close` also clears the dict so a
long-running request left in flight at disconnect releases its emitter
and cancel token immediately rather than at coroutine completion.

Tests pin the normal-return cleanup, the raise cleanup, the
unknown-id-is-safe behavior, the concurrent-requests-don't-stomp
behavior, and the on_close + on_close-is-idempotent contract.
@pjdoland pjdoland added the bug Something isn't working label May 18, 2026
Six-agent review polish: the previous try/except pattern would pass
even if the wrapper started swallowing exceptions. pytest.raises pins
the contract that the wrapper deliberately re-raises so thread-level
diagnostics see the failure.
@mbektas mbektas merged commit eb4a12c into plmbr:main May 18, 2026
4 checks passed
pjdoland added a commit to pjdoland/notebook-intelligence that referenced this pull request May 18, 2026
Second-pass review surfaced a clean merge conflict where main added
the per-message callback-cleanup helper (PR plmbr#294's _run_request_thread)
adjacent to where this PR decorated open() with @ws_authenticated.
Resolve by keeping _run_request_thread first and the decorator
immediately above open(). The two are orthogonal and compose cleanly.

Add two tests the first round missed:

- A symmetric MRO assertion that check_xsrf_cookie resolves to
  JupyterHandler (the upgrade-aware override that skips XSRF only
  when origin + auth are already satisfied). The matching
  check_origin assertion was already pinned to WebSocketMixin.

- A caplog assertion that the accepted-upgrade audit line fires with
  the resolved user identity and the Origin header. Incident triage
  relies on that log; a future refactor that drops it would otherwise
  ship unnoticed.
pjdoland added a commit to pjdoland/notebook-intelligence that referenced this pull request May 22, 2026
Promotes the [Unreleased] CHANGELOG snapshot to [5.0.0] - 2026-05-22
and expands it to cover everything merged into upstream/main after
PR plmbr#287's docs refresh. Bumps package.json to 5.0.0.

CHANGELOG additions cover the post-plmbr#287 surface:

- Settings tabs: plugin marketplace picker (plmbr#284), plugin marketplace
  details + Update button (plmbr#303), per-workspace MCP disable (plmbr#286),
  JSON-paste path in Add MCP server (plmbr#285).
- Launchers: hide-with-policy (plmbr#288), brand icons for Codex / opencode
  (plmbr#325, plmbr#333), per-launch directory picker (plmbr#332).
- Chat sidebar and agentic UX: workspace @-mention in Claude mode
  (plmbr#327), reload-open-files-on-disk (plmbr#330), steered system prompt
  away from over-eager notebook creation (plmbr#336).
- Skills: multi-manifest support (plmbr#321), tracks-upstream for user-
  imported skills (plmbr#322), HTTP kill switch for the reconciler (plmbr#291).
- Accessibility: full sub-section covering plmbr#305-plmbr#320.
- Security: shell-tool sandbox (plmbr#290), Claude UI-bridge sandbox (plmbr#323),
  0o600 on encrypted token (plmbr#293), env-secret scrubbing (plmbr#295), MCP
  config shape validation (plmbr#299), XSS allowlist (plmbr#296), Copilot WS
  auth + origin (plmbr#301), GHE host detection (plmbr#292), fastmcp -> mcp SDK
  swap (plmbr#324).
- Fixed: session listing unification (plmbr#310), session preview unwrap
  (plmbr#331), down-area runtime throw (plmbr#330 follow-up), WS message-handler
  leak (plmbr#294).
- Removed: fastmcp dependency, history.jsonl session gate.

Adds a Migration note covering the five behavior changes operators
should review before upgrading from 4.x: fastmcp swap, path
sandboxes, history.jsonl gate removal, workspace @-mention pointer
shape, and the Copilot WebSocket auth/origin tightening.

Two reviewer rounds (six personas each) applied:
- Round 1 caught security overclaims (plmbr#293, plmbr#299, plmbr#323), the
  plmbr#284/plmbr#303 mis-attribution, missing migration note, 3 em dashes,
  and the stale `fastmcp==2.x.*` recommendation in the admin guide.
- Round 2 caught the missing plmbr#301 migration bullet, missing version-
  matrix 5.0.x row, missing README TOC entry, and a couple of style
  nits (sub-heading overpromise, orphan bullet).

Skipped (deferred to future PRs):
- README first-run tour mention.
- Admin guide HTTP kill-switch row in Failure-modes table.
- Terminal drag-drop trust-model precision update after plmbr#327.
- Cipher description nit in plmbr#293 (Fernet AES-128-CBC+HMAC, not
  AES-GCM).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants