fix(xterm): patch IntersectionObserver retention (-367 MB/30 mode-toggles)#617
Merged
fix(xterm): patch IntersectionObserver retention (-367 MB/30 mode-toggles)#617
Conversation
…fork Bumps the pnpm override pointer from `fix/dispose-leaks-built` to `fix/kolu-xterm-fixes-built`, which stacks a second fix on top of the existing dispose-leaks patches: - xtermjs/xterm.js#5817 (`fix/dispose-leaks`): register CursorBlinkStateManager + _pausedResizeTask disposables. Already in production. - xtermjs/xterm.js#5821 (new, `fix/intersection-observer-weakref`): wrap `this` in a `WeakRef` inside `RenderService`'s IntersectionObserver callback. `observer.disconnect()` on our side wasn't releasing the callback in practice — heap snapshot showed 175,594 retained Uint32Arrays (~220 MB / 30 mode-toggles with 7 terminals) traced through the global IntersectionObserver registry → callback closure → RenderService → BufferService → BufferLines. WeakRef breaks that chain regardless of why the native registry held the callback. Local measurement on kolu@zest (fresh tab, 30 canvas↔focus toggles, 7 terminals restored): Before fix → After fix Task Manager Memory Footprint +367 MB → -3 MB BufferLine (Z1) instances Δ +175,594 → ~0 Upstream tracking: xtermjs/xterm.js#5820 (issue), xtermjs/xterm.js#5821 (PR). Fork consumption will collapse to a plain version bump once upstream merges + releases.
c9794db to
6800302
Compare
srid
added a commit
that referenced
this pull request
Apr 17, 2026
…osis Adds the IntersectionObserver / BufferLine retention story to memory-learnings.md (#617, upstream xtermjs/xterm.js#5820 + #5821). Skill updates: - New "Ground truth: Task Manager Memory Footprint, not proxies" section. performance.memory, system/Context count, closure:* count are all proxies that can diverge from Task Manager by 100x. #614 reduced Context growth 89% across six commits with zero Memory Footprint improvement — the cautionary tale. - New "Quiet-session A/B" requirement. Active agent terminals grow xterm scrollback legitimately; measurements on a busy session look indistinguishable from retention. #618 closed after a quiet-session A/B showed the +69 MB residual was all agent-stream activity. - New leak shape "Callback retained past dispose()" — when an observer's disconnect()/dispose() doesn't fully release the callback closure in practice (DevTools instrumentation, extensions, native registry quirks). Fix pattern: wrap `this` in WeakRef inside the callback. This is what #617 did for xterm's RenderService. - Promoted diff-heap.mjs + find-retainers.mjs to the top of the analyzer list, sorted by "start here". Sort heap diffs by bytes, not count — a 220 MB Uint32Array leak dominates any number of 40-byte Context churn. Also commits the two diagnostic scripts that had been sitting untracked in docs/perf-investigations/scripts/. Regenerated .claude/skills/perf-diagnose/SKILL.md via `just ai::apm`.
4 tasks
srid
added a commit
that referenced
this pull request
Apr 17, 2026
…osis Adds the IntersectionObserver / BufferLine retention story to memory-learnings.md (#617, upstream xtermjs/xterm.js#5820 + #5821). Skill updates: - New "Ground truth: Task Manager Memory Footprint, not proxies" section. performance.memory, system/Context count, closure:* count are all proxies that can diverge from Task Manager by 100x. #614 reduced Context growth 89% across six commits with zero Memory Footprint improvement — the cautionary tale. - New "Quiet-session A/B" requirement. Active agent terminals grow xterm scrollback legitimately; measurements on a busy session look indistinguishable from retention. #618 closed after a quiet-session A/B showed the +69 MB residual was all agent-stream activity. - New leak shape "Callback retained past dispose()" — when an observer's disconnect()/dispose() doesn't fully release the callback closure in practice (DevTools instrumentation, extensions, native registry quirks). Fix pattern: wrap `this` in WeakRef inside the callback. This is what #617 did for xterm's RenderService. - Promoted diff-heap.mjs + find-retainers.mjs to the top of the analyzer list, sorted by "start here". Sort heap diffs by bytes, not count — a 220 MB Uint32Array leak dominates any number of 40-byte Context churn. Also commits the two diagnostic scripts that had been sitting untracked in docs/perf-investigations/scripts/. Regenerated .claude/skills/perf-diagnose/SKILL.md via `just ai::apm`.
srid
added a commit
that referenced
this pull request
Apr 17, 2026
… skill (#619) ## Summary Documents the Chapter 3 investigation so future agents don't burn three days chasing proxy metrics the way we did. **Source of truth edits** (regenerated into `.claude/` via `just ai::apm`): - `docs/perf-investigations/memory-learnings.md` — new Chapter 3 section covering #614 (closed without merge, the false trail) and #617 (the one-line WeakRef that actually moved Task Manager Memory Footprint by −81%). - `agents/.apm/skills/perf-diagnose/SKILL.md` — runbook additions: - New "Ground truth: Task Manager Memory Footprint, not proxies" rule at the top. `performance.memory`, `system/Context` count, and `closure:*` count are all proxies — they can drift 100× from Task Manager and mislead you into declaring a fix that does nothing. - New "Quiet-session A/B" step. Active agent terminals grow scrollback buffers legitimately; a busy-session measurement looks indistinguishable from retention. - New **third** leak shape: "Callback retained past `dispose()`". When a `Window.<Observer>` native registry (or a DevTools extension wrapping it) holds the callback closure past the explicit `observer.disconnect()`, the callback keeps `this` (and its whole service graph) reachable. Fix pattern: wrap `this` in `WeakRef` inside the callback. - Promoted `diff-heap.mjs` + `find-retainers.mjs` to the top of the analyzer list with "start here" language. Sort heap diffs by **bytes**, not count — a 220 MB `Uint32Array` leak drowns any number of 40-byte Context churn. **New committed tooling** (had been sitting untracked): - `docs/perf-investigations/scripts/diff-heap.mjs` — per-class byte-delta between two heap snapshots. - `docs/perf-investigations/scripts/find-retainers.mjs` — BFS from GC roots to every instance of a target class, grouped by path signature. ## Why The three-day version of the story: I jumped straight to heap snapshots, found tens of thousands of retained `system/Context` objects, and shipped six refactoring commits on #614 that reduced that count 89%. Task Manager didn't move. The actual load-bearing retention was a single `IntersectionObserver` callback in `xterm`'s `RenderService` holding 220 MB of `Uint32Array` BufferLines. One heap diff sorted by bytes (not count) named the culprit in one line of output; one `WeakRef` wrap fixed it. Codifying these so the next agent doesn't re-tread the path. ## Test plan - [x] `just fmt` clean - [x] `just ai::apm` regenerated `.claude/skills/perf-diagnose/SKILL.md` to match APM source - [ ] `ci/apm-sync` validates `.claude/` matches sources - [ ] `ci/fmt` + `ci/nix` green 🤖 Generated with [Claude Code](https://claude.com/claude-code)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the canvas/focus-toggle memory leak that was pushing production pureintent to 1.2 GB Memory Footprint. The leak is upstream in
xterm.js; this PR bumps thepnpm.overridespointer to a fork branch that stacks one additional fix on top offix/dispose-leaks-built.Upstream tracking: xtermjs/xterm.js#5820 (issue), xtermjs/xterm.js#5821 (PR).
The leak
RenderService._registerIntersectionObservercreates anIntersectionObserverwhose callback closes overthisdirectly. Although_observerDisposablecallsobserver.disconnect()on dispose, the retained callback chain keepsthis(RenderService) alive in practice — along with_coreService → _bufferService → buffers → BufferLine → Uint32Arraycell data.Heap-snapshot diff across 30 mount/unmount cycles of 7
Terminalinstances:native:system / JSArrayBufferDataobject:Uint32Arrayobject:ArrayBufferobject:BufferLineEvery retained
Uint32Arraytraced through the same retainer signature: globalIntersectionObserverregistry → callback closure →RenderService→ service graph → BufferLines. 175,594 = 30 toggles × 7 terminals × ~830 scrollback lines — basically the full buffer of every disposedTerminalpinned pastterminal.dispose().The fix wraps
thisin aWeakRef. Functional semantics preserved: while RenderService is alive,deref()returns it and the handler runs unchanged. Once no strong refs remain, the callback is a no-op and the BufferService graph can GC.Ground-truth measurement
Local
kolu@zest(fresh tab, 30 canvas↔focus toggles, 7 terminals restored from session):Also verified the
fix/dispose-leaksfix (upstream #5817) still ships — the stacked fork branch contains both.Alternatives considered
disconnect()isn't releasing the callback. Could be DevTools instrumentation, a Chrome extension patchingwindow.IntersectionObserver, a native registry quirk — unclear, and likely environment-dependent. The WeakRef wrap is defensive and preserves semantics regardless of root cause._coreService/_bufferServiceinRenderService.dispose(). Narrower scope but requires a dispose override. WeakRef is cleaner — no new dispose path, just a smaller capture surface.Test plan
just checkpassespureintent, retest footprint delta on production prod-like build🤖 Diagnosis assisted by Claude Code