
fix(server): add backpressure to SSE event queues#19423

Closed
BYK wants to merge 18 commits into anomalyco:dev from BYK:fix/sse-backpressure


BYK (Contributor) commented Mar 27, 2026

Issue for this PR

Closes #16697

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds backpressure to the AsyncQueue used by SSE endpoints to prevent unbounded memory growth when clients fall behind.

Root cause: AsyncQueue has no size limit. When an SSE client stalls (slow network, backgrounded tab, stalled proxy), Bus.subscribeAll keeps pushing JSON.stringify'd events into the queue without bound. In production this caused 187GB RSS (forensic report).

Fix: Adds an optional capacity parameter to AsyncQueue with drop-oldest behavior: when the queue would exceed its capacity, the oldest items are discarded. Capacity is set to 1024 items (~2 MB at an average event size of ~2 KB) for both SSE endpoints (event.ts and global.ts). TUI queues remain unbounded since they are 1:1 request/response pairs that never accumulate.
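
A minimal sketch of a queue with this behavior, assuming a simple `push`/`next` interface. The class layout and method names are illustrative, not necessarily what the PR's AsyncQueue looks like; omitting `capacity` keeps the queue unbounded, as with the TUI queues.

```typescript
// Sketch: async queue with optional drop-oldest capacity (names are illustrative).
class AsyncQueue<T> {
  private items: T[] = []
  private waiters: Array<(value: T) => void> = []
  private readonly capacity: number | undefined

  // capacity undefined => unbounded, e.g. for 1:1 request/response queues.
  constructor(capacity?: number) {
    this.capacity = capacity
  }

  push(item: T): void {
    // If a consumer is already waiting, hand the item over directly.
    const waiter = this.waiters.shift()
    if (waiter) {
      waiter(item)
      return
    }
    this.items.push(item)
    // Drop-oldest backpressure: evict from the front until within capacity.
    if (this.capacity !== undefined) {
      while (this.items.length > this.capacity) this.items.shift()
    }
  }

  get size(): number {
    return this.items.length
  }

  // Resolve with the oldest buffered item, or wait for the next push.
  next(): Promise<T> {
    if (this.items.length > 0) return Promise.resolve(this.items.shift() as T)
    return new Promise<T>((resolve) => this.waiters.push(resolve))
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    while (true) yield await this.next()
  }
}
```

An SSE handler would then create one `new AsyncQueue<string>(1024)` per connection. Eviction via `Array.prototype.shift` is O(n), which is negligible at 1024 items; a ring buffer would keep eviction O(1) for much larger capacities.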

The web UI already handles reconnection gracefully (full state reload on server.connected), so dropped events during a stall are recovered naturally.

3 files changed, ~7 lines.

How did you verify your code works?

  • Bus and snapshot tests pass
  • Verified queue drops oldest items when capacity exceeded
  • Existing TUI queue consumers unaffected: no capacity parameter means unbounded

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

BYK added 18 commits April 2, 2026 20:05
meta.limit grows monotonically via loadMore() as the user scrolls up
through message history (200 → 400 → 600 → ...). When switching sessions
and returning, the 15s TTL expires and triggers a force-refresh that
reuses this inflated limit as the fetch size — re-downloading the entire
browsing history from scratch on every session switch.

Cap the limit to messagePageSize (200) on force-refreshes so only the
latest page is fetched. loadMore() still works normally after returning
since loadMessages() resets meta.cursor to match the fresh page.
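
A sketch of the capping rule, assuming hypothetical `SessionMeta` and `fetchLimit` names (only `meta.limit`, `messagePageSize`, and the value 200 come from the commit message):

```typescript
// messagePageSize = 200 per the commit message; other names are hypothetical.
const messagePageSize = 200

interface SessionMeta {
  limit: number // grows via loadMore(): 200 -> 400 -> 600 ...
  cursor?: string
}

// How many messages to request when (re)loading a session.
function fetchLimit(meta: SessionMeta, force: boolean): number {
  // A TTL-expired force-refresh fetches only the latest page instead of
  // re-downloading everything the user previously scrolled through.
  return force ? Math.min(meta.limit, messagePageSize) : meta.limit
}
```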
…ded dir walk

Three changes to reduce server event loop contention when many projects
load concurrently (e.g. opening ~/Code with 17 sidebar projects):

1. Instance bootstrap concurrency limit (p-limit, N=3):
   Each Instance bootstrap spawns 5-7 subprocesses (git, ripgrep,
   parcel/watcher). With 17 concurrent bootstraps that's ~100 subprocesses
   overwhelming a 4-core SATA SSD system. The p-limit semaphore gates new
   directory bootstraps while allowing cache-hit requests through instantly.
   Peak subprocesses: ~100 → ~18.

2. Async Filesystem.exists and isDir:
   Filesystem.exists() wrapped existsSync() in an async function — it
   looked async but blocked the event loop on every call. Replaced with
   fs.promises.access(). Same for isDir (statSync → fs.promises.stat).
   This lets the event loop serve health checks and API responses while
   directory walk-ups are in progress. Added Filesystem.existsSync() for
   callers that genuinely need sync behavior.

3. Bounded Filesystem.up/findUp walk depth (maxDepth=10):
   Previously walked all the way to / with no limit. Added maxDepth
   parameter (default 10) as a safety net against degenerate deep paths.
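
The three changes can be sketched together as follows. `limit` is a hand-rolled stand-in for p-limit (not its actual source), and `exists`, `isDir`, and `findUp` are illustrative versions of the Filesystem helpers built on Node's `fs.promises` API; the real signatures may differ.

```typescript
import { promises as fs } from "node:fs"
import * as path from "node:path"

// Minimal concurrency limiter in the spirit of p-limit (not p-limit's source).
function limit(concurrency: number) {
  let active = 0
  const queue: Array<() => void> = []
  const release = () => {
    const next = queue.shift()
    if (next) next() // hand this slot directly to the next waiter
    else active--
  }
  return async <T>(fn: () => Promise<T>): Promise<T> => {
    if (active >= concurrency) await new Promise<void>((resolve) => queue.push(() => resolve()))
    else active++
    try {
      return await fn()
    } finally {
      release()
    }
  }
}

// Non-blocking existence check: the event loop stays free while the syscall runs.
async function exists(p: string): Promise<boolean> {
  try {
    await fs.access(p)
    return true
  } catch {
    return false
  }
}

async function isDir(p: string): Promise<boolean> {
  try {
    return (await fs.stat(p)).isDirectory()
  } catch {
    return false
  }
}

// Walk from `start` toward the root looking for `target`, visiting at most `maxDepth` directories.
async function findUp(target: string, start: string, maxDepth = 10): Promise<string | undefined> {
  let dir = path.resolve(start)
  for (let depth = 0; depth < maxDepth; depth++) {
    const candidate = path.join(dir, target)
    if (await exists(candidate)) return candidate
    const parent = path.dirname(dir)
    if (parent === dir) break // reached the filesystem root
    dir = parent
  }
  return undefined
}
```

The limiter hands a released slot directly to the next waiter instead of decrementing and re-incrementing the counter, so a new caller cannot slip in between and briefly exceed the limit.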
…g in memory

Previously, the bash tool and the /shell command accumulated ALL stdout/stderr
in an unbounded string; a verbose command could grow to hundreds of MB.

Now output beyond Truncate.MAX_BYTES (50KB) is streamed directly to a
spool file on disk. Only the first 50KB is kept in memory for the
metadata preview. The full output remains recoverable via the spool file
path included in the tool result metadata.
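
A minimal sketch of the spooling strategy, assuming a hypothetical `OutputSpool` class and a temp-directory spool path; only the 50KB threshold comes from the commit message. On first overflow the buffered prefix is replayed into the file, so the spool holds the complete output while memory stays capped at the preview.

```typescript
import { createWriteStream, type WriteStream } from "node:fs"
import * as os from "node:os"
import * as path from "node:path"

// Mirrors the 50KB Truncate.MAX_BYTES limit from the commit message.
const MAX_BYTES = 50 * 1024

// Hypothetical helper: the first MAX_BYTES stay in memory as a preview; once output
// overflows, the full stream (including what was buffered) goes to a spool file.
class OutputSpool {
  private head: Buffer[] = []
  private headBytes = 0
  private file: WriteStream | undefined
  readonly spoolPath: string

  constructor(id: string) {
    this.spoolPath = path.join(os.tmpdir(), `tool-output-${id}.log`)
  }

  write(chunk: Buffer): void {
    if (!this.file && this.headBytes + chunk.length <= MAX_BYTES) {
      this.head.push(chunk)
      this.headBytes += chunk.length
      return
    }
    if (!this.file) {
      // First overflow: open the spool and replay the buffered prefix into it.
      this.file = createWriteStream(this.spoolPath)
      for (const b of this.head) this.file.write(b)
      // Top up the in-memory preview to exactly MAX_BYTES.
      const room = MAX_BYTES - this.headBytes
      this.head.push(chunk.subarray(0, room))
      this.headBytes += room
    }
    this.file.write(chunk)
  }

  get truncated(): boolean {
    return this.file !== undefined
  }

  // Note: may split a multi-byte UTF-8 character at the boundary.
  preview(): string {
    return Buffer.concat(this.head).toString("utf8")
  }

  close(): Promise<void> {
    return new Promise((resolve) => {
      if (this.file) this.file.end(() => resolve())
      else resolve()
    })
  }
}
```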
- LSP: delete diagnostic entries when server publishes empty array
  instead of keeping empty arrays in the map forever
- RPC: add 60s timeout to pending calls so leaked promises don't
  accumulate indefinitely
- FileTime: skip (already handled by Effect InstanceState migration)
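
A sketch of the first two fixes under assumed, hypothetical names (`RpcClient`, `handleResponse`, `publishDiagnostics`): every pending call is removed either when its response arrives or when the 60s timer fires, and an empty diagnostics publish deletes the map entry.

```typescript
// Hypothetical RPC bookkeeping: each pending call settles on a response or a timeout.
type Message = { id: number; method: string; params: unknown }
type Pending = {
  resolve: (value: unknown) => void
  reject: (error: Error) => void
  timer: ReturnType<typeof setTimeout>
}

class RpcClient {
  private pending = new Map<number, Pending>()
  private nextId = 0
  private readonly send: (msg: Message) => void
  private readonly timeoutMs: number

  constructor(send: (msg: Message) => void, timeoutMs = 60_000) {
    this.send = send
    this.timeoutMs = timeoutMs
  }

  call(method: string, params: unknown): Promise<unknown> {
    const id = this.nextId++
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        // Drop the entry so a never-answered call cannot leak.
        this.pending.delete(id)
        reject(new Error(`RPC ${method} timed out after ${this.timeoutMs}ms`))
      }, this.timeoutMs)
      this.pending.set(id, { resolve, reject, timer })
      this.send({ id, method, params })
    })
  }

  // Invoked when a response for `id` arrives from the peer.
  handleResponse(id: number, result: unknown): void {
    const entry = this.pending.get(id)
    if (!entry) return // already timed out
    clearTimeout(entry.timer)
    this.pending.delete(id)
    entry.resolve(result)
  }

  get size(): number {
    return this.pending.size
  }
}

// LSP side: remove the entry on an empty publish instead of storing [] forever.
function publishDiagnostics(store: Map<string, unknown[]>, uri: string, diagnostics: unknown[]): void {
  if (diagnostics.length === 0) store.delete(uri)
  else store.set(uri, diagnostics)
}
```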
Two optimizations to drastically reduce memory during prompting:

1. filterCompactedLazy: probe newest 50 message infos (1 query, no
   parts) to detect compaction. If none found, fall back to original
   single-pass filterCompacted(stream()) — avoids 155+ wasted info-only
   queries for uncompacted sessions. Compacted sessions still use the
   efficient two-pass scan.

2. Context-window windowing: before calling toModelMessages, estimate
   which messages from the tail fit in the LLM context window using
   model.limit.context * 4 chars/token. Only convert those messages to
   ModelMessage format. For a 7,704-message session where ~200 fit in
   context, this reduces toModelMessages input from 7,704 to ~200
   messages — cutting ~300MB of wrapper objects across 4-5 copy layers
   down to ~10MB.

Also caches conversation across prompt loop iterations — full reload
only after compaction, incremental merge for tool-call steps.
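
A sketch of the context-window windowing step, assuming a hypothetical flat `Msg` shape where only text length counts (a real estimate would also need to account for tool output and other message parts). It walks back from the newest message until the ~4 chars/token budget is spent, so conversion cost scales with what fits in context rather than with session length.

```typescript
// Hypothetical message shape; only text length feeds the estimate.
interface Msg {
  role: "user" | "assistant" | "tool"
  text: string
}

// Rough heuristic from the commit message: ~4 characters per token.
const CHARS_PER_TOKEN = 4

// Keep the longest suffix of `messages` whose estimated size fits the context window,
// so only those messages are converted to the model's message format.
function windowToContext(messages: Msg[], contextTokens: number): Msg[] {
  const budgetChars = contextTokens * CHARS_PER_TOKEN
  let used = 0
  let start = messages.length
  while (start > 0) {
    const msg = messages[start - 1]
    if (!msg || used + msg.text.length > budgetChars) break
    used += msg.text.length
    start--
  }
  return messages.slice(start)
}
```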
AsyncQueue is unbounded — when SSE clients fall behind (slow network,
stalled browser tab), JSON-serialized bus events accumulate without limit.
This caused 187GB RSS in production (anomalyco#16697).

Add optional capacity parameter to AsyncQueue with drop-oldest behavior.
Set to 1024 items (~2MB) for both SSE endpoints. TUI queues remain
unbounded (1:1 request/response, never accumulate).
… minimize

Two additional fixes:

1. Add plan_exit/plan_enter to the tools disable map in task.ts.
   The session permissions were being overwritten by SessionPrompt.prompt()
   which converts the tools map into session permissions. Without plan_exit
   in the tools map, it wasn't being denied.

2. Add minimize/expand toggle to the question dock so users can collapse
   it to read the conversation while a question is pending. Adds a
   chevron button in the header and makes the title clickable to toggle.
   DockPrompt gains a minimized prop that hides content and footer.
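
A sketch of why the missing entries mattered, assuming hypothetical `ToolsMap`/`Permission` shapes rather than the real types in task.ts: if session permissions are derived only from the entries in the tools map, a tool absent from the map is never denied.

```typescript
// Hypothetical shapes: the real tools map and permission types in task.ts may differ.
type ToolsMap = Record<string, boolean>
type Permission = { tool: string; action: "allow" | "deny" }

// Each entry becomes a rule; a tool absent from the map produces no rule at all.
function toolsToPermissions(tools: ToolsMap): Permission[] {
  return Object.entries(tools).map(([tool, enabled]): Permission => ({
    tool,
    action: enabled ? "allow" : "deny",
  }))
}

function isDenied(rules: Permission[], tool: string): boolean {
  return rules.some((r) => r.tool === tool && r.action === "deny")
}

// Subagent tools map with the plan-mode tools explicitly disabled.
const subagentTools: ToolsMap = {
  plan_exit: false,
  plan_enter: false,
}
```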
…selection

1. Restore Markdown component usage in live question dock — the
   rebase dropped it, leaving plain text rendering while the import
   was still present.

2. Refuse plan_exit when plan file is empty/missing — return an error
   telling the agent to write the plan first instead of showing an
   empty question to the user.

3. Add question-text to the user-select: text allow-list so plan
   content in the question dock is selectable and copyable.
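
A sketch of the plan_exit guard from item 2, assuming a hypothetical `loadPlanForExit` helper and result shape; a missing file is treated the same as an empty one.

```typescript
import { promises as fs } from "node:fs"

// Hypothetical result shape for the tool call.
type PlanResult = { ok: true; plan: string } | { ok: false; error: string }

async function loadPlanForExit(planPath: string): Promise<PlanResult> {
  let content = ""
  try {
    content = await fs.readFile(planPath, "utf8")
  } catch {
    // A missing or unreadable file falls through as an empty plan.
  }
  if (content.trim().length === 0) {
    // Refuse instead of showing the user an empty question.
    return { ok: false, error: `Plan file ${planPath} is empty or missing. Write the plan before calling plan_exit.` }
  }
  return { ok: true, plan: content }
}
```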
- One-time migration to incremental auto-vacuum (PRAGMA auto_vacuum=2)
  so disk space is reclaimed when sessions are deleted
- Add Database.checkpoint() (TRUNCATE mode) and Database.vacuum()
  (incremental_vacuum(500)) helpers
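
A sketch of the migration and helpers against a hypothetical minimal `SqlExec` interface (adapt to the SQLite binding actually in use). Per the SQLite documentation, switching an existing database from `auto_vacuum = NONE` to `INCREMENTAL` only takes effect after a `VACUUM`, which is why the one-time migration runs one.

```typescript
// Hypothetical minimal driver interface; not the API of any specific SQLite binding.
interface SqlExec {
  run(sql: string): void
  get<T>(sql: string): T | undefined
}

// One-time migration to incremental auto-vacuum. Returns true if it ran.
function migrateAutoVacuum(db: SqlExec): boolean {
  const row = db.get<{ auto_vacuum: number }>("PRAGMA auto_vacuum")
  if (row?.auto_vacuum === 2) return false // already INCREMENTAL
  db.run("PRAGMA auto_vacuum = INCREMENTAL")
  db.run("VACUUM") // required for the mode change to take effect on an existing DB
  return true
}

// Fold the WAL back into the main file and truncate the WAL to zero bytes.
function checkpoint(db: SqlExec): void {
  db.run("PRAGMA wal_checkpoint(TRUNCATE)")
}

// Reclaim up to 500 free pages per call.
function vacuum(db: SqlExec): void {
  db.run("PRAGMA incremental_vacuum(500)")
}
```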
Snapshot.patch() now stores relative paths (instead of joining with worktree
root) and caps at 1000 entries. Revert handles both old absolute and new
relative paths via path.isAbsolute() for backward compat.

Summary diffs (summarizeSession/summarizeMessage) use a ~1MB byte budget
based on actual before+after content size instead of an arbitrary count cap.
This prevents multi-MB payloads while allowing many small file diffs.

Closes anomalyco#18921
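
A sketch of the three rules, with hypothetical function names; the 1000-entry cap and ~1MB budget come from the commit message. This version stops at the first diff that would overflow the budget; skipping it and continuing would be an equally valid reading. Example paths assume POSIX separators.

```typescript
import * as path from "node:path"

const MAX_PATCH_FILES = 1000
const SUMMARY_BYTE_BUDGET = 1024 * 1024 // ~1MB

// Store paths relative to the worktree and cap the list.
function toPatchFiles(worktree: string, absFiles: string[]): string[] {
  return absFiles.slice(0, MAX_PATCH_FILES).map((f) => path.relative(worktree, f))
}

// Revert: accept both legacy absolute paths and new relative ones.
function resolveForRevert(worktree: string, file: string): string {
  return path.isAbsolute(file) ? file : path.join(worktree, file)
}

interface FileDiff {
  file: string
  before: string
  after: string
}

// Keep diffs while their combined before+after byte size fits the budget.
function budgetDiffs(diffs: FileDiff[], budget = SUMMARY_BYTE_BUDGET): FileDiff[] {
  const out: FileDiff[] = []
  let used = 0
  for (const d of diffs) {
    const size = Buffer.byteLength(d.before) + Buffer.byteLength(d.after)
    if (used + size > budget) break
    used += size
    out.push(d)
  }
  return out
}
```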
Build on anomalyco#19299 by @thdxr with production hardening for the new
server architecture (anomalyco#19335):

- Add pre-bootstrap static middleware in server.ts before
  WorkspaceRouterMiddleware to avoid Instance.provide() + DB migration
  checks on every CSS/JS/image/font request
- Add SPA fallback routing — page routes (no extension) get index.html,
  asset requests (.js, .woff2) fall through to CDN proxy
- Add OPENCODE_APP_DIR env var for explicit dev override
- Auto-detect packages/app/dist in monorepo dev mode
- Use explicit MIME type map for consistent content-type headers
- CDN proxy fallback for assets not found locally/embedded
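
A sketch of the routing decision the static middleware makes, using a hypothetical `decideStatic` function with an injected `exists` check so it stays testable. The MIME map is a small illustrative subset; only the `OPENCODE_APP_DIR` variable name and the `packages/app/dist` location come from the commit message.

```typescript
import * as path from "node:path"

// Explicit MIME map (illustrative subset).
const MIME: Record<string, string> = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".css": "text/css; charset=utf-8",
  ".svg": "image/svg+xml",
  ".png": "image/png",
  ".woff2": "font/woff2",
  ".json": "application/json",
}

type StaticDecision =
  | { kind: "file"; file: string; contentType: string }
  | { kind: "spa-index"; file: string; contentType: string }
  | { kind: "cdn-proxy" }

// Explicit override first, then the monorepo dev build location.
function resolveAppDir(exists: (p: string) => boolean): string | undefined {
  const override = process.env.OPENCODE_APP_DIR
  if (override) return override
  const devDist = path.resolve("packages/app/dist")
  return exists(devDist) ? devDist : undefined
}

function decideStatic(appDir: string, urlPath: string, exists: (file: string) => boolean): StaticDecision {
  const ext = path.posix.extname(urlPath)
  if (ext === "") {
    // Page route (no extension): serve index.html and let the SPA router take over.
    return { kind: "spa-index", file: path.join(appDir, "index.html"), contentType: MIME[".html"] ?? "text/html" }
  }
  // URL pathnames start with "/", so normalize cannot climb above appDir.
  const file = path.join(appDir, path.posix.normalize(urlPath))
  if (exists(file)) return { kind: "file", file, contentType: MIME[ext] ?? "application/octet-stream" }
  // Asset not found locally or embedded: fall through to the CDN proxy.
  return { kind: "cdn-proxy" }
}
```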
The plan_exit tool renders the entire plan file as markdown in the
question dock. Two issues prevented scrolling:

1. DockShell used overflow: clip which prevents all descendant scrolling.
   Changed to overflow: hidden (still clips at border-radius but allows
   inner scroll containers).

2. question-content had no overflow — added overflow-y: auto so both
   the question text and options scroll together when content exceeds
   the dock height.
BYK force-pushed the fix/sse-backpressure branch from 2631130 to 13a044a April 9, 2026 13:23
BYK requested a review from adamdotdevin as a code owner April 9, 2026 13:23
BYK (Contributor, Author) commented Apr 9, 2026

Closing — changes have been incorporated into dev.

BYK closed this Apr 9, 2026
Development

Successfully merging this pull request may close these issues.

Multiple memory leaks cause unbounded RAM growth during extended TUI usage
