perf(opencode): reduce redundant summary and queue overhead #21507
GuestAUser wants to merge 3 commits into anomalyco:dev
Conversation
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Hey! Your PR title doesn't follow the required format. Please update it to start with one of the allowed type prefixes. See CONTRIBUTING.md for details.
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:

Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know.
The following comment was made by an LLM; it may be inaccurate.

Potential related PRs found:
- PR #20303: refactor(opencode): optimize doom loop detection, summary debounce, parallel plugin events. This PR may be related because it also addresses summary debouncing/optimization in opencode, which overlaps with PR #21507's work on reducing redundant summary calls and processor cleanup. However, the exact relationship (whether closed, merged, or addressing different aspects) should be verified.
- PR #19237: perf(opencode): reduce streaming latency and request overhead. This PR addresses similar hot-path performance concerns in opencode streaming, though focused on a different area.
This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window. Feel free to open a new pull request that follows our guidelines.
Core cache optimizations:
- Move mindContext from dynamicSystem to stableSystem (500-2000+ tokens/turn cached at BP1 for sessions with SessionMind context)
- Split failureContext into stableFailures (prior turns, BP1 cached) and dynamicFailures (current turn only) using signature-based dedup
- Add markLargeToolResults() pre-pass: cache_control on tool-result content parts >7000 chars (~2000 tokens), Anthropic direct + OpenRouter Claude
- Fix stale parts reference bug in markLargeToolResults for multi-tool messages
- Add compressImages() async pre-pass via sharp (PR anomalyco#21371): 3-phase quality->dimension->fallback compression prevents 5MB API limit errors
- Session snapshot resets (resetFailureSnapshot/resetEnvDynamicSent) in cleanup
- prompt_async idle race condition fix: check new messages before loop break

Upstream PR cherry-picks:
- PR anomalyco#21535: deterministic queued message wrapping eliminates per-turn cache miss
- PR anomalyco#21492: tool evidence digest (evidence.ts) preserves context through compaction
- PR anomalyco#21507: session processor single-flight summary dedup improvements
- PR anomalyco#21528: prompt_async idle wakeup race condition fix
- PR anomalyco#21500: Levenshtein O(min(N,M)) space with Int32Array two-row algorithm

New tools (PR anomalyco#21399):
- ContextUsageTool (check_context_usage): real-time token/cache usage reporting
- NewSessionTool (new_session): TUI-only, abort + create new session
- TuiEvent.SessionNew bus event and app.tsx handler
- SDK types.gen.ts/sdk.gen.ts EventTuiSessionNew type

Test infrastructure:
- E2E cache tests (OPENCODE_E2E=1) verified 100% cache hit rate on T2+
- Unit tests for large-tool cache breakpoints (4 scenarios)
- Fix pre-existing lsp-deps.test.ts assertion bug (LspTool in make() not all())
- Add await to all ProviderTransform.message() call sites (now async)
…rchestrator, multi-credential, codebase indexer

Core Features:
- Session Mind with persistent memory across sessions
- Orchestrator + Worker subagent architecture
- Multi-credential OAuth with auto-refresh
- Codebase indexer and watcher connectors
- Footer status bar with live metrics

Cache & Prompt Optimizations:
- Move mindContext/failureContext to stable system prefix (BP1 cached)
- Large tool result cache_control breakpoints (>7000 chars)
- Deterministic message wrapping (PR anomalyco#21535)
- Tool evidence digest through compaction (PR anomalyco#21492)
- O(1) queue dequeue + single-flight summary (PR anomalyco#21507)
- Levenshtein O(min(N,M)) space optimization (PR anomalyco#21500)
- Three-phase image auto-compression (PR anomalyco#21371)
- ContextUsage and NewSession tools (PR anomalyco#21399)
- E2E cache integration tests with real Anthropic OAuth

Session snapshot resets prevent memory leaks on session delete.
Summary
- Single-flight `SessionSummary.summarize()` calls, so concurrent summary work shares one computation instead of re-reading messages and re-running snapshot diffs in parallel
- Replace `AsyncQueue`'s `Array.shift()` dequeue path with a head-index queue to remove O(n) copies from hot SSE / event / TUI delivery paths
- Simplify `SessionProcessor` cleanup by using the in-memory `ctx.toolcalls` map that already tracks active tool calls

Why
The opencode core loop currently pays a few avoidable costs on high-frequency paths:
- Concurrent summarize calls repeat work for the same `{sessionID, messageID}` pair, which duplicates message hydration and diff work
- Every `Array.shift()` dequeue incurs O(n) copy costs during sustained event delivery

None of these changes affect user-visible model behavior. The goal here is to shave redundant work from core-loop infrastructure while preserving fault tolerance and existing tool-loop semantics.
What changed
1. Single-flight session summary work
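The single-flight idea can be sketched as follows. This is a minimal stand-in built on a plain promise map, not the PR's actual `InstanceState`/`Cache.make()` implementation; `singleFlight` and `summaryKey` are hypothetical names for illustration:

```typescript
// Hypothetical single-flight sketch: concurrent callers for the same key
// await one shared in-flight promise instead of each re-reading messages
// and re-running snapshot diffs.
const inflight = new Map<string, Promise<unknown>>()

function singleFlight<T>(key: string, work: () => Promise<T>): Promise<T> {
  const existing = inflight.get(key)
  if (existing) return existing as Promise<T>
  // Remove the entry once the promise settles, so a later retry after
  // failure starts a fresh computation rather than caching the error.
  const shared = work().finally(() => inflight.delete(key))
  inflight.set(key, shared)
  return shared
}

// Keyed the way the PR describes: one computation per [sessionID, messageID]
const summaryKey = (sessionID: string, messageID: string) =>
  `${sessionID}:${messageID}`
```

Because dedup only applies while a computation is in flight, this trades no freshness: once the summary resolves, the next caller recomputes against current state.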
- `packages/opencode/src/session/summary.ts`: use an `InstanceState` Effect cache (`Cache.make()`) to share one in-flight summarize operation per `[sessionID, messageID]`
- Tests: `packages/opencode/test/session/summary.test.ts`

2. O(1) queue dequeue path
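The head-index replacement for `Array.shift()` can be sketched like this. It is a simplified standalone class, not the actual `AsyncQueue` code (which also has to park awaiting consumers); the compaction threshold is an illustrative choice:

```typescript
// Instead of Array.shift(), which re-indexes the remaining elements on
// every dequeue (O(n)), advance a head pointer and periodically compact.
class HeadIndexQueue<T> {
  private items: T[] = []
  private head = 0

  push(item: T): void {
    this.items.push(item)
  }

  shift(): T | undefined {
    if (this.head >= this.items.length) return undefined
    const item = this.items[this.head]
    this.items[this.head] = undefined as unknown as T // release reference for GC
    this.head++
    // Compact once the dead prefix dominates, keeping dequeue amortized O(1)
    // while bounding memory held by already-consumed slots.
    if (this.head > 32 && this.head * 2 >= this.items.length) {
      this.items = this.items.slice(this.head)
      this.head = 0
    }
    return item
  }

  get size(): number {
    return this.items.length - this.head
  }
}
```

The occasional `slice` is the only O(n) operation, and it runs at most once per O(n) dequeues, which is where the amortized O(1) claim comes from.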
- `packages/opencode/src/util/queue.ts`: replace the `shift()` dequeue with a head-index queue
- Tests: `packages/opencode/test/util/queue.test.ts`

3. Processor cleanup avoids redundant reads
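A rough sketch of the cleanup idea, with hypothetical handler names (the real processor's types and lifecycle are more involved): track active tool calls in an in-memory map as they start and finish, then walk that map on cleanup instead of re-reading every stored part of the assistant message:

```typescript
// Hypothetical shape of a tool-call registry; the PR's ctx.toolcalls map
// already tracks this state during normal execution.
type ToolCall = { id: string; status: "running" | "done" }

const toolcalls = new Map<string, ToolCall>()

function onToolStart(id: string): void {
  toolcalls.set(id, { id, status: "running" })
}

function onToolEnd(id: string): void {
  const call = toolcalls.get(id)
  if (call) call.status = "done"
}

function cleanup(): string[] {
  // Collect whatever is still running so it can be marked aborted,
  // without any storage reads of message parts.
  const aborted: string[] = []
  for (const call of toolcalls.values()) {
    if (call.status === "running") aborted.push(call.id)
  }
  toolcalls.clear()
  return aborted
}
```

The invariant that makes this safe is that every tool call passes through the map on start, so the map is a complete view of in-flight work at cleanup time.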
- `packages/opencode/src/session/processor.ts`: use the in-memory `ctx.toolcalls` map rather than rereading all parts for the assistant message

Scope
This PR is intentionally limited to `packages/opencode/**` hot-path infrastructure. It does not include the unrelated local `packages/console/**` changes currently present in my working tree.

Validation
Typecheck

Run from `packages/opencode`: `bun typecheck`

Targeted tests
Run from `packages/opencode`: `bun test test/session/summary.test.ts test/util/queue.test.ts test/session/processor-effect.test.ts`

Result:
Manual QA

Run from `packages/opencode`:

- `bun -e 'import { AsyncQueue } from "./src/util/queue"; const q = new AsyncQueue(); q.push("a"); q.push("b"); console.log([await q.next(), await q.next()].join(","))'` prints `a,b`
- With `SessionSummary.layer`, concurrent summarize calls report `summary_messages:1` and `summary_calls:1`
Run from `packages/opencode`:

- `shift`: 18.13ms
- `idx`: 12.73ms
- `warm_same`: 24.12ms
- `vary_agent`: 32.38ms
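A rough shape for the `shift` vs `idx` comparison above (a sketch, since the actual bench script isn't shown in this PR; absolute numbers vary by machine and runtime, and microbenchmarks like this are sensitive to JIT warmup):

```typescript
// Drain an n-element queue via Array.shift(): each dequeue re-indexes the
// remaining elements, so the total work is O(n^2).
function benchShift(n: number): number {
  const arr: number[] = []
  for (let i = 0; i < n; i++) arr.push(i)
  const start = performance.now()
  while (arr.length > 0) arr.shift()
  return performance.now() - start
}

// Drain the same queue via a head index: each dequeue is O(1), so the
// total work is O(n).
function benchHeadIndex(n: number): number {
  const arr: number[] = []
  for (let i = 0; i < n; i++) arr.push(i)
  let head = 0
  const start = performance.now()
  while (head < arr.length) head++
  return performance.now() - start
}
```

The asymptotic gap only shows up once the queue holds enough backlog; at small, mostly-empty steady state the two paths are close, which is consistent with the modest margins reported above.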
Follow-up measurement plan
After merge, the next useful measurement is an end-to-end agent-loop benchmark on a fixed fixture repo that captures:
That follow-up would quantify how much these micro-optimizations move real session throughput under realistic load.