perf(opencode): reduce redundant summary and queue overhead #21507

Closed

GuestAUser wants to merge 3 commits into anomalyco:dev from GuestAUser:perf/opencode-core-hot-path
Conversation

@GuestAUser

Summary

  • dedupe identical in-flight SessionSummary.summarize() calls so concurrent summary work shares one computation instead of re-reading messages and re-running snapshot diffs in parallel
  • replace AsyncQueue's Array.shift() dequeue path with a head-index queue to remove O(n) copies from hot SSE / event / TUI delivery paths
  • avoid rereading persisted tool parts during SessionProcessor cleanup by using the in-memory ctx.toolcalls map that already tracks active tool calls

Why

The opencode core loop still pays a few avoidable costs in high-frequency paths:

  • summary generation can be triggered concurrently for the same {sessionID, messageID} pair, which duplicates message hydration and diff work
  • queue consumers pay repeated shift() costs during sustained event delivery
  • processor cleanup rereads message parts that are already available in memory

None of these change the user-visible model behavior. The goal here is to shave redundant work from core loop infrastructure while preserving fault tolerance and existing tool-loop semantics.

What changed

1. Single-flight session summary work

packages/opencode/src/session/summary.ts

  • adds per-instance summary state through InstanceState
  • uses Effect Cache.make() to share one in-flight summarize operation per [sessionID, messageID]
  • invalidates the cache entry after completion so this remains single-flight work sharing rather than long-lived result caching
  • keeps the rest of the summary pipeline unchanged: session summary aggregation, stored diff write, diff event publish, and user-message summary update

packages/opencode/test/session/summary.test.ts

  • adds a regression test that fires two concurrent summarize calls
  • verifies only one message load and one diff computation happen
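The dedup technique can be sketched outside of Effect with a plain Promise map. This is illustrative only — the actual change uses Effect's `Cache.make()`, and `singleFlight` and the key format below are hypothetical names:

```typescript
// Illustrative single-flight helper: concurrent callers with the same key
// share one in-flight Promise; the entry is deleted on completion so this
// shares in-flight work without becoming a long-lived result cache.
const inflight = new Map<string, Promise<unknown>>()

function singleFlight<T>(key: string, work: () => Promise<T>): Promise<T> {
  const existing = inflight.get(key)
  if (existing) return existing as Promise<T>
  const promise = work().finally(() => {
    // Invalidate after completion: share in-flight work, not results.
    inflight.delete(key)
  })
  inflight.set(key, promise)
  return promise
}

// Mirrors the regression test's shape: two concurrent calls, one computation.
async function demo() {
  let computations = 0
  const run = () =>
    singleFlight("session-1:message-1", async () => {
      computations++
      await new Promise((r) => setTimeout(r, 5))
      return "summary"
    })
  const [a, b] = await Promise.all([run(), run()])
  console.log(computations, a === b) // → 1 true
}

demo()
```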

2. O(1) queue dequeue path

packages/opencode/src/util/queue.ts

  • replaces front-of-array shift() removal with a head index that advances past consumed items
  • compacts the backing array only when enough items have been consumed to make compaction worthwhile
  • keeps ordering semantics and waiter handoff behavior unchanged

packages/opencode/test/util/queue.test.ts

  • verifies long ordered dequeue behavior
  • verifies waiting readers still receive pushed values correctly
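A minimal sketch of the head-index idea, simplified relative to the real AsyncQueue; the class name and compaction threshold are illustrative:

```typescript
// Queue that dequeues via a head index rather than Array.shift(), which
// re-indexes the remaining elements (O(n)) on every call.
class HeadIndexQueue<T> {
  private items: T[] = []
  private head = 0
  private waiters: ((value: T) => void)[] = []
  // Compact only after this many items have been consumed (illustrative).
  private static readonly COMPACT_THRESHOLD = 1024

  push(value: T) {
    const waiter = this.waiters.shift()
    if (waiter) {
      // Hand the value directly to the oldest waiting reader,
      // preserving FIFO handoff semantics.
      waiter(value)
      return
    }
    this.items.push(value)
  }

  async next(): Promise<T> {
    if (this.head < this.items.length) {
      const value = this.items[this.head++]
      // Drop already-consumed slots once compaction is worthwhile.
      if (this.head >= HeadIndexQueue.COMPACT_THRESHOLD) {
        this.items = this.items.slice(this.head)
        this.head = 0
      }
      return value
    }
    return new Promise<T>((resolve) => this.waiters.push(resolve))
  }
}
```

Steady-state dequeue is an index read; the occasional `slice()` is the compaction trade-off noted below.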

3. Processor cleanup avoids redundant reads

packages/opencode/src/session/processor.ts

  • cleanup now iterates the in-memory ctx.toolcalls map rather than rereading all parts for the assistant message
  • only unfinished in-memory tool calls are converted to terminal error state during abort / cleanup
  • explicitly leaves the existing doom-loop detection semantics intact
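In sketch form, cleanup now walks the in-memory map directly; the `ToolCallState` shape and status names here are hypothetical, not the real processor types:

```typescript
// Hypothetical shapes for illustration; the real processor types differ.
type ToolCallState = {
  id: string
  status: "running" | "completed" | "error"
  error?: string
}

// Abort/cleanup path: only unfinished in-memory tool calls are moved to a
// terminal error state; completed calls are left untouched, and no
// persisted message parts are reread.
function cleanupToolCalls(toolcalls: Map<string, ToolCallState>): void {
  for (const call of toolcalls.values()) {
    if (call.status === "running") {
      call.status = "error"
      call.error = "Tool call aborted during cleanup"
    }
  }
}
```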

Scope

This PR is intentionally limited to packages/opencode/** hot-path infrastructure.
It does not include the unrelated local packages/console/** changes currently present in my working tree.

Validation

Typecheck

Run from packages/opencode:

  • bun typecheck

Targeted tests

Run from packages/opencode:

  • bun test test/session/summary.test.ts test/util/queue.test.ts test/session/processor-effect.test.ts

Result:

  • all targeted tests passed

Manual QA

Run from packages/opencode:

  • bun -e 'import { AsyncQueue } from "./src/util/queue"; const q = new AsyncQueue(); q.push("a"); q.push("b"); console.log([await q.next(), await q.next()].join(","))'
    • output: a,b
  • summary single-flight probe against SessionSummary.layer
    • output: summary_messages:1
    • output: summary_calls:1

Microbench checks

Run from packages/opencode:

  • queue comparison over 200000 push+pop operations
    • shift:18.13ms
    • idx:12.73ms
  • warmed tool-registry benchmark used during validation of the broader hot-path investigation
    • warm_same:24.12ms
    • vary_agent:32.38ms
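The queue comparison can be reproduced with a rough sketch like the following; the batch size and structure are illustrative, and absolute timings vary by machine and runtime:

```typescript
// Interleaved push+pop benchmark comparing Array.shift() against a head index.
const OPS = 200_000
const BATCH = 100

function benchShift(): number {
  const arr: number[] = []
  const start = performance.now()
  for (let i = 0; i < OPS / BATCH; i++) {
    for (let j = 0; j < BATCH; j++) arr.push(j)
    // shift() re-indexes the remaining elements on every call
    for (let j = 0; j < BATCH; j++) arr.shift()
  }
  return performance.now() - start
}

function benchHeadIndex(): number {
  let arr: number[] = []
  let head = 0
  const start = performance.now()
  for (let i = 0; i < OPS / BATCH; i++) {
    for (let j = 0; j < BATCH; j++) arr.push(j)
    // reading through a head index leaves the backing array in place
    for (let j = 0; j < BATCH; j++) void arr[head++]
    // compact once the batch is fully consumed
    if (head === arr.length) {
      arr = []
      head = 0
    }
  }
  return performance.now() - start
}

console.log(`shift: ${benchShift().toFixed(2)}ms`)
console.log(`idx: ${benchHeadIndex().toFixed(2)}ms`)
```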

Trade-offs / things to watch

  • summary single-flight state is bounded to 1024 keys and invalidated after completion, so it should not become a persistent result cache
  • queue compaction trades occasional slice work for much cheaper steady-state dequeue behavior
  • processor cleanup now trusts the authoritative in-memory in-flight tool map during teardown, which is the same state the stream handler mutates during execution

Follow-up measurement plan

After merge, the next useful measurement is an end-to-end agent-loop benchmark on a fixed fixture repo that captures:

  • wall-clock latency per loop iteration
  • CPU time
  • RSS / heap growth under repeated runs
  • count of summary invocations and tool-cleanup reads

That follow-up would quantify how much these micro-optimizations move real session throughput under realistic load.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

github-actions bot commented Apr 8, 2026

Hey! Your PR title perf(opencode): reduce redundant summary and queue overhead doesn't follow conventional commit format.

Please update it to start with one of:

  • feat: or feat(scope): new feature
  • fix: or fix(scope): bug fix
  • docs: or docs(scope): documentation changes
  • chore: or chore(scope): maintenance tasks
  • refactor: or refactor(scope): code refactoring
  • test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

@github-actions github-actions bot added the needs:compliance This means the issue will auto-close after 2 hours. label Apr 8, 2026

github-actions bot commented Apr 8, 2026

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.


github-actions bot commented Apr 8, 2026

The following comment was made by an LLM; it may be inaccurate:

Potential Related PR Found

PR #20303: refactor(opencode): optimize doom loop detection, summary debounce, parallel plugin events
#20303

This PR may be related because it also addresses summary debouncing/optimization in opencode, which overlaps with PR #21507's work on reducing redundant summary calls and processor cleanup. However, the exact relationship (whether closed, merged, or addressing different aspects) should be verified.

PR #19237: perf(opencode): reduce streaming latency and request overhead
#19237

This PR addresses similar hot-path performance concerns in opencode streaming, though focused on a different area.


github-actions bot commented Apr 8, 2026

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window.

Feel free to open a new pull request that follows our guidelines.

@github-actions github-actions bot removed the needs:compliance This means the issue will auto-close after 2 hours. label Apr 8, 2026
@github-actions github-actions bot closed this Apr 8, 2026
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
Core cache optimizations:
- Move mindContext from dynamicSystem to stableSystem (500-2000+ tokens/turn
  cached at BP1 for sessions with SessionMind context)
- Split failureContext into stableFailures (prior turns, BP1 cached) and
  dynamicFailures (current turn only) using signature-based dedup
- Add markLargeToolResults() pre-pass: cache_control on tool-result content
  parts >7000 chars (~2000 tokens), Anthropic direct + OpenRouter Claude
- Fix stale parts reference bug in markLargeToolResults for multi-tool messages
- Add compressImages() async pre-pass via sharp (PR anomalyco#21371): 3-phase
  quality->dimension->fallback compression prevents 5MB API limit errors
- Session snapshot resets (resetFailureSnapshot/resetEnvDynamicSent) in cleanup
- prompt_async idle race condition fix: check new messages before loop break

Upstream PR cherry-picks:
- PR anomalyco#21535: deterministic queued message wrapping eliminates per-turn cache miss
- PR anomalyco#21492: tool evidence digest (evidence.ts) preserves context through compaction
- PR anomalyco#21507: session processor single-flight summary dedup improvements
- PR anomalyco#21528: prompt_async idle wakeup race condition fix
- PR anomalyco#21500: Levenshtein O(min(N,M)) space with Int32Array two-row algorithm

New tools (PR anomalyco#21399):
- ContextUsageTool (check_context_usage): real-time token/cache usage reporting
- NewSessionTool (new_session): TUI-only, abort + create new session
- TuiEvent.SessionNew bus event and app.tsx handler
- SDK types.gen.ts/sdk.gen.ts EventTuiSessionNew type

Test infrastructure:
- E2E cache tests (OPENCODE_E2E=1) verified 100% cache hit rate on T2+
- Unit tests for large-tool cache breakpoints (4 scenarios)
- Fix pre-existing lsp-deps.test.ts assertion bug (LspTool in make() not all())
- Add await to all ProviderTransform.message() call sites (now async)
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
…rchestrator, multi-credential, codebase indexer

Core Features:
- Session Mind with persistent memory across sessions
- Orchestrator + Worker subagent architecture
- Multi-credential OAuth with auto-refresh
- Codebase indexer and watcher connectors
- Footer status bar with live metrics

Cache & Prompt Optimizations:
- Move mindContext/failureContext to stable system prefix (BP1 cached)
- Large tool result cache_control breakpoints (>7000 chars)
- Deterministic message wrapping (PR anomalyco#21535)
- Tool evidence digest through compaction (PR anomalyco#21492)
- O(1) queue dequeue + single-flight summary (PR anomalyco#21507)
- Levenshtein O(min(N,M)) space optimization (PR anomalyco#21500)
- Three-phase image auto-compression (PR anomalyco#21371)
- ContextUsage and NewSession tools (PR anomalyco#21399)
- E2E cache integration tests with real Anthropic OAuth

Session snapshot resets prevent memory leaks on session delete.
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026

fairyhunter13 pushed a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 9, 2026