fix(opencode): preserve tool context through compaction and prompt loops#21492

Open
GuestAUser wants to merge 7 commits into anomalyco:dev from GuestAUser:fix/tool-context-followups

Conversation


@GuestAUser GuestAUser commented Apr 8, 2026

Issue for this PR

Closes #20246

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

This PR fixes a cluster of session-context problems in packages/opencode/src/session that showed up in multi-step conversations, compaction follow-ups, and OpenAI Responses tool continuations.

At a high level, the branch now does seven things:

  1. preserves bounded tool evidence through compaction so later loop iterations can still reference meaningful earlier tool results instead of only seeing a cleared placeholder
  2. threads OpenAI Responses tool follow-ups with previous_response_id when store: true is explicitly enabled, so the continuation path can send the tool-result delta instead of replaying the full prior prompt history
  3. memoizes the stable prompt-loop fragments from SystemPrompt.skills(agent) and SystemPrompt.environment(model) while still reloading instruction.system() every iteration so instruction edits remain visible on the very next loop step
  4. narrows persisted step-finish metadata to the single safe field this branch actually consumes, openai.responseId, instead of persisting the full provider metadata object
  5. hardens compacted-tool replay so legacy compacted tool parts without precomputed evidence metadata still degrade safely instead of throwing when they are converted back into model messages
  6. detaches cached app project state from Solid store proxies before mirroring it into persisted global-sync project cache state, which addresses one Windows pageerror cascade during workspace/session/sidebar flows
  7. clones project arrays before globalSync.set("project", ...) writes them back into the global store, so branch code no longer re-inserts proxy-backed project objects during app-side project updates

The result is a branch that is both more context-preserving and more operationally stable.


Problem breakdown

1. Compacted tool calls lost too much usable context

Before this change, once older completed tool calls were compacted, later continuation loops no longer had a durable, bounded summary of what the tool had actually done. That made follow-up reasoning weaker because the model only saw that old content had been cleared.

2. OpenAI Responses follow-ups replayed too much history

When OpenAI Responses tool execution produced a follow-up iteration, the session loop could still replay the full message history even when the provider already had a stored response chain. That is both slower and less precise than threading through previous_response_id with only the tool output delta.

3. Prompt-loop system fragments were recomputed unnecessarily

The loop was rebuilding stable prompt fragments each iteration even though only the dynamic instruction file contents actually needed to be reloaded each turn.

4. The branch needed stabilization after CI surfaced broad e2e failures

While validating the branch, CI for this PR reported broad app instability rather than one isolated selector failure:

  • Windows e2e produced many unrelated UI failures after repeated [e2e:pageerror] events and later fetch failed / ECONNRESET
  • Linux e2e timed out during startup while running bun script/e2e-local.ts

That pointed to branch-specific runtime instability in the opencode session pipeline rather than simple UI test flake. The two most suspicious paths on this branch were:

  • persisting full providerMetadata onto step-finish parts even though the branch only consumes metadata.openai.responseId
  • reading compacted evidence via state.metadata.evidence without a safe fallback for legacy compacted rows

This PR now fixes both of those stabilization problems directly.


File-by-file changes

packages/opencode/src/session/compaction.ts

  • compaction now preserves bounded tool evidence when older completed tool parts are pruned
  • for each compacted completed tool result, the branch stores a bounded digest built from:
    • tool name
    • title
    • truncated serialized input
    • short excerpt of output
    • proof hash / byte count / line count
    • optional output path
    • optional attachment summary
  • this allows later prompt reconstruction to preserve useful context without replaying the full original tool output

packages/opencode/src/session/evidence.ts

  • adds the evidence helper used to construct compacted tool summaries
  • caps the amount of serialized tool context retained in compacted state
  • now safely handles missing metadata when reconstructing compacted evidence for older rows / legacy data
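The bounded digest described above can be sketched as follows. This is a minimal illustration, not the branch's actual code: `ToolEvidence`, `buildEvidence`, `truncate`, and the cap constants are hypothetical names and values chosen for the example.

```typescript
// Hypothetical sketch of a bounded tool-evidence digest. All identifiers
// and cap values here are illustrative assumptions, not evidence.ts itself.
interface ToolEvidence {
  tool: string
  title: string
  input: string   // truncated serialized input
  excerpt: string // short excerpt of output
  bytes: number   // byte count of the full output (proof)
  lines: number   // line count of the full output (proof)
}

const INPUT_CAP = 256
const EXCERPT_CAP = 512

// Cap a string so compacted state retains a bounded amount of context.
function truncate(text: string, cap: number): string {
  return text.length <= cap ? text : text.slice(0, cap) + "…"
}

function buildEvidence(
  tool: string,
  title: string,
  input: unknown,
  output: string,
): ToolEvidence {
  return {
    tool,
    title,
    input: truncate(JSON.stringify(input), INPUT_CAP),
    excerpt: truncate(output, EXCERPT_CAP),
    bytes: new TextEncoder().encode(output).length,
    lines: output.split("\n").length,
  }
}
```

The point of the proof fields (`bytes`, `lines`) is that later loop iterations can still reason about the scale of an earlier result even after its body has been dropped.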

packages/opencode/src/session/message-v2.ts

  • when a completed tool part has been compacted, toModelMessages() now emits a structured evidence digest instead of the old [Old tool result content cleared] placeholder path
  • if state.metadata.evidence exists and matches the expected shape, it is used
  • if evidence metadata is missing, the digest is recomputed from the remaining completed tool state instead of throwing
  • this keeps compacted tool follow-ups semantically useful and backward-compatible
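The fallback order above can be illustrated like this. The type and function names are hypothetical; the digest format mirrors the probe output quoted in the verification section below (`[Compacted tool result], tool: bash, title: Bash, input: {"cmd":"pwd"}`).

```typescript
// Illustrative fallback: prefer stored evidence metadata, otherwise
// recompute a digest from the remaining tool state instead of throwing.
// Identifiers here are sketches, not message-v2.ts verbatim.
interface Evidence {
  tool: string
  title: string
  input: string
}

interface CompactedToolState {
  tool: string
  title?: string
  input?: unknown
  metadata?: { evidence?: unknown }
}

// Shape check so malformed or absent legacy metadata never throws.
function isEvidence(x: unknown): x is Evidence {
  return (
    typeof x === "object" && x !== null &&
    typeof (x as Evidence).tool === "string" &&
    typeof (x as Evidence).input === "string"
  )
}

function renderDigest(state: CompactedToolState): string {
  const stored = state.metadata?.evidence
  const ev: Evidence = isEvidence(stored)
    ? stored
    : {
        // legacy compacted row: recompute from what is still present
        tool: state.tool,
        title: state.title ?? state.tool,
        input: JSON.stringify(state.input ?? {}),
      }
  return `[Compacted tool result], tool: ${ev.tool}, title: ${ev.title}, input: ${ev.input}`
}
```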

packages/opencode/src/session/llm.ts

  • adds opts to LLM.StreamInput
  • merges those opts into final provider options
  • this is the transport mechanism used by the prompt loop to inject previousResponseId / store for threaded OpenAI follow-ups
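A rough sketch of that transport, under the assumption that `StreamInput` carries a small opts bag merged last into the provider options. The field names and merge order here are guesses for illustration, not verbatim `llm.ts` code.

```typescript
// Illustrative only: StreamInput, opts, and the merge order are assumptions
// about the branch, not the actual llm.ts shapes.
interface StreamOpts {
  previousResponseId?: string
  store?: boolean
}

interface StreamInput {
  providerOptions?: Record<string, unknown>
  opts?: StreamOpts
}

// Merge loop-supplied opts last so a threaded follow-up can inject
// previousResponseId / store for the OpenAI provider path.
function finalProviderOptions(input: StreamInput): Record<string, unknown> {
  const base: Record<string, unknown> = { ...input.providerOptions }
  if (input.opts?.previousResponseId !== undefined) {
    base.openai = {
      ...(base.openai as Record<string, unknown> | undefined),
      previousResponseId: input.opts.previousResponseId,
      store: input.opts.store ?? true,
    }
  }
  return base
}
```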

packages/opencode/src/session/processor.ts

  • stores step-finish.metadata so the prompt loop can reuse provider response IDs across iterations
  • stabilization follow-up: the persisted metadata is now intentionally narrowed to the safe allowlisted shape:
    • { openai: { responseId } }
  • this avoids persisting arbitrary provider metadata blobs while preserving exactly what the branch needs for response threading
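The narrowing is mechanically simple; a minimal sketch, assuming provider metadata arrives as an arbitrary object and that only the one allowlisted field should survive persistence (`narrowFinishMetadata` is a hypothetical name):

```typescript
// Minimal sketch of the allowlist narrowing described above.
type SafeFinishMetadata = { openai?: { responseId: string } }

function narrowFinishMetadata(
  raw: Record<string, any> | undefined,
): SafeFinishMetadata {
  const id = raw?.openai?.responseId
  // Drop everything except the one field the prompt loop reads back.
  return typeof id === "string" ? { openai: { responseId: id } } : {}
}
```

Because the output type cannot represent any other field, future additions have to be made explicitly rather than leaking through.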

packages/opencode/src/session/prompt.ts

  • adds threaded() / chain() helpers for OpenAI Responses continuation detection
  • when the current model is OpenAI, store: true is enabled, the last assistant matches the current provider/model, and the last assistant contains completed tool activity, the loop extracts the responseId from the last step-finish metadata and uses it to thread the next request
  • in that threaded path, the loop sends only the tool-result delta instead of replaying all prior model messages
  • memoizes stable per-loop prompt fragments:
    • SystemPrompt.skills(agent) per agent
    • SystemPrompt.environment(model) per model
  • still reloads instruction.system() on every loop iteration so instruction changes remain live immediately
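The continuation gate described above amounts to a chain of safety checks; here is a hypothetical sketch (the real `threaded()` / `chain()` helpers in `prompt.ts` differ in detail, and every name below is illustrative):

```typescript
// Illustrative continuation gate: thread only when every precondition
// from the description holds; otherwise fall back to full replay.
interface LastAssistant {
  providerID: string
  modelID: string
  hasCompletedTool: boolean
  stepFinishResponseId?: string
}

function threadedResponseId(
  current: { providerID: string; modelID: string; store: boolean },
  last: LastAssistant | undefined,
): string | undefined {
  if (!current.store) return undefined            // store: true must be explicit
  if (current.providerID !== "openai") return undefined
  if (!last) return undefined
  if (last.providerID !== current.providerID) return undefined
  if (last.modelID !== current.modelID) return undefined
  if (!last.hasCompletedTool) return undefined    // only tool continuations thread
  return last.stepFinishResponseId                // undefined → full replay
}
```

When this returns an ID, the loop can send only the tool-result delta with `previous_response_id`; any `undefined` safely degrades to replaying the full message history.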

Branch stabilization follow-up

After the original branch changes were in place, I investigated the failing CI jobs linked from this PR.

Observed CI behavior

Windows e2e

The failure pattern was broad and systemic rather than selector-specific:

  • early failure in file tree expansion
  • later failures across workspace switching, workspace creation, session model persistence, status popovers, settings, sidebar navigation, and terminal tabs
  • repeated [e2e:pageerror] Object
  • later TypeError: fetch failed with ECONNRESET

That pattern suggested backend or session runtime instability rather than a single UI bug.

Linux e2e

The job timed out after 30 minutes inside Run app e2e tests.
The log showed the backend bootstrapping path starting, but the run stalled during startup and never completed within the job timeout.

Stabilization changes added because of that analysis

Two additional commits were added on top of the original branch work:

  • fix(opencode): store only safe finish response metadata
  • fix(opencode): fall back for legacy compacted tool evidence

These changes directly target the two branch-specific risky areas described above.

Additional app-side stabilization

A later Windows-only failure still pointed at app-side global sync rather than backend session serialization alone. The strongest signal was a Solid store proxy error from packages/app/src/context/global-sync.tsx during project cache writes.

Those follow-up fixes now ensure both sides of the project-sync path are detached: sanitizeProject() returns a detached plain copy for persisted cache writes, and setProjects() clones incoming project arrays before writing them back into the live global store. Together, that prevents proxy-backed project objects from one Solid store from being mirrored or reinserted across store boundaries and then surfacing as Symbol(solid-proxy) runtime failures in workspace/session/sidebar flows.
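The detachment idea can be shown with a plain `Proxy` standing in for a Solid store proxy. `sanitizeProject()` / `setProjects()` are the branch's names, but this body is a sketch with a hypothetical `Project` shape; the branch presumably also uses Solid's `unwrap()` before copying:

```typescript
// Minimal sketch: produce a plain, fully detached copy so the result can
// be written into a second store without retaining the source proxy or
// any nested array reached through it.
interface Project {
  id: string
  sandboxes: string[]
}

function detachProject(project: Project): Project {
  return {
    id: project.id,
    sandboxes: [...project.sandboxes], // clone nested arrays too
  }
}

const backing: Project = { id: "p1", sandboxes: ["s1"] }
const proxied = new Proxy(backing, {}) // stand-in for a Solid store proxy
const copy = detachProject(proxied)
```

Note that a cheap shallow copy would not be enough: the `sandboxes` array would still be the proxied one, which is exactly the kind of cross-store reference that surfaced as `Symbol(solid-proxy)` runtime failures.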


Commits on this PR

  1. fix(opencode): preserve compacted tool evidence
  2. fix(opencode): thread openai tool follow-ups
  3. perf(opencode): memoize stable prompt loop context
  4. fix(opencode): store only safe finish response metadata
  5. fix(opencode): fall back for legacy compacted tool evidence
  6. fix(app): detach cached project state from Solid proxies
  7. fix(app): clone project state before syncing cache

How did you verify your code works?

I verified this locally in packages/opencode with both broad session regression coverage and focused stabilization checks.

  • Primary local validation:
    • bun typecheck
    • bun test test/session/message-v2.test.ts test/session/compaction.test.ts test/session/prompt-effect.test.ts test/session/processor-effect.test.ts
  • Focused stabilization validation:
    • bun typecheck
    • bun test test/session/message-v2.test.ts test/session/prompt-effect.test.ts test/session/processor-effect.test.ts
  • Specific regression probes:
    • bun test test/session/message-v2.test.ts -t "replaces compacted tool output when legacy evidence metadata is missing"
    • bun test test/session/prompt-effect.test.ts -t "openai tool continuation threads the previous response"
  • Manual QA:
    • compacted-tool legacy fallback probe against MessageV2.toModelMessages(...), which produced output beginning with [Compacted tool result], tool: bash, title: Bash, input: {"cmd":"pwd"}
    • direct Bun test run for the OpenAI response-threading regression, which passed locally
    • direct sanitizeProject() / createStore() probe in packages/app confirming detached project copies for time, sandboxes, and icon state before storing them in a second store
  • Additional app validation:
    • bun test --preload ./happydom.ts ./src/context/global-sync/utils.test.ts ./src/context/global-sync/event-reducer.test.ts
    • direct cross-store cloneProject() / sanitizeProject() probe in packages/app confirming detached global-store and cache-store project copies

All targeted checks above passed locally.


Why these changes are safe

Response threading

The branch only consumes metadata.openai.responseId from step-finish, so narrowing persisted metadata to the allowlisted field removes risk without removing required behavior.

Prompt-loop memoization

Only stable prompt fragments are memoized. instruction.system() still reloads on every iteration, so AGENTS/instruction changes remain visible on the next loop step.

Compacted evidence fallback

The new fallback only applies when a tool part is already compacted and legacy evidence metadata is absent. In the normal compacted path, stored evidence is still preferred.

Existing semantics preserved

The branch remains scoped to session-context handling. It does not widen tool permissions, alter retry policy, or change non-session app behavior intentionally.


Trade-offs / things to watch

  • compacted evidence is intentionally lossy and bounded; it preserves proof and excerpt, not full historical output
  • OpenAI follow-up threading only activates when store: true is explicitly enabled and the prior assistant/model relationship is safe to reuse
  • persisted finish metadata is now intentionally narrow; if future logic needs more provider metadata, that should be added explicitly via allowlist rather than by storing the full provider payload
  • legacy compacted rows are now tolerated, but this fallback path should mainly be exercised for backward compatibility rather than as the primary compacted format

Scope note

This PR is scoped to packages/opencode/src/session/**, packages/app/src/context/global-sync/**, and the corresponding targeted regression coverage.
No unrelated console working tree changes are intended to be part of this PR.
This description intentionally keeps the repository PR-template headings verbatim so compliance automation can match them exactly.

Screenshots / recordings

N/A

Checklist

  • I have tested my changes locally
  • I have added / updated targeted regression coverage for the changed behavior
  • I investigated the branch-specific CI failures and added stabilization fixes for the identified risky code paths
  • I have not included unrelated changes in this PR

Memoize stable system prompt fragments across multi-step loop iterations so tool-call continuations stop rebuilding the same environment and skill text, while still reloading instruction files each step for correctness.
@github-actions github-actions bot added and then removed the needs:compliance label (auto-closes the issue after 2 hours) on Apr 8, 2026

github-actions bot commented Apr 8, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@GuestAUser GuestAUser requested a review from adamdotdevin as a code owner on April 8, 2026 15:45
@GuestAUser GuestAUser force-pushed the fix/tool-context-followups branch from 7ad70dd to dad51a1 on April 8, 2026 15:57
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
Core cache optimizations:
- Move mindContext from dynamicSystem to stableSystem (500-2000+ tokens/turn
  cached at BP1 for sessions with SessionMind context)
- Split failureContext into stableFailures (prior turns, BP1 cached) and
  dynamicFailures (current turn only) using signature-based dedup
- Add markLargeToolResults() pre-pass: cache_control on tool-result content
  parts >7000 chars (~2000 tokens), Anthropic direct + OpenRouter Claude
- Fix stale parts reference bug in markLargeToolResults for multi-tool messages
- Add compressImages() async pre-pass via sharp (PR anomalyco#21371): 3-phase
  quality->dimension->fallback compression prevents 5MB API limit errors
- Session snapshot resets (resetFailureSnapshot/resetEnvDynamicSent) in cleanup
- prompt_async idle race condition fix: check new messages before loop break

Upstream PR cherry-picks:
- PR anomalyco#21535: deterministic queued message wrapping eliminates per-turn cache miss
- PR anomalyco#21492: tool evidence digest (evidence.ts) preserves context through compaction
- PR anomalyco#21507: session processor single-flight summary dedup improvements
- PR anomalyco#21528: prompt_async idle wakeup race condition fix
- PR anomalyco#21500: Levenshtein O(min(N,M)) space with Int32Array two-row algorithm

New tools (PR anomalyco#21399):
- ContextUsageTool (check_context_usage): real-time token/cache usage reporting
- NewSessionTool (new_session): TUI-only, abort + create new session
- TuiEvent.SessionNew bus event and app.tsx handler
- SDK types.gen.ts/sdk.gen.ts EventTuiSessionNew type

Test infrastructure:
- E2E cache tests (OPENCODE_E2E=1) verified 100% cache hit rate on T2+
- Unit tests for large-tool cache breakpoints (4 scenarios)
- Fix pre-existing lsp-deps.test.ts assertion bug (LspTool in make() not all())
- Add await to all ProviderTransform.message() call sites (now async)
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
…rchestrator, multi-credential, codebase indexer

Core Features:
- Session Mind with persistent memory across sessions
- Orchestrator + Worker subagent architecture
- Multi-credential OAuth with auto-refresh
- Codebase indexer and watcher connectors
- Footer status bar with live metrics

Cache & Prompt Optimizations:
- Move mindContext/failureContext to stable system prefix (BP1 cached)
- Large tool result cache_control breakpoints (>7000 chars)
- Deterministic message wrapping (PR anomalyco#21535)
- Tool evidence digest through compaction (PR anomalyco#21492)
- O(1) queue dequeue + single-flight summary (PR anomalyco#21507)
- Levenshtein O(min(N,M)) space optimization (PR anomalyco#21500)
- Three-phase image auto-compression (PR anomalyco#21371)
- ContextUsage and NewSession tools (PR anomalyco#21399)
- E2E cache integration tests with real Anthropic OAuth

Session snapshot resets prevent memory leaks on session delete.
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
fairyhunter13 pushed a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 9, 2026

Development

Successfully merging this pull request may close these issues.

Overflow compaction produces incomplete summaries in tool-heavy sessions

1 participant