fix(opencode): preserve tool context through compaction and prompt loops#21492

Open
GuestAUser wants to merge 7 commits into anomalyco:dev from GuestAUser:fix/tool-context-followups

Conversation


@GuestAUser GuestAUser commented Apr 8, 2026

Issue for this PR

Closes #20246

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

This PR fixes a cluster of session-context problems in packages/opencode/src/session that showed up in multi-step conversations, compaction follow-ups, and OpenAI Responses tool continuations.

At a high level, the branch now does seven things:

  1. preserves bounded tool evidence through compaction so later loop iterations can still reference meaningful earlier tool results instead of only seeing a cleared placeholder
  2. threads OpenAI Responses tool follow-ups with previous_response_id when store: true is explicitly enabled, so the continuation path can send the tool-result delta instead of replaying the full prior prompt history
  3. memoizes the stable prompt-loop fragments from SystemPrompt.skills(agent) and SystemPrompt.environment(model) while still reloading instruction.system() every iteration so instruction edits remain visible on the very next loop step
  4. narrows persisted step-finish metadata to the single safe field this branch actually consumes, openai.responseId, instead of persisting the full provider metadata object
  5. hardens compacted-tool replay so legacy compacted tool parts without precomputed evidence metadata still degrade safely instead of throwing when they are converted back into model messages
  6. detaches cached app project state from Solid store proxies before mirroring it into persisted global-sync project cache state, which addresses one Windows pageerror cascade during workspace/session/sidebar flows
  7. clones project arrays before globalSync.set("project", ...) writes them back into the global store, so branch code no longer re-inserts proxy-backed project objects during app-side project updates

The result is a branch that is both more context-preserving and more operationally stable.


Problem breakdown

1. Compacted tool calls lost too much usable context

Before this change, once older completed tool calls were compacted, later continuation loops no longer had a durable, bounded summary of what the tool had actually done. That made follow-up reasoning weaker because the model only saw that old content had been cleared.

2. OpenAI Responses follow-ups replayed too much history

When OpenAI Responses tool execution produced a follow-up iteration, the session loop could still replay the full message history even when the provider already had a stored response chain. That is both slower and less precise than threading through previous_response_id with only the tool output delta.

3. Prompt-loop system fragments were recomputed unnecessarily

The loop was rebuilding stable prompt fragments each iteration even though only the dynamic instruction file contents actually needed to be reloaded each turn.

4. The branch needed stabilization after CI surfaced broad e2e failures

While validating the branch, CI for this PR reported broad app instability rather than one isolated selector failure:

  • Windows e2e produced many unrelated UI failures after repeated [e2e:pageerror] events and later fetch failed / ECONNRESET
  • Linux e2e timed out during startup while running bun script/e2e-local.ts

That pointed to branch-specific runtime instability in the opencode session pipeline rather than simple UI test flake. The two most suspicious paths on this branch were:

  • persisting full providerMetadata onto step-finish parts even though the branch only consumes metadata.openai.responseId
  • reading compacted evidence via state.metadata.evidence without a safe fallback for legacy compacted rows

This PR now fixes both of those stabilization problems directly.


File-by-file changes

packages/opencode/src/session/compaction.ts

  • compaction now preserves bounded tool evidence when older completed tool parts are pruned
  • for each compacted completed tool result, the branch stores a bounded digest built from:
    • tool name
    • title
    • truncated serialized input
    • short excerpt of output
    • proof hash / byte count / line count
    • optional output path
    • optional attachment summary
  • this allows later prompt reconstruction to preserve useful context without replaying the full original tool output

packages/opencode/src/session/evidence.ts

  • adds the evidence helper used to construct compacted tool summaries
  • caps the amount of serialized tool context retained in compacted state
  • now safely handles missing metadata when reconstructing compacted evidence for older rows / legacy data
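The bounded digest described above can be sketched as follows. This is a minimal illustration, not the branch's actual code: `ToolEvidence`, `buildEvidence`, `truncate`, and the cap constants are hypothetical names and values chosen for the example.

```typescript
// Hypothetical sketch of a bounded tool-evidence digest. All identifiers
// and cap values here are illustrative assumptions, not evidence.ts itself.
interface ToolEvidence {
  tool: string
  title: string
  input: string   // truncated serialized input
  excerpt: string // short excerpt of output
  bytes: number   // byte count of the full output (proof)
  lines: number   // line count of the full output (proof)
}

const INPUT_CAP = 256
const EXCERPT_CAP = 512

// Cap a string so compacted state retains a bounded amount of context.
function truncate(text: string, cap: number): string {
  return text.length <= cap ? text : text.slice(0, cap) + "…"
}

function buildEvidence(
  tool: string,
  title: string,
  input: unknown,
  output: string,
): ToolEvidence {
  return {
    tool,
    title,
    input: truncate(JSON.stringify(input), INPUT_CAP),
    excerpt: truncate(output, EXCERPT_CAP),
    bytes: new TextEncoder().encode(output).length,
    lines: output.split("\n").length,
  }
}
```

The point of the proof fields (`bytes`, `lines`) is that later loop iterations can still reason about the scale of an earlier result even after its body has been dropped.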

packages/opencode/src/session/message-v2.ts

  • when a completed tool part has been compacted, toModelMessages() now emits a structured evidence digest instead of the old [Old tool result content cleared] placeholder path
  • if state.metadata.evidence exists and matches the expected shape, it is used
  • if evidence metadata is missing, the digest is recomputed from the remaining completed tool state instead of throwing
  • this keeps compacted tool follow-ups semantically useful and backward-compatible
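The fallback order above can be illustrated like this. The type and function names are hypothetical; the digest format mirrors the probe output quoted in the verification section below (`[Compacted tool result], tool: bash, title: Bash, input: {"cmd":"pwd"}`).

```typescript
// Illustrative fallback: prefer stored evidence metadata, otherwise
// recompute a digest from the remaining tool state instead of throwing.
// Identifiers here are sketches, not message-v2.ts verbatim.
interface Evidence {
  tool: string
  title: string
  input: string
}

interface CompactedToolState {
  tool: string
  title?: string
  input?: unknown
  metadata?: { evidence?: unknown }
}

// Shape check so malformed or absent legacy metadata never throws.
function isEvidence(x: unknown): x is Evidence {
  return (
    typeof x === "object" && x !== null &&
    typeof (x as Evidence).tool === "string" &&
    typeof (x as Evidence).input === "string"
  )
}

function renderDigest(state: CompactedToolState): string {
  const stored = state.metadata?.evidence
  const ev: Evidence = isEvidence(stored)
    ? stored
    : {
        // legacy compacted row: recompute from what is still present
        tool: state.tool,
        title: state.title ?? state.tool,
        input: JSON.stringify(state.input ?? {}),
      }
  return `[Compacted tool result], tool: ${ev.tool}, title: ${ev.title}, input: ${ev.input}`
}
```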

packages/opencode/src/session/llm.ts

  • adds opts to LLM.StreamInput
  • merges those opts into final provider options
  • this is the transport mechanism used by the prompt loop to inject previousResponseId / store for threaded OpenAI follow-ups
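A rough sketch of that transport, under the assumption that `StreamInput` carries a small opts bag merged last into the provider options. The field names and merge order here are guesses for illustration, not verbatim `llm.ts` code.

```typescript
// Illustrative only: StreamInput, opts, and the merge order are assumptions
// about the branch, not the actual llm.ts shapes.
interface StreamOpts {
  previousResponseId?: string
  store?: boolean
}

interface StreamInput {
  providerOptions?: Record<string, unknown>
  opts?: StreamOpts
}

// Merge loop-supplied opts last so a threaded follow-up can inject
// previousResponseId / store for the OpenAI provider path.
function finalProviderOptions(input: StreamInput): Record<string, unknown> {
  const base: Record<string, unknown> = { ...input.providerOptions }
  if (input.opts?.previousResponseId !== undefined) {
    base.openai = {
      ...(base.openai as Record<string, unknown> | undefined),
      previousResponseId: input.opts.previousResponseId,
      store: input.opts.store ?? true,
    }
  }
  return base
}
```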

packages/opencode/src/session/processor.ts

  • stores step-finish.metadata so the prompt loop can reuse provider response IDs across iterations
  • stabilization follow-up: the persisted metadata is now intentionally narrowed to the safe allowlisted shape:
    • { openai: { responseId } }
  • this avoids persisting arbitrary provider metadata blobs while preserving exactly what the branch needs for response threading
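The narrowing is mechanically simple; a minimal sketch, assuming provider metadata arrives as an arbitrary object and that only the one allowlisted field should survive persistence (`narrowFinishMetadata` is a hypothetical name):

```typescript
// Minimal sketch of the allowlist narrowing described above.
type SafeFinishMetadata = { openai?: { responseId: string } }

function narrowFinishMetadata(
  raw: Record<string, any> | undefined,
): SafeFinishMetadata {
  const id = raw?.openai?.responseId
  // Drop everything except the one field the prompt loop reads back.
  return typeof id === "string" ? { openai: { responseId: id } } : {}
}
```

Because the output type cannot represent any other field, future additions have to be made explicitly rather than leaking through.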

packages/opencode/src/session/prompt.ts

  • adds threaded() / chain() helpers for OpenAI Responses continuation detection
  • when the current model is OpenAI, store: true is enabled, the last assistant matches the current provider/model, and the last assistant contains completed tool activity, the loop extracts the responseId from the last step-finish metadata and uses it to thread the next request
  • in that threaded path, the loop sends only the tool-result delta instead of replaying all prior model messages
  • memoizes stable per-loop prompt fragments:
    • SystemPrompt.skills(agent) per agent
    • SystemPrompt.environment(model) per model
  • still reloads instruction.system() on every loop iteration so instruction changes remain live immediately
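The continuation gate described above amounts to a chain of safety checks; here is a hypothetical sketch (the real `threaded()` / `chain()` helpers in `prompt.ts` differ in detail, and every name below is illustrative):

```typescript
// Illustrative continuation gate: thread only when every precondition
// from the description holds; otherwise fall back to full replay.
interface LastAssistant {
  providerID: string
  modelID: string
  hasCompletedTool: boolean
  stepFinishResponseId?: string
}

function threadedResponseId(
  current: { providerID: string; modelID: string; store: boolean },
  last: LastAssistant | undefined,
): string | undefined {
  if (!current.store) return undefined            // store: true must be explicit
  if (current.providerID !== "openai") return undefined
  if (!last) return undefined
  if (last.providerID !== current.providerID) return undefined
  if (last.modelID !== current.modelID) return undefined
  if (!last.hasCompletedTool) return undefined    // only tool continuations thread
  return last.stepFinishResponseId                // undefined → full replay
}
```

When this returns an ID, the loop can send only the tool-result delta with `previous_response_id`; any `undefined` safely degrades to replaying the full message history.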

Branch stabilization follow-up

After the original branch changes were in place, I investigated the failing CI jobs linked from this PR.

Observed CI behavior

Windows e2e

The failure pattern was broad and systemic rather than selector-specific:

  • early failure in file tree expansion
  • later failures across workspace switching, workspace creation, session model persistence, status popovers, settings, sidebar navigation, and terminal tabs
  • repeated [e2e:pageerror] Object
  • later TypeError: fetch failed with ECONNRESET

That pattern suggested backend or session runtime instability rather than a single UI bug.

Linux e2e

The job timed out after 30 minutes inside Run app e2e tests.
The log showed the backend bootstrapping path starting, but the run stalled during startup and never completed within the job timeout.

Stabilization changes added because of that analysis

Two additional commits were added on top of the original branch work:

  • fix(opencode): store only safe finish response metadata
  • fix(opencode): fall back for legacy compacted tool evidence

These changes directly target the two branch-specific risky areas described above.

Additional app-side stabilization

A later Windows-only failure still pointed at app-side global sync rather than backend session serialization alone. The strongest signal was a Solid store proxy error from packages/app/src/context/global-sync.tsx during project cache writes.

Those follow-up fixes now ensure both sides of the project-sync path are detached: sanitizeProject() returns a detached plain copy for persisted cache writes, and setProjects() clones incoming project arrays before writing them back into the live global store. Together, that prevents proxy-backed project objects from one Solid store from being mirrored or reinserted across store boundaries and then surfacing as Symbol(solid-proxy) runtime failures in workspace/session/sidebar flows.
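The detachment idea can be shown with a plain `Proxy` standing in for a Solid store proxy. `sanitizeProject()` / `setProjects()` are the branch's names, but this body is a sketch with a hypothetical `Project` shape; the branch presumably also uses Solid's `unwrap()` before copying:

```typescript
// Minimal sketch: produce a plain, fully detached copy so the result can
// be written into a second store without retaining the source proxy or
// any nested array reached through it.
interface Project {
  id: string
  sandboxes: string[]
}

function detachProject(project: Project): Project {
  return {
    id: project.id,
    sandboxes: [...project.sandboxes], // clone nested arrays too
  }
}

const backing: Project = { id: "p1", sandboxes: ["s1"] }
const proxied = new Proxy(backing, {}) // stand-in for a Solid store proxy
const copy = detachProject(proxied)
```

Note that a cheap shallow copy would not be enough: the `sandboxes` array would still be the proxied one, which is exactly the kind of cross-store reference that surfaced as `Symbol(solid-proxy)` runtime failures.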


Commits on this PR

  1. fix(opencode): preserve compacted tool evidence
  2. fix(opencode): thread openai tool follow-ups
  3. perf(opencode): memoize stable prompt loop context
  4. fix(opencode): store only safe finish response metadata
  5. fix(opencode): fall back for legacy compacted tool evidence
  6. fix(app): detach cached project state from Solid proxies
  7. fix(app): clone project state before syncing cache

How did you verify your code works?

I verified this locally in packages/opencode with both broad session regression coverage and focused stabilization checks.

  • Primary local validation:
    • bun typecheck
    • bun test test/session/message-v2.test.ts test/session/compaction.test.ts test/session/prompt-effect.test.ts test/session/processor-effect.test.ts
  • Focused stabilization validation:
    • bun typecheck
    • bun test test/session/message-v2.test.ts test/session/prompt-effect.test.ts test/session/processor-effect.test.ts
  • Specific regression probes:
    • bun test test/session/message-v2.test.ts -t "replaces compacted tool output when legacy evidence metadata is missing"
    • bun test test/session/prompt-effect.test.ts -t "openai tool continuation threads the previous response"
  • Manual QA:
    • compacted-tool legacy fallback probe against MessageV2.toModelMessages(...), which produced output beginning with [Compacted tool result], tool: bash, title: Bash, input: {"cmd":"pwd"}
    • direct Bun test run for the OpenAI response-threading regression, which passed locally
    • direct sanitizeProject() / createStore() probe in packages/app confirming detached project copies for time, sandboxes, and icon state before storing them in a second store
  • Additional app validation:
    • bun test --preload ./happydom.ts ./src/context/global-sync/utils.test.ts ./src/context/global-sync/event-reducer.test.ts
    • direct cross-store cloneProject() / sanitizeProject() probe in packages/app confirming detached global-store and cache-store project copies

All targeted checks above passed locally.


Why these changes are safe

Response threading

The branch only consumes metadata.openai.responseId from step-finish, so narrowing persisted metadata to the allowlisted field removes risk without removing required behavior.

Prompt-loop memoization

Only stable prompt fragments are memoized. instruction.system() still reloads on every iteration, so AGENTS/instruction changes remain visible on the next loop step.

Compacted evidence fallback

The new fallback only applies when a tool part is already compacted and legacy evidence metadata is absent. In the normal compacted path, stored evidence is still preferred.

Existing semantics preserved

The branch remains scoped to session-context handling. It does not widen tool permissions, alter retry policy, or change non-session app behavior intentionally.


Trade-offs / things to watch

  • compacted evidence is intentionally lossy and bounded; it preserves proof and excerpt, not full historical output
  • OpenAI follow-up threading only activates when store: true is explicitly enabled and the prior assistant/model relationship is safe to reuse
  • persisted finish metadata is now intentionally narrow; if future logic needs more provider metadata, that should be added explicitly via allowlist rather than by storing the full provider payload
  • legacy compacted rows are now tolerated, but this fallback path should mainly be exercised for backward compatibility rather than as the primary compacted format

Scope note

This PR is scoped to packages/opencode/src/session/**, packages/app/src/context/global-sync/**, and the corresponding targeted regression coverage.
No unrelated console working tree changes are intended to be part of this PR.
This description intentionally keeps the repository PR-template headings verbatim so compliance automation can match them exactly.

Screenshots / recordings

N/A

Checklist

  • I have tested my changes locally
  • I have added / updated targeted regression coverage for the changed behavior
  • I investigated the branch-specific CI failures and added stabilization fixes for the identified risky code paths
  • I have not included unrelated changes in this PR

Memoize stable system prompt fragments across multi-step loop iterations so tool-call continuations stop rebuilding the same environment and skill text, while still reloading instruction files each step for correctness.
@github-actions github-actions bot added and then removed the needs:compliance label (auto-closes the issue after 2 hours) on Apr 8, 2026

github-actions bot commented Apr 8, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@GuestAUser GuestAUser requested a review from adamdotdevin as a code owner on April 8, 2026 15:45
@GuestAUser GuestAUser force-pushed the fix/tool-context-followups branch from 7ad70dd to dad51a1 on April 8, 2026 15:57
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
Core cache optimizations:
- Move mindContext from dynamicSystem to stableSystem (500-2000+ tokens/turn
  cached at BP1 for sessions with SessionMind context)
- Split failureContext into stableFailures (prior turns, BP1 cached) and
  dynamicFailures (current turn only) using signature-based dedup
- Add markLargeToolResults() pre-pass: cache_control on tool-result content
  parts >7000 chars (~2000 tokens), Anthropic direct + OpenRouter Claude
- Fix stale parts reference bug in markLargeToolResults for multi-tool messages
- Add compressImages() async pre-pass via sharp (PR anomalyco#21371): 3-phase
  quality->dimension->fallback compression prevents 5MB API limit errors
- Session snapshot resets (resetFailureSnapshot/resetEnvDynamicSent) in cleanup
- prompt_async idle race condition fix: check new messages before loop break

Upstream PR cherry-picks:
- PR anomalyco#21535: deterministic queued message wrapping eliminates per-turn cache miss
- PR anomalyco#21492: tool evidence digest (evidence.ts) preserves context through compaction
- PR anomalyco#21507: session processor single-flight summary dedup improvements
- PR anomalyco#21528: prompt_async idle wakeup race condition fix
- PR anomalyco#21500: Levenshtein O(min(N,M)) space with Int32Array two-row algorithm

New tools (PR anomalyco#21399):
- ContextUsageTool (check_context_usage): real-time token/cache usage reporting
- NewSessionTool (new_session): TUI-only, abort + create new session
- TuiEvent.SessionNew bus event and app.tsx handler
- SDK types.gen.ts/sdk.gen.ts EventTuiSessionNew type

Test infrastructure:
- E2E cache tests (OPENCODE_E2E=1) verified 100% cache hit rate on T2+
- Unit tests for large-tool cache breakpoints (4 scenarios)
- Fix pre-existing lsp-deps.test.ts assertion bug (LspTool in make() not all())
- Add await to all ProviderTransform.message() call sites (now async)
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
…rchestrator, multi-credential, codebase indexer

Core Features:
- Session Mind with persistent memory across sessions
- Orchestrator + Worker subagent architecture
- Multi-credential OAuth with auto-refresh
- Codebase indexer and watcher connectors
- Footer status bar with live metrics

Cache & Prompt Optimizations:
- Move mindContext/failureContext to stable system prefix (BP1 cached)
- Large tool result cache_control breakpoints (>7000 chars)
- Deterministic message wrapping (PR anomalyco#21535)
- Tool evidence digest through compaction (PR anomalyco#21492)
- O(1) queue dequeue + single-flight summary (PR anomalyco#21507)
- Levenshtein O(min(N,M)) space optimization (PR anomalyco#21500)
- Three-phase image auto-compression (PR anomalyco#21371)
- ContextUsage and NewSession tools (PR anomalyco#21399)
- E2E cache integration tests with real Anthropic OAuth

Session snapshot resets prevent memory leaks on session delete.
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
fairyhunter13 pushed a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 9, 2026

Development

Successfully merging this pull request may close these issues.

Overflow compaction produces incomplete summaries in tool-heavy sessions

1 participant