fix: prevent prompt_async race condition on idle sessions #21528
Open
aadilshaikh123 wants to merge 1 commit into anomalyco:dev from
Conversation
Add a check before exiting the session loop to detect whether new messages arrived while the loop was running. This handles a race condition where prompt_async messages on idle sessions were created but not reliably acted upon.

The issue occurs when:

1. The session completes work and transitions to idle (runner cleanup starts)
2. Concurrently, prompt_async with noReply: false creates a message and calls loop()
3. A new runner is created to process the message
4. But the loop exits before detecting the newly created message

The fix checks whether new user messages exist beyond the last assistant message before exiting the loop. If any are found, the loop continues so they are processed. This ensures messages from prompt_async reliably trigger assistant responses even when the session is idle, which is critical for async communication patterns like relay-mesh multi-agent systems.

Related issues:

- Background agents ignore initial prompt - stuck until manually messaged
- Race condition between session cancel and Todo Continuation / Question dismiss
- TUI doesn't render messages from prompt_async endpoint
- /session/status not reporting properly after prompt_async
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request on Apr 8, 2026:
- Move mindContext from dynamicSystem to stableSystem so it is cached at BP1 (saves 500-2000+ tokens/turn on sessions with SessionMind context)
- Split failureContext into stableFailures (prior turns, cached at BP1) and dynamicFailures (current turn only) to avoid re-sending stable failure history
- Use signature-based dedup (`tool:error_prefix`) so the formatted stable block never changes between turns, preventing cache invalidation on accumulation
- Add resetFailureSnapshot and resetEnvDynamicSent export functions for cleanup
- Preserve stableSystemCount accuracy after adding stableFailures to stableSystem
- Fix prompt_async idle race condition: before breaking the loop, check if a new user message arrived while we were running (PR anomalyco#21528)
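The signature-based dedup mentioned above can be sketched roughly as follows. This is a hypothetical illustration: the `tool:error_prefix` signature scheme comes from the commit message, but the `Failure` shape, function names, and the 40-character prefix length are assumptions, not the actual opencode code.

```typescript
// Illustrative failure record; field names are assumed.
type Failure = { tool: string; error: string }

// Build a stable signature so the formatted failure block stays
// byte-identical between turns as the same failure repeats,
// which keeps the cached prefix valid.
function failureSignature(f: Failure, prefixLen = 40): string {
  return `${f.tool}:${f.error.slice(0, prefixLen)}`
}

// Keep only the first occurrence of each signature.
function dedupFailures(failures: Failure[]): Failure[] {
  const seen = new Set<string>()
  return failures.filter((f) => {
    const sig = failureSignature(f)
    if (seen.has(sig)) return false
    seen.add(sig)
    return true
  })
}
```

The point of keying on a prefix rather than the full error text is that repeated failures often differ only in trailing detail (timestamps, paths), so a prefix signature collapses them into one stable entry.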
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request on Apr 8, 2026:
Core cache optimizations:

- Move mindContext from dynamicSystem to stableSystem (500-2000+ tokens/turn cached at BP1 for sessions with SessionMind context)
- Split failureContext into stableFailures (prior turns, BP1 cached) and dynamicFailures (current turn only) using signature-based dedup
- Add markLargeToolResults() pre-pass: cache_control on tool-result content parts >7000 chars (~2000 tokens), Anthropic direct + OpenRouter Claude
- Fix stale parts reference bug in markLargeToolResults for multi-tool messages
- Add compressImages() async pre-pass via sharp (PR anomalyco#21371): 3-phase quality->dimension->fallback compression prevents 5MB API limit errors
- Session snapshot resets (resetFailureSnapshot/resetEnvDynamicSent) in cleanup
- prompt_async idle race condition fix: check new messages before loop break

Upstream PR cherry-picks:

- PR anomalyco#21535: deterministic queued message wrapping eliminates per-turn cache miss
- PR anomalyco#21492: tool evidence digest (evidence.ts) preserves context through compaction
- PR anomalyco#21507: session processor single-flight summary dedup improvements
- PR anomalyco#21528: prompt_async idle wakeup race condition fix
- PR anomalyco#21500: Levenshtein O(min(N,M)) space with Int32Array two-row algorithm

New tools (PR anomalyco#21399):

- ContextUsageTool (check_context_usage): real-time token/cache usage reporting
- NewSessionTool (new_session): TUI-only, abort + create new session
- TuiEvent.SessionNew bus event and app.tsx handler
- SDK types.gen.ts/sdk.gen.ts EventTuiSessionNew type

Test infrastructure:

- E2E cache tests (OPENCODE_E2E=1) verified 100% cache hit rate on T2+
- Unit tests for large-tool cache breakpoints (4 scenarios)
- Fix pre-existing lsp-deps.test.ts assertion bug (LspTool in make() not all())
- Add await to all ProviderTransform.message() call sites (now async)
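The markLargeToolResults() pre-pass above can be sketched as follows. This is a hypothetical sketch: the 7000-character threshold comes from the commit message, but the content-part shape and the exact placement of `cache_control` are assumptions modeled on Anthropic-style message parts, not the real implementation.

```typescript
// Illustrative content-part shape; fields are assumed.
type ContentPart = {
  type: string
  text?: string
  cache_control?: { type: "ephemeral" }
}

// Mark tool-result parts above the size threshold with a cache
// breakpoint so large, stable outputs are cached by the provider.
function markLargeToolResults(parts: ContentPart[], threshold = 7000): ContentPart[] {
  // Copy each marked part instead of mutating in place: the stale-parts
  // bug mentioned above is the kind of issue that mutating a shared
  // array across multi-tool messages can cause.
  return parts.map((p) =>
    p.type === "tool-result" && (p.text?.length ?? 0) > threshold
      ? { ...p, cache_control: { type: "ephemeral" } }
      : p,
  )
}
```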
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request on Apr 9, 2026:
Issue for this PR
Closes #21211
Type of change
What does this PR do?
Fix a race condition in prompt_async where messages sent to idle sessions are created but not reliably acted upon.
Problem: When prompt_async with noReply: false is called on an idle session, the message is stored in history but the session doesn't wake up to respond. This is critical for async communication systems (like relay-mesh multi-agent).

Root cause: The runner state machine can transition to Idle and delete itself from the runners Map while, concurrently, a new prompt_async message arrives and creates a new runner to process it. The new loop exits before detecting the newly added message.
Solution: Before exiting the session loop, check if new user messages exist beyond the last assistant message. If found, continue the loop to process them. This catches straggler messages that arrived during the transition window.
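The check described above can be sketched roughly like this. All names and the message shape here are illustrative assumptions, not the actual opencode internals:

```typescript
// Minimal message shape for the sketch; real messages carry more fields.
type Message = { role: "user" | "assistant"; id: string }

// Returns true when a user message arrived after the last assistant
// reply, meaning the session loop should run another turn instead of
// exiting and dropping the straggler.
function hasUnprocessedUserMessage(history: Message[]): boolean {
  const lastAssistant = history.map((m) => m.role).lastIndexOf("assistant")
  return history.slice(lastAssistant + 1).some((m) => m.role === "user")
}

// Inside the session loop, before breaking (loadHistory is hypothetical):
// if (hasUnprocessedUserMessage(await loadHistory(sessionID))) continue
```

Note the function also returns true when no assistant message exists at all (lastIndexOf yields -1, so the whole history is scanned), which covers the "initial prompt ignored" case from the related issues.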
How did you verify your code works?
Traced the full execution path:
Checklist