
fix: prevent prompt_async race condition on idle sessions#21528

Open
aadilshaikh123 wants to merge 1 commit into anomalyco:dev from aadilshaikh123:bugfix/prompt-async-idle-wakeup

Conversation

@aadilshaikh123

Issue for this PR

Closes #21211

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Fix a race condition in prompt_async where messages sent to idle sessions are created but not reliably acted upon.

Problem: When prompt_async with noReply: false is called on an idle session, the message is stored in history but the session doesn't wake up to respond. This is critical for async communication systems (like relay-mesh multi-agent).

Root cause: The runner state machine can transition to Idle and delete itself from the runners Map while, concurrently, a new prompt_async message arrives and a new runner is created to process it. The new runner's loop exits before it detects the newly added message.

Solution: Before exiting the session loop, check if new user messages exist beyond the last assistant message. If found, continue the loop to process them. This catches straggler messages that arrived during the transition window.
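A minimal sketch of that exit guard, with hypothetical names (`Message`, `hasUnansweredUserMessage`, and `sessionLoop` are illustrative, not opencode's actual API):

```typescript
interface Message {
  role: "user" | "assistant"
  content: string
}

// Returns true when a user message exists after the last assistant reply,
// i.e. a straggler message arrived during the idle-transition window.
function hasUnansweredUserMessage(messages: Message[]): boolean {
  const lastAssistant = messages.map((m) => m.role).lastIndexOf("assistant")
  return messages.slice(lastAssistant + 1).some((m) => m.role === "user")
}

async function sessionLoop(
  getMessages: () => Message[],
  step: (messages: Message[]) => Promise<void>,
): Promise<void> {
  while (true) {
    await step(getMessages())
    // Before exiting, re-check history: a prompt_async message may have been
    // stored while this iteration was running. If so, loop again instead of
    // exiting and leaving it unanswered.
    if (!hasUnansweredUserMessage(getMessages())) break
  }
}
```

Because the re-check only continues the loop when an unanswered user message actually exists, a normal exit (last message is the assistant's reply) still terminates and cannot spin forever.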

How did you verify your code works?

Traced the full execution path:

  • Analyzed runner state machine transitions between Idle/Running states
  • Identified the race condition window in onIdle cleanup vs getRunner() creation
  • Verified the fix detects messages created during loop transitions
  • Confirmed the fix handles multiple concurrent prompt_async calls correctly
  • Checked that normal loop exits still work (no infinite loops)

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Add a check before exiting the session loop to detect if new messages arrived
while the loop was running. This handles a race condition where prompt_async
messages on idle sessions were created but not reliably acted upon.

The issue occurs when:
1. Session completes work and transitions to idle (runner cleanup start)
2. Concurrently, prompt_async with noReply: false creates a message and calls loop()
3. A new runner is created to process the message
4. But the loop exits before detecting the newly created message

The fix checks if new user messages exist beyond the last assistant message
before exiting the loop. If found, continues the loop to process them.
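The cleanup-versus-creation race on the runners Map can be modeled as follows; `runners`, `Runner`, `getRunner`, and the guarded delete in `onIdle` are hypothetical names for illustration, not the actual implementation:

```typescript
class Runner {
  constructor(public sessionID: string) {}
}

const runners = new Map<string, Runner>()

// Lazily create a runner for a session (step 3 in the sequence above).
function getRunner(sessionID: string): Runner {
  let runner = runners.get(sessionID)
  if (!runner) {
    runner = new Runner(sessionID)
    runners.set(sessionID, runner)
  }
  return runner
}

// Idle cleanup (step 1). The identity guard matters: a stale runner that
// went idle must not delete a fresh runner that was created concurrently,
// or the new message's runner would vanish from the Map.
function onIdle(sessionID: string, runner: Runner): void {
  if (runners.get(sessionID) === runner) {
    runners.delete(sessionID)
  }
}
```

Even with such a guard, the message created in step 2 can still land between the old loop's last history read and its exit, which is why the fix re-checks history at the loop boundary rather than relying on Map bookkeeping alone.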

This ensures messages from prompt_async reliably trigger assistant responses
even when the session is idle, which is critical for async communication
patterns like relay-mesh multi-agent systems.

Related issues:
- Background agents ignore initial prompt - stuck until manually messaged
- Race condition between session cancel and Todo Continuation / Question dismiss
- TUI doesn't render messages from prompt_async endpoint
- /session/status not reporting properly after prompt_async
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
- Move mindContext from dynamicSystem to stableSystem so it is cached at BP1
  (saves 500-2000+ tokens/turn on sessions with SessionMind context)
- Split failureContext into stableFailures (prior turns, cached at BP1) and
  dynamicFailures (current turn only) to avoid re-sending stable failure history
- Use signature-based dedup (`tool:error_prefix`) so the formatted stable block
  never changes between turns, preventing cache invalidation on accumulation
- Add resetFailureSnapshot and resetEnvDynamicSent export functions for cleanup
- Preserve stableSystemCount accuracy after adding stableFailures to stableSystem
- Fix prompt_async idle race condition: before breaking the loop, check if a new
  user message arrived while we were running (PR anomalyco#21528)
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
Core cache optimizations:
- Move mindContext from dynamicSystem to stableSystem (500-2000+ tokens/turn
  cached at BP1 for sessions with SessionMind context)
- Split failureContext into stableFailures (prior turns, BP1 cached) and
  dynamicFailures (current turn only) using signature-based dedup
- Add markLargeToolResults() pre-pass: cache_control on tool-result content
  parts >7000 chars (~2000 tokens), Anthropic direct + OpenRouter Claude
- Fix stale parts reference bug in markLargeToolResults for multi-tool messages
- Add compressImages() async pre-pass via sharp (PR anomalyco#21371): 3-phase
  quality->dimension->fallback compression prevents 5MB API limit errors
- Session snapshot resets (resetFailureSnapshot/resetEnvDynamicSent) in cleanup
- prompt_async idle race condition fix: check new messages before loop break

Upstream PR cherry-picks:
- PR anomalyco#21535: deterministic queued message wrapping eliminates per-turn cache miss
- PR anomalyco#21492: tool evidence digest (evidence.ts) preserves context through compaction
- PR anomalyco#21507: session processor single-flight summary dedup improvements
- PR anomalyco#21528: prompt_async idle wakeup race condition fix
- PR anomalyco#21500: Levenshtein O(min(N,M)) space with Int32Array two-row algorithm

New tools (PR anomalyco#21399):
- ContextUsageTool (check_context_usage): real-time token/cache usage reporting
- NewSessionTool (new_session): TUI-only, abort + create new session
- TuiEvent.SessionNew bus event and app.tsx handler
- SDK types.gen.ts/sdk.gen.ts EventTuiSessionNew type

Test infrastructure:
- E2E cache tests (OPENCODE_E2E=1) verified 100% cache hit rate on T2+
- Unit tests for large-tool cache breakpoints (4 scenarios)
- Fix pre-existing lsp-deps.test.ts assertion bug (LspTool in make() not all())
- Add await to all ProviderTransform.message() call sites (now async)
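The cherry-picked PR anomalyco#21500 mentions a two-row `Int32Array` Levenshtein. As a point of reference, a generic sketch of that textbook O(min(N, M))-space technique (not opencode's actual code):

```typescript
// Two-row Levenshtein distance: keeps only the previous and current DP
// rows instead of the full N x M matrix.
function levenshtein(a: string, b: string): number {
  // Size the rows by the shorter string for O(min(N, M)) space.
  if (a.length < b.length) [a, b] = [b, a]
  let prev = new Int32Array(b.length + 1)
  let curr = new Int32Array(b.length + 1)
  for (let j = 0; j <= b.length; j++) prev[j] = j
  for (let i = 1; i <= a.length; i++) {
    curr[0] = i
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      )
    }
    ;[prev, curr] = [curr, prev]
  }
  return prev[b.length]
}
```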
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 9, 2026


Development

Successfully merging this pull request may close these issues.

Background agents ignore initial prompt - stuck until manually messaged
