## Summary
When a model attempts to generate a large tool input (e.g., writing a full page of content), the stream can stall and trigger a StreamIdleTimeoutError. This error is marked as retryable, causing an infinite loop where the model repeatedly attempts the same failing operation with exponential backoff delays.
User Experience: The UI shows "Preparing write..." indefinitely, with the agent stuck in a loop. The only way to exit is to manually abort (Escape key).
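For background, this kind of idle timeout is typically a watchdog that races each chunk read against a timer; here is a minimal sketch of the mechanism (an assumed reconstruction, not the actual code at `src/session/processor.ts:44`):

```ts
// Hypothetical reconstruction of a stream idle-timeout watchdog.
class StreamIdleTimeoutError extends Error {
  constructor(readonly timeoutMs: number) {
    super(`Stream idle timeout: no data received for ${timeoutMs}ms`)
    this.name = "StreamIdleTimeoutError"
  }
}

async function* withIdleTimeout<T>(stream: AsyncIterable<T>, timeoutMs: number): AsyncGenerator<T> {
  const it = stream[Symbol.asyncIterator]()
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined
    const watchdog = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new StreamIdleTimeoutError(timeoutMs)), timeoutMs)
    })
    try {
      // The timer resets on every chunk, so the error fires only when the
      // provider goes silent mid-stream - e.g. while the model is building
      // a very large tool input - even though the request is still alive.
      const result = await Promise.race([it.next(), watchdog])
      if (result.done) return
      yield result.value
    } finally {
      clearTimeout(timer)
    }
  }
}
```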
## Detailed Timeline from Real Session
Session `ses_3d5454748ffeA0QZlbOzaY4s4q` on project VibeBrowserProductPage:
| Time | Event | Delay Since Last |
|---|---|---|
| 02:50:05 | Stream started (claude-opus-4.5) | - |
| 02:51:10 | StreamIdleTimeoutError (60s timeout) | 65s |
| 02:51:12 | Retry #1 started | 2s backoff |
| 02:52:16 | StreamIdleTimeoutError | 64s |
| 02:52:20 | Retry #2 started | 4s backoff |
| 02:53:23 | StreamIdleTimeoutError | 63s |
| 02:53:31 | Retry #3 started | 8s backoff |
| 02:54:34 | StreamIdleTimeoutError | 63s |
| 02:54:50 | Retry #4 started | 16s backoff |
| 02:55:53 | StreamIdleTimeoutError | 63s |
| 02:56:23 | Retry #5 started | 30s backoff |
| 02:57:27 | StreamIdleTimeoutError | 64s |
| 02:57:52 | User manually aborted | - |
Total time stuck: ~8 minutes before user intervention (six ~60s idle waits plus 2+4+8+16+30 = 60s of cumulative backoff).
## Raw Log Evidence

### StreamIdleTimeoutError Sequence

```
ERROR 2026-02-05T02:51:10 +60107ms service=session.processor error=Stream idle timeout: no data received for 60000ms stack="StreamIdleTimeoutError: Stream idle timeout: no data received for 60000ms\n at <anonymous> (src/session/processor.ts:44:20)" process
ERROR 2026-02-05T02:52:16 +60096ms service=session.processor error=Stream idle timeout: no data received for 60000ms stack="StreamIdleTimeoutError: Stream idle timeout: no data received for 60000ms\n at <anonymous> (src/session/processor.ts:44:20)" process
ERROR 2026-02-05T02:53:23 +60218ms service=session.processor error=Stream idle timeout: no data received for 60000ms stack="StreamIdleTimeoutError: Stream idle timeout: no data received for 60000ms\n at <anonymous> (src/session/processor.ts:44:20)" process
ERROR 2026-02-05T02:54:34 +60211ms service=session.processor error=Stream idle timeout: no data received for 60000ms stack="StreamIdleTimeoutError: Stream idle timeout: no data received for 60000ms\n at <anonymous> (src/session/processor.ts:44:20)" process
ERROR 2026-02-05T02:55:53 +60214ms service=session.processor error=Stream idle timeout: no data received for 60000ms stack="StreamIdleTimeoutError: Stream idle timeout: no data received for 60000ms\n at <anonymous> (src/session/processor.ts:44:20)" process
ERROR 2026-02-05T02:57:27 +60133ms service=session.processor error=Stream idle timeout: no data received for 60000ms stack="StreamIdleTimeoutError: Stream idle timeout: no data received for 60000ms\n at <anonymous> (src/session/processor.ts:44:20)" process
```
### Retry Pattern with Exponential Backoff

```
INFO 2026-02-05T02:50:05 service=llm modelID=claude-opus-4.5 sessionID=ses_3d5454748ffeA0QZlbOzaY4s4q stream
INFO 2026-02-05T02:51:12 +2002ms service=llm modelID=claude-opus-4.5 sessionID=ses_3d5454748ffeA0QZlbOzaY4s4q stream
INFO 2026-02-05T02:52:20 +4002ms service=llm modelID=claude-opus-4.5 sessionID=ses_3d5454748ffeA0QZlbOzaY4s4q stream
INFO 2026-02-05T02:53:31 +8003ms service=llm modelID=claude-opus-4.5 sessionID=ses_3d5454748ffeA0QZlbOzaY4s4q stream
INFO 2026-02-05T02:54:50 +16003ms service=llm modelID=claude-opus-4.5 sessionID=ses_3d5454748ffeA0QZlbOzaY4s4q stream
INFO 2026-02-05T02:56:23 +30002ms service=llm modelID=claude-opus-4.5 sessionID=ses_3d5454748ffeA0QZlbOzaY4s4q stream
```
Note the delays: 2s → 4s → 8s → 16s → 30s (capped at `RETRY_MAX_DELAY_NO_HEADERS`).
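This schedule matches a standard capped exponential backoff. A minimal sketch that reproduces it (the constant names are assumptions for illustration, not the actual `retry.ts` values):

```ts
// Reproduces the observed schedule; BASE_DELAY_MS and MAX_DELAY_MS are
// assumed names - retry.ts presumably caps at RETRY_MAX_DELAY_NO_HEADERS
// when the provider response carries no retry headers.
const BASE_DELAY_MS = 1_000
const MAX_DELAY_MS = 30_000

function backoffDelay(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS)
}

// [1, 2, 3, 4, 5].map(backoffDelay) -> [2000, 4000, 8000, 16000, 30000]
```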
### Task Verification Shows 10 Empty Write Attempts

From the reflection/task verification system:

```
## Tools Used
write: {}
write: {}
write: {}
write: {}
write: {}
write: {}
write: {}
write: {}
write: {}
write: {}

## Agent's Response
You're right, I was overthinking. Let me just write the full page:
```
The Write tool was called 10 times with empty input `{}` because the stream died during the `tool-input-start` phase, before the JSON input was fully received.
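A sketch of how those empty inputs arise, using the streaming tool-call lifecycle referenced in this report (the event shapes are assumptions, not the actual processor.ts types):

```ts
// The tool part is created at "tool-input-start" with input {}; the input
// only becomes usable once the "tool-call" event arrives.
type StreamEvent =
  | { type: "tool-input-start"; toolCallId: string }
  | { type: "tool-input-delta"; toolCallId: string; delta: string }
  | { type: "tool-call"; toolCallId: string }

function assembleToolInput(events: StreamEvent[]): Record<string, unknown> {
  let json = ""
  for (const event of events) {
    if (event.type === "tool-input-start") json = "" // part exists, input is {}
    if (event.type === "tool-input-delta") json += event.delta // JSON accumulates
    if (event.type === "tool-call") return JSON.parse(json) // input complete
  }
  // Stream idled out before "tool-call": the partial JSON is unusable,
  // so the persisted part keeps its empty {} input.
  return {}
}
```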
## Root Cause Analysis

### The Retry Loop Flow

```
User asks: "continue working on full product page"
  ↓
Model starts generating Write tool call with large content
  ↓
Provider API stalls (rate limit, internal processing, or output token exhaustion)
  ↓
60 seconds pass with no stream data chunks
  ↓
StreamIdleTimeoutError thrown (processor.ts:44)
  ↓
Error converted to APIError with isRetryable: true (message-v2.ts:715)
  ↓
retry.ts retryable() returns message string (lines 62-64)
  ↓
processor.ts catches error, increments attempt, waits with backoff (lines 403-420)
  ↓
New LLM.stream() call starts from scratch
  ↓
Model sees previous failed attempt with "Tool execution aborted" error
  ↓
Model tries THE SAME approach again
  ↓
REPEAT FOREVER (or until user aborts)
```
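Distilled, the flow reduces to an unbounded retry loop. A simplified sketch (not the actual processor.ts code), showing that nothing bounds `attempt` for this error class:

```ts
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function processWithRetry(streamOnce: () => Promise<void>, isRetryable: (e: unknown) => boolean) {
  let attempt = 0
  // No cap on `attempt`: a deterministic failure (same prompt, same stall)
  // repeats until the user aborts; only the delay between failures grows.
  while (true) {
    try {
      return await streamOnce()
    } catch (e) {
      if (!isRetryable(e)) throw e
      attempt++
      await sleep(Math.min(1_000 * 2 ** attempt, 30_000)) // capped backoff
    }
  }
}
```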
### Why Doom Loop Detection Doesn't Trigger

The existing doom loop detection in `processor.ts:207-232` checks:

```ts
if (part.state.status === "running" && part.state.input) {
  // Track same tool + same input called 3 times
}
```

This fails here because:
- The stream dies during the `tool-input-start` phase, before the `tool-call` event
- The tool never reaches "running" status; it stays in "pending"
- The input is always `{}` (empty) because the JSON was never fully received
- Cleanup marks the tool as "error" with empty input
- Each retry has a different tool call ID
- Empty inputs `{}` are not detected as "same input"
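To make the mismatch concrete, here is a sketch of that guard against the part shape these failures produce (the part shape and `callID` field are assumptions based on the fields quoted above):

```ts
// A part that failed during input streaming never satisfies the guard:
// its status is "pending"/"error", not "running", and it carries no input.
type ToolPart = { tool: string; callID: string; state: { status: string; input?: object } }

function doomLoopKey(part: ToolPart): string | undefined {
  if (part.state.status !== "running" || !part.state.input) return undefined
  return `${part.tool}:${JSON.stringify(part.state.input)}`
}

const failedAttempt: ToolPart = { tool: "write", callID: "call_abc", state: { status: "pending" } }
doomLoopKey(failedAttempt) // undefined -> never counted toward the threshold
```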
### Code Path Evidence

`message-v2.ts:711-720` - `StreamIdleTimeoutError` marked as retryable:

```ts
case e instanceof StreamIdleTimeoutError:
  return new MessageV2.APIError(
    {
      message: e.message,
      isRetryable: true, // <-- This causes infinite retries
      metadata: {
        timeoutMs: String(e.timeoutMs),
      },
    },
    { cause: e },
  ).toObject()
```

`processor.ts:403-420` - Retry logic with no max attempts:
```ts
} catch (e: any) {
  log.error("process", { error: e, stack: JSON.stringify(e.stack) })
  const error = MessageV2.fromError(e, { providerID: input.model.providerID })
  const retry = SessionRetry.retryable(error)
  if (retry !== undefined) {
    attempt++
    const delay = SessionRetry.delay(attempt, error.name === "APIError" ? error : undefined)
    SessionStatus.set(input.sessionID, {
      type: "retry",
      attempt,
      message: retry,
      next: Date.now() + delay,
    })
    await SessionRetry.sleep(delay, input.abort).catch(() => {})
    continue // <-- No max retry check for StreamIdleTimeoutError
  }
  // ...
}
```

`processor.ts:442-458` - Cleanup marks incomplete tools as aborted:
```ts
for (const part of p) {
  if (part.type === "tool" && part.state.status !== "completed" && part.state.status !== "error") {
    await Session.updatePart({
      ...part,
      state: {
        ...part.state,
        status: "error",
        error: "Tool execution aborted", // <-- Generic message, no actionable guidance
        // ...
      },
    })
  }
}
```

## Environment
- Provider: github-copilot
- Model: claude-opus-4.5
- Stream idle timeout: 60000ms (default)
- Tool: write
- User task: "continue working on full product page, target financial sectors"
## Suggested Fixes

### Option 1: Add max retries for StreamIdleTimeoutError (Recommended)

```ts
// In processor.ts
let idleTimeoutRetries = 0
const MAX_IDLE_TIMEOUT_RETRIES = 3

// In catch block, before retry logic:
if (e instanceof StreamIdleTimeoutError) {
  idleTimeoutRetries++
  if (idleTimeoutRetries >= MAX_IDLE_TIMEOUT_RETRIES) {
    input.assistantMessage.error = MessageV2.fromError(
      new Error(
        `Stream repeatedly timed out (${MAX_IDLE_TIMEOUT_RETRIES} attempts). The model may be trying to generate content that exceeds output limits. Try breaking the task into smaller pieces.`,
      ),
      { providerID: input.model.providerID },
    )
    Bus.publish(Session.Event.Error, {
      sessionID: input.assistantMessage.sessionID,
      error: input.assistantMessage.error,
    })
    break // Exit the retry loop
  }
}
```

### Option 2: Detect repeated incomplete tool calls
Track tools that fail during input generation (empty inputs):
```ts
// In processor.ts
const incompleteToolAttempts: Record<string, number> = {}

// In cleanup section (lines 442-458):
for (const part of p) {
  if (part.type === "tool" && part.state.status !== "completed" && part.state.status !== "error") {
    // Track incomplete tool attempts
    const inputSize = JSON.stringify(part.state.input || {}).length
    if (inputSize <= 2) { // Empty object "{}"
      incompleteToolAttempts[part.tool] = (incompleteToolAttempts[part.tool] || 0) + 1
      if (incompleteToolAttempts[part.tool] >= DOOM_LOOP_THRESHOLD) {
        blocked = true
        // Add guidance to error message
      }
    }
    // ... rest of cleanup
  }
}
```

### Option 3: Better error message with actionable guidance
Instead of generic "Tool execution aborted":
```ts
error: `Tool execution aborted: stream timed out after ${timeoutMs / 1000}s while generating tool input. This often happens when attempting to write very large content. Consider breaking the write operation into smaller chunks.`
```

### Option 4: Make StreamIdleTimeoutError non-retryable (simplest)
```ts
// In message-v2.ts:711-720
case e instanceof StreamIdleTimeoutError:
  return new MessageV2.APIError(
    {
      message: e.message,
      isRetryable: false, // <-- Stop automatic retries
      metadata: {
        timeoutMs: String(e.timeoutMs),
      },
    },
    { cause: e },
  ).toObject()
```

This surfaces the error to the user immediately, who can then choose to retry or modify their request.
## Related Files

- `packages/opencode/src/session/processor.ts` - Stream processing, idle timeout, doom loop detection, cleanup
- `packages/opencode/src/session/message-v2.ts` - `StreamIdleTimeoutError` class, error conversion, `isRetryable` flag
- `packages/opencode/src/session/retry.ts` - Retry logic, backoff calculation
- `packages/opencode/src/session/prompt.ts` - Main agentic loop
## Additional Context
This issue can occur with any provider when:
- The model tries to generate very large tool inputs (like writing full files)
- The provider has internal rate limiting or processing delays
- The model hits output token limits during tool input generation
- Network issues cause intermittent stalls
The exponential backoff makes this particularly frustrating - after a few retries, the user is waiting 30+ seconds between each failed attempt, with no indication that the same error will keep occurring.