Skip to content

Conversation

@echobt
Copy link
Contributor

@echobt echobt commented Feb 4, 2026

Summary

Adds per-chunk timeout handling for streaming responses in the exec runner to prevent indefinite hangs when connections stall.

Problem

The exec runner wraps the entire execution in a timeout, but individual stream.next() calls can hang indefinitely within that window if a connection stalls. This can cause the agent to appear frozen without feedback.

Changes

  • Added STREAMING_CHUNK_TIMEOUT_SECS constant (30 seconds)
  • Wrapped stream event processing with tokio::time::timeout
  • Returns descriptive error when chunk timeout occurs

Testing

  • Verified code compiles with cargo check -p cortex-exec

Verification

cargo check -p cortex-exec

@greptile-apps
Copy link

greptile-apps bot commented Feb 4, 2026

Greptile Overview

Greptile Summary

Added per-chunk timeout handling for streaming LLM responses to prevent indefinite hangs when connections stall mid-stream.

Key Changes:

  • Introduced STREAMING_CHUNK_TIMEOUT_SECS constant (30 seconds) to define the maximum wait time per chunk
  • Wrapped stream.next() calls with tokio::time::timeout to enforce per-chunk timeout
  • Replaced while let Some(event) = stream.next().await with explicit loop and timeout handling
  • Returns descriptive CortexError::Provider error when chunk timeout occurs
  • Added warning log when timeout is triggered

Implementation Details:
The timeout hierarchy now operates at three levels:

  1. Overall execution timeout (10 minutes via DEFAULT_TIMEOUT_SECS)
  2. Single request timeout (2 minutes via DEFAULT_REQUEST_TIMEOUT_SECS)
  3. NEW: Per-chunk streaming timeout (30 seconds via STREAMING_CHUNK_TIMEOUT_SECS)

This change addresses scenarios where the underlying HTTP connection stalls mid-stream without closing, which could previously cause the agent to hang indefinitely even when within the broader timeout windows. The 30-second chunk timeout is appropriate for streaming responses, as it's long enough to accommodate network variability but short enough to provide timely feedback when connections fail.

Confidence Score: 5/5

  • This PR is safe to merge with no identified issues
  • The implementation is clean, well-documented, and solves a real problem. The timeout value is reasonable, error handling is appropriate with both logging and error propagation, and the change follows Rust best practices. The timeout hierarchy is logical and the 30-second per-chunk timeout fits well between the HTTP client's streaming timeout (5 minutes) and provides better granularity than relying solely on request-level timeouts.
  • No files require special attention

Important Files Changed

Filename Overview
src/cortex-exec/src/runner.rs Added per-chunk streaming timeout (30s) to prevent indefinite hangs when connections stall mid-stream

Sequence Diagram

sequenceDiagram
    participant Runner as ExecRunner
    participant Timeout as tokio::timeout
    participant Stream as LLM Stream
    participant LLM as LLM Provider

    Runner->>Runner: complete_streaming()
    Runner->>LLM: client.complete(request)
    LLM-->>Runner: stream
    
    loop For each chunk
        Runner->>Timeout: timeout(30s, stream.next())
        Timeout->>Stream: next()
        
        alt Chunk received within 30s
            Stream->>LLM: fetch next chunk
            LLM-->>Stream: chunk data
            Stream-->>Timeout: Some(event)
            Timeout-->>Runner: Ok(Some(event))
            Runner->>Runner: Process event (Delta/ToolCall/Done)
        else Stream ended normally
            Stream-->>Timeout: None
            Timeout-->>Runner: Ok(None)
            Runner->>Runner: break loop
        else Timeout after 30s (connection stalled)
            Timeout-->>Runner: Err(Elapsed)
            Runner->>Runner: tracing::warn()
            Runner-->>Runner: Err(CortexError::Provider)
        end
    end
    
    Runner-->>Runner: Return CompletionResponse
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

echobt added a commit that referenced this pull request Feb 4, 2026
This PR consolidates the following fixes:
- #74: Prevent shell injection in restore_script via path escaping
- #76: Replace unsafe unwrap() with expect() in init_client
- #78: Use secure random temp files in external editor to prevent symlink attacks
- #79: Add per-chunk streaming timeout to prevent indefinite hangs

Key changes:
- Added shell escaping for paths in shell-snapshot restore scripts
- Replaced unwrap() with expect() for better error context in exec runner
- Use secure random temp files instead of predictable names
- Added streaming chunk timeout to prevent hangs during LLM responses
@echobt
Copy link
Contributor Author

echobt commented Feb 4, 2026

Consolidated into #82 - fix: consolidated security and robustness improvements

@echobt echobt closed this Feb 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant