Implement intelligent adaptive timeout system for robust test execution #71

@coderabbitai

Problem

Fixed timeouts are too rigid. Tests need timeouts chosen according to test complexity, environment, and build configuration.

Task

Create an adaptive timeout system with progressive scaling and environment detection in crates/comenqd/src/daemon.rs.

Implementation Details

1. Smart timeout calculation system

Add at the top of the test module (around line 400):

#[cfg(test)]
mod smart_timeouts {
    use std::time::Duration;
    
    #[derive(Debug, Clone, Copy)]
    pub enum TestComplexity {
        Simple,      // Basic operations
        Moderate,    // Worker processing
        Complex,     // Error handling, retries
    }
    
    #[derive(Debug, Clone, Copy)]
    pub struct TimeoutConfig {
        base_seconds: u64,
        complexity: TestComplexity,
    }
    
    impl TimeoutConfig {
        pub const fn new(base_seconds: u64, complexity: TestComplexity) -> Self {
            Self { base_seconds, complexity }
        }
        
        pub fn calculate_timeout(&self) -> Duration {
            let mut timeout = self.base_seconds;
            
            // Complexity multiplier
            timeout *= match self.complexity {
                TestComplexity::Simple => 1,
                TestComplexity::Moderate => 2,
                TestComplexity::Complex => 3,
            };
            
            // Build type multiplier
            #[cfg(debug_assertions)]
            { timeout *= 2; } // Debug builds are slower
            
            // Coverage instrumentation multiplier  
            #[cfg(any(coverage, coverage_nightly))]
            { timeout *= 5; } // Coverage adds massive overhead
            
            // CI environment multiplier
            if std::env::var("CI").is_ok() {
                timeout *= 2; // CI environments can be resource-constrained
            }
            
            // Ensure minimum reasonable timeout
            timeout = timeout.max(10);
            
            // Cap maximum timeout to prevent indefinite hangs
            timeout = timeout.min(600); // 10 minutes max
            
            Duration::from_secs(timeout)
        }
        
        pub fn with_progressive_retry(&self) -> Vec<Duration> {
            let base = self.calculate_timeout();
            vec![
                base / 2,           // First attempt: 50% of calculated timeout
                base,               // Second attempt: full calculated timeout  
                base + base / 2,    // Final attempt: 150% of calculated timeout
            ]
        }
    }
    
    // Test-specific timeout configurations
    pub const DRAINED_NOTIFICATION: TimeoutConfig = TimeoutConfig::new(15, TestComplexity::Moderate);
    pub const WORKER_SUCCESS: TimeoutConfig = TimeoutConfig::new(10, TestComplexity::Moderate);  
    pub const WORKER_ERROR: TimeoutConfig = TimeoutConfig::new(15, TestComplexity::Complex);
}
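
To make the scaling easier to check by hand, here is a minimal sketch of the same arithmetic as a pure function. `scaled_timeout` and its boolean parameters are hypothetical and not part of the proposal; they stand in for the `cfg` attributes and the `CI` lookup so that each combination can be evaluated without rebuilding.

```rust
use std::time::Duration;

/// Hypothetical, flag-driven variant of `calculate_timeout`: the build,
/// coverage, and CI conditions are passed in instead of being detected.
pub fn scaled_timeout(
    base_seconds: u64,
    complexity_multiplier: u64, // 1 = Simple, 2 = Moderate, 3 = Complex
    debug_build: bool,
    coverage: bool,
    ci: bool,
) -> Duration {
    // Saturating arithmetic is an addition in this sketch; it keeps very
    // large bases from overflowing before the cap is applied.
    let mut secs = base_seconds.saturating_mul(complexity_multiplier);
    if debug_build {
        secs = secs.saturating_mul(2);
    }
    if coverage {
        secs = secs.saturating_mul(5);
    }
    if ci {
        secs = secs.saturating_mul(2);
    }
    // Same bounds as the proposal: at least 10 s, at most 600 s.
    Duration::from_secs(secs.clamp(10, 600))
}
```

Under these assumptions, `DRAINED_NOTIFICATION` (15 s, Moderate) comes to 30 s in a local release build and 120 s in a debug build on CI, while `WORKER_ERROR` (15 s, Complex) with debug, coverage, and CI all active would be 900 s and is capped at 600 s.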

2. Progressive timeout helper function

#[cfg(test)]
async fn timeout_with_retries<F, T>(
    config: smart_timeouts::TimeoutConfig,
    operation_name: &str,
    mut operation: F,
) -> Result<T, String>
where
    F: FnMut() -> std::pin::Pin<Box<dyn std::future::Future<Output = Result<T, String>> + Send>>,
{
    let timeouts = config.with_progressive_retry();
    
    for (attempt, timeout_duration) in timeouts.iter().enumerate() {
        let attempt_num = attempt + 1;
        // `{:?}` prints fractional seconds (e.g. 22.5s); `as_secs()` would truncate them.
        println!("⏱️  Attempt {}/{} with {:?} timeout for {}", 
                attempt_num, timeouts.len(), timeout_duration, operation_name);
        
        match tokio::time::timeout(*timeout_duration, operation()).await {
            Ok(Ok(result)) => {
                println!("✅ {} completed successfully on attempt {}", operation_name, attempt_num);
                return Ok(result);
            }
            Ok(Err(e)) => {
                eprintln!("❌ {} failed on attempt {}: {}", operation_name, attempt_num, e);
                return Err(e);
            }
            Err(_) => {
                if attempt_num < timeouts.len() {
                    eprintln!("⏳ {} timed out after {:?}, retrying...", operation_name, timeout_duration);
                } else {
                    eprintln!("💥 {} failed after {} attempts, final timeout: {:?}", 
                             operation_name, timeouts.len(), timeout_duration);
                    return Err(format!("{} timed out after all retry attempts", operation_name));
                }
            }
        }
    }
    
    Err(format!("{} exhausted all retry attempts", operation_name))
}
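
The retry semantics of the helper (retry only on timeout, return operation errors immediately) can be tried without an async runtime. The sketch below is a blocking analogue that swaps `tokio::time::timeout` for `std::sync::mpsc::Receiver::recv_timeout`; `retry_with_schedule` and `spawn_after` are hypothetical names used only for illustration.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Blocking analogue of `timeout_with_retries`. `start` is called once per
/// attempt and returns a receiver that yields the operation's result.
pub fn retry_with_schedule<T, F>(schedule: &[Duration], mut start: F) -> Result<T, String>
where
    F: FnMut() -> mpsc::Receiver<Result<T, String>>,
{
    for (index, wait) in schedule.iter().enumerate() {
        match start().recv_timeout(*wait) {
            // Success ends the loop at once.
            Ok(Ok(value)) => return Ok(value),
            // An operation error is not retried; only timeouts are.
            Ok(Err(e)) => return Err(e),
            Err(RecvTimeoutError::Timeout) => {
                eprintln!("attempt {} timed out after {:?}", index + 1, wait);
            }
            Err(RecvTimeoutError::Disconnected) => {
                return Err("operation ended without a result".to_string());
            }
        }
    }
    Err("timed out after all retry attempts".to_string())
}

/// Hypothetical helper: spawns a worker that reports `result` after `delay`.
pub fn spawn_after<T: Send + 'static>(
    delay: Duration,
    result: Result<T, String>,
) -> mpsc::Receiver<Result<T, String>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(delay);
        // The receiver may already be gone if the attempt timed out.
        let _ = tx.send(result);
    });
    rx
}
```

As in the async helper, each attempt starts the operation again from scratch, so the operation passed in has to be safe to restart.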

3. Update existing timeout calls

Replace current timeout implementations with the new adaptive system:

  • Drained notification timeouts
  • Worker join timeouts
  • Other test operation timeouts

Smart Features

  • 🧠 Multi-dimensional scaling: Considers test complexity, build type, coverage, CI environment
  • 🔄 Progressive retries: Starts with a shorter timeout and extends it after each timed-out attempt; operation errors are returned immediately without a retry
  • 📊 Adaptive calculation: Automatically adjusts based on detected conditions
  • 🎯 Test-specific: Different timeout strategies for different operations
  • 🛡️ Bounded: Minimum and maximum timeout limits prevent extremes
  • 📝 Verbose logging: Shows timeout reasoning and retry attempts
  • 🏗️ Future-proof: Easy to add new timeout factors

Rationale

  • Smart timeout = base timeout × complexity (1 to 3) × debug build (2) × coverage (5) × CI environment (2), clamped to between 10 and 600 seconds
  • Progressive retries avoid false failures from temporary slowdowns
  • Verbose logging helps diagnose timeout issues
  • Bounded timeouts prevent indefinite waits
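
As a worked example of how the bound interacts with the retries, the sketch below reproduces the 50% / 100% / 150% schedule from `with_progressive_retry` and adds up the worst case. The function names `progressive_schedule` and `worst_case_wait` are hypothetical and only serve the illustration.

```rust
use std::time::Duration;

/// Mirrors `with_progressive_retry`: 50%, 100%, and 150% of the
/// calculated timeout.
pub fn progressive_schedule(base: Duration) -> [Duration; 3] {
    [base / 2, base, base + base / 2]
}

/// Longest total wait if every attempt times out.
pub fn worst_case_wait(base: Duration) -> Duration {
    progressive_schedule(base).iter().sum()
}
```

For a calculated timeout of 30 s the attempts wait 15 s, 30 s, and 45 s, or 90 s in total. At the 600 s cap the final attempt can wait 900 s and all three attempts together 1800 s, so the cap bounds the calculated timeout rather than the total wait.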

Acceptance Criteria

  • Smart timeout calculation system implemented
  • Progressive retry helper function added
  • All test timeouts updated to use new system
  • Tests pass in all configurations: release, debug, coverage, CI
  • Verbose logging shows timeout reasoning and attempts
  • Documentation updated to explain new timeout system

Verification

Test with different configurations to verify adaptive behavior:

  • cargo test (debug)
  • cargo test --release (release)
  • CI=1 cargo test (CI environment)
  • cargo llvm-cov test (coverage)

This enhancement gives the tests a timeout system that adapts to build type, coverage instrumentation, and CI, and that logs each attempt so timeout failures are easier to diagnose.
