Skip to content

fix: Prevent UI freeze after long sessions (issues #173, #179)#193

Merged
kshivang merged 1 commit intomasterfrom
dev
Dec 28, 2025
Merged

fix: Prevent UI freeze after long sessions (issues #173, #179)#193
kshivang merged 1 commit intomasterfrom
dev

Conversation

@kshivang
Copy link
Copy Markdown
Owner

Summary

Fixes the terminal UI freeze issue that occurs after 2-3+ hours of running BossTerm (issues #173, #179). The UI would stop showing output while PTY continues working - commands execute but nothing displays, even in new tabs.

Root Causes Identified & Fixed

  1. Redraw processor can die silently (CRITICAL)

    • The redraw processor coroutine had no exception handling
    • Any exception would exit the loop silently, dropping all future redraws
    • Fix: Added try-catch with auto-restart capability
  2. ModelListener memory leak (HIGH)

    • Anonymous listeners added but never removed on tab close
    • Over hours of tab create/close cycles, listeners accumulate
    • References to disposed displays can throw exceptions
    • Fix: Store listener reference in TerminalTab, remove in dispose()
  3. PTY reader no exception handling (MEDIUM-HIGH)

    • IOException from PTY disconnect would crash reader silently
    • Fix: Added try-catch for graceful error handling
  4. GlobalScope resource leak (LOW)

    • Process kill used GlobalScope with no timeout
    • Fix: Use scoped CoroutineScope with 5-second timeout

Enhancement: Console Log Capture

  • Added CONSOLE_LOG source to debug system
  • Errors/warnings now visible in Debug Panel alongside PTY output
  • Makes diagnosing long-running session issues much easier

Files Changed

File Changes
ComposeTerminalDisplay.kt Exception handling, debug log callback
TabController.kt Listener management, PTY error handling, logging helpers
TerminalTab.kt modelListener field, dispose cleanup
DebugModels.kt CONSOLE_LOG enum value
DebugDataCollector.kt LOG# prefix for console entries

Test plan

  • Run BossTerm for extended period (4+ hours) with active use
  • Rapid tab create/close cycles (50+ tabs)
  • High-volume output test (large file cat, streaming)
  • Verify Debug Panel shows CONSOLE_LOG entries for any errors
  • Verify no memory growth over time via profiler

Closes #173, Closes #179

🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented Dec 28, 2025

Code Review - PR #193: Fix UI Freeze After Long Sessions

This is an excellent bug fix PR that addresses critical stability issues in BossTerm. The root cause analysis is thorough, and the fixes are well-implemented. I have a few recommendations for further improvement.


✅ Strengths

1. Outstanding Root Cause Analysis

  • Identified 4 distinct failure modes with clear severity classification
  • The PR description provides excellent context for future maintainers
  • Each fix is directly tied to a specific failure scenario

2. Critical Redraw Processor Fix

The auto-restart mechanism in ComposeTerminalDisplay.kt is brilliant:

  • Catches individual redraw failures without killing the loop
  • Auto-restarts on fatal errors with backoff delay
  • Properly handles cancellation during shutdown
  • The 100ms delay prevents tight loops on persistent errors

3. Memory Leak Fix

The ModelListener cleanup addresses a classic Android/Compose pitfall:

  • Storing listener reference in TerminalTab enables proper cleanup
  • Consistent pattern applied across all 3 tab creation methods
  • Clear comments explain the criticality of this fix

4. Enhanced Observability

The CONSOLE_LOG source is a smart addition:

  • Makes debugging long-running sessions much easier
  • Consolidates error visibility (console + debug panel)
  • Consistent logging helpers (logError, logTabError)

🔍 Issues & Recommendations

1. Missing Resource Cleanup in startRedrawProcessor() ⚠️

Location: ComposeTerminalDisplay.kt:301-342

Issue: The auto-restart mechanism launches a new coroutine without canceling the previous one. Over time, this could accumulate multiple redraw processors.

Current Code:

private fun startRedrawProcessor() {
    redrawScope.launch {
        try {
            for (request in redrawChannel) { /* ... */ }
        } catch (e: Exception) {
            logError("ERROR: Redraw processor crashed, auto-restarting: ${e.message}", e)
            kotlinx.coroutines.delay(100)
            startRedrawProcessor()  // ⚠️ Recursive launch - previous coroutine not cleaned up
        }
    }
}

Recommended Fix:

// Store the redraw job reference
private var redrawJob: Job? = null

private fun startRedrawProcessor() {
    // Cancel existing job if restarting
    redrawJob?.cancel()
    
    redrawJob = redrawScope.launch {
        try {
            for (request in redrawChannel) { /* ... */ }
        } catch (e: kotlinx.coroutines.CancellationException) {
            throw e  // Already handled correctly
        } catch (e: Exception) {
            logError("ERROR: Redraw processor crashed, auto-restarting: ${e.message}", e)
            kotlinx.coroutines.delay(100)
            startRedrawProcessor()
        }
    }
}

Why: Without cancellation, each restart leaks a coroutine. While the old coroutine will eventually exit at the next channel receive (since the channel is still open), it's cleaner to explicitly cancel.


2. Potential Double-Close in PTY Reader ⚠️

Location: TabController.kt:986-1015, 1145-1181

Issue: dataStream.close() is called in the finally block, but if the coroutine is cancelled via coroutineScope.cancel() (in dispose()), the finally block may race with other cleanup.

Analysis:

  • BlockingTerminalDataStream likely handles double-close safely
  • However, the pattern is fragile if the stream implementation changes

Recommendation:

} catch (e: Exception) {
    logTabError(tab, "ERROR: PTY reader crashed", e)
} finally {
    // Use runCatching to make double-close safe
    kotlin.runCatching { tab.dataStream.close() }
}

Priority: Low (likely already safe, but defensive programming is good)


3. logTabError Called After Tab Disposal

Location: TabController.kt:1299-1311

Issue: The process kill coroutine captures tabForLogging to call logTabError after the tab is already removed from the list and partially disposed.

Current Code:

val tabForLogging = tab  // Capture reference for logging
CoroutineScope(Dispatchers.IO).launch {
    try {
        kotlinx.coroutines.withTimeout(5000) {
            processToKill.kill()
        }
    } catch (e: kotlinx.coroutines.TimeoutCancellationException) {
        logTabError(tabForLogging, "WARN: Process kill timed out")  // ⚠️ Tab already disposed
    }
}

Analysis:

  • tab.debugCollector is accessed in logTabError
  • At this point, tab.dispose() has already been called
  • While DebugDataCollector likely handles this safely, it's semantically incorrect

Recommended Fix:

// Capture debug collector reference before disposal
val debugCollector = tab.debugCollector

CoroutineScope(Dispatchers.IO).launch {
    try {
        kotlinx.coroutines.withTimeout(5000) {
            processToKill.kill()
        }
    } catch (e: kotlinx.coroutines.TimeoutCancellationException) {
        val message = "[...] WARN: Process kill timed out after 5 seconds"
        System.err.println(message)
        debugCollector?.recordChunk(message, ChunkSource.CONSOLE_LOG)
    } catch (e: Exception) {
        val message = "[...] WARN: Error killing process: ${e.message}"
        System.err.println(message)
        debugCollector?.recordChunk(message, ChunkSource.CONSOLE_LOG)
    }
}

Priority: Medium (potential for subtle bugs)


4. Missing @OptIn Removal Incomplete

Location: TabController.kt:1280, 1382

Issue: The PR removes @OptIn(DelicateCoroutinesApi::class) annotations, but the commit message doesn't explain why GlobalScope was considered "delicate" in the first place.

Analysis:

  • Good fix: Using CoroutineScope(Dispatchers.IO) instead of GlobalScope
  • However, the new scope is not attached to any lifecycle - it's effectively the same as GlobalScope
  • The coroutine could outlive the tab/controller lifecycle

Recommendation:
Consider using a dedicated cleanup scope that's cancelled when the controller is disposed:

class TabController(...) {
    private val cleanupScope = CoroutineScope(Dispatchers.IO + SupervisorJob())
    
    fun disposeAll() {
        // ... existing disposal logic ...
        
        // Cancel cleanup scope to abort any pending process kills
        cleanupScope.cancel()
    }
}

Then use cleanupScope.launch { ... } instead of CoroutineScope(Dispatchers.IO).launch { ... }.

Priority: Low (the 5-second timeout mitigates the issue)


5. Exception Handling in dispose() ✅ (Minor Nitpick)

Location: TerminalTab.kt:351-360

Code:

modelListener?.let {
    try {
        textBuffer.removeModelListener(it)
    } catch (e: Exception) {
        System.err.println("WARN: Failed to remove model listener: ${e.message}")
    }
}

Observation: Catching Exception is broad. Consider catching only expected exceptions (e.g., IllegalStateException, ConcurrentModificationException).

Priority: Very Low (defensive programming is fine here)


🧪 Testing Recommendations

1. Memory Leak Verification

Run this test to verify the ModelListener fix:

# Create/close 1000 tabs rapidly
for i in {1..1000}; do
    # Open tab (Ctrl+T) and immediately close (Ctrl+W)
done

# Check heap with visualvm or jconsole
# Heap should stabilize, not grow linearly

2. Redraw Processor Resilience

Inject failures to test auto-restart:

// Add debug flag to ComposeTerminalDisplay.kt
private val injectRedrawFailure = System.getenv("BOSSTERM_INJECT_REDRAW_FAILURE")?.toBoolean() ?: false

private fun actualRedraw() {
    if (injectRedrawFailure && Random.nextFloat() < 0.01) {
        throw RuntimeException("Injected redraw failure for testing")
    }
    // ... existing code ...
}

Run with BOSSTERM_INJECT_REDRAW_FAILURE=true and verify UI remains responsive.

3. PTY Error Handling

Simulate PTY disconnection:

# While BossTerm is running, kill the shell process
ps aux | grep bash | grep -v grep | awk '{print $2}' | xargs kill -9

# Verify Debug Panel shows "PTY read ended" message
# Verify tab closes gracefully

📊 Performance Considerations

Good: The 100ms restart delay prevents tight loops
Good: IOException is handled separately from general exceptions
Good: withTimeout(5000) prevents indefinite hangs on process kill

⚠️ Watch: The auto-restart mechanism could mask persistent issues. Consider adding a restart counter with circuit breaker:

private var restartCount = 0
private var lastRestartTime = 0L

private fun startRedrawProcessor() {
    val now = System.currentTimeMillis()
    if (now - lastRestartTime < 10000) {  // Within 10 seconds
        restartCount++
        if (restartCount > 5) {
            logError("ERROR: Redraw processor failed 5 times in 10s, giving up")
            return  // Circuit breaker
        }
    } else {
        restartCount = 0
    }
    lastRestartTime = now
    
    // ... existing code ...
}

🔒 Security Considerations

No security issues identified

  • Exception messages are logged, not user-controlled
  • No command injection or path traversal risks
  • Resource cleanup prevents DoS via resource exhaustion

📝 Documentation

Excellent inline comments explaining:

  • Why each fix is critical
  • What happens without the fix
  • Implementation rationale

Suggestion: Add a troubleshooting section to CLAUDE.md:

## Troubleshooting Long-Running Sessions

If BossTerm UI freezes after 2-3+ hours:
1. Check Debug Panel (Ctrl+Shift+D) for CONSOLE_LOG entries
2. Look for "Redraw processor crashed" messages
3. Check heap with jconsole/visualvm for memory leaks
4. Enable BOSSTERM_DEBUG_CURSOR=true for cursor debugging

🎯 Overall Assessment

Rating: ⭐⭐⭐⭐½ (4.5/5)

Strengths

  • Addresses real production issues with thorough analysis
  • Code quality is high with excellent comments
  • Observability improvements are valuable
  • Follows existing code patterns consistently

Areas for Improvement

Recommendation

Approve with minor revisions. The core fixes are sound, but Issues #1 and #3 should be addressed before merge to avoid subtle bugs in production.


🚀 Next Steps

  1. Fix redraw processor coroutine leak (Issue feat: Terminal rendering improvements + 4 high-priority features (#2, #3, #4, #5) #1)
  2. Fix post-disposal logging in process kill (Issue 🔗 Implement hyperlink detection and click handling #3)
  3. Add memory leak test to CI (if feasible)
  4. Update CLAUDE.md with troubleshooting section

Great work on this critical stability fix! 🎉

@claude
Copy link
Copy Markdown

claude Bot commented Dec 28, 2025

PR Review: Fix Terminal UI Freeze After Long Sessions

Overview

This PR addresses critical stability issues that cause the terminal UI to freeze after extended use (issues #173, #179). The changes are well-targeted and address real production issues discovered through debugging.


✅ Strengths

1. Critical Bug Fix: Redraw Processor Crash Protection

The auto-restart mechanism in ComposeTerminalDisplay.kt (lines 293-340) is excellent:

  • Wraps the redraw loop in exception handling
  • Distinguishes between normal cancellation and fatal errors
  • Auto-restarts with 100ms cooldown to prevent tight loops
  • Impact: Prevents permanent UI freeze when exceptions occur

2. Memory Leak Fix: ModelListener Cleanup

The listener management is properly implemented:

  • Stores listener reference in TerminalTab.kt (line 242)
  • Removes listener in dispose() method (lines 351-361)
  • Applied consistently across all 3 tab creation methods
  • Impact: Prevents listener accumulation over hours of use

3. Enhanced Observability: Console Log Capture

The CONSOLE_LOG source addition is valuable:

  • Routes internal errors to debug panel
  • Provides visibility into long-running session issues
  • Properly integrated into visualizer (no CSI parsing for logs)

4. PTY Reader Resilience

IOException handling in PTY readers (TabController.kt lines 983-1015, 1142-1182):

  • Distinguishes expected disconnection from unexpected errors
  • Logs appropriately without crashing
  • Ensures dataStream.close() in finally block

🔍 Code Quality Observations

Good Practices

  • Proper exception hierarchy: Distinguishes CancellationException from other exceptions
  • Scoped coroutines: Replaced GlobalScope with proper CoroutineScope
  • Timeout protection: 5-second timeout on process kill operations
  • Detailed comments: Explains critical sections (e.g., "CRITICAL: This coroutine must never die silently")

Minor Concerns

1. Recursive Auto-Restart (ComposeTerminalDisplay.kt:337)

The recursive call to startRedrawProcessor() after exceptions could theoretically cause stack issues with repeated failures (though the 100ms delay makes this unlikely).

Suggestion: Consider a loop-based approach to avoid recursive calls.

2. Missing IOException Import Check

The code catches java.io.IOException but the diff doesn't show imports. Verify the import is present.

3. Timeout Hardcoding

The 5-second timeout for process kill is hardcoded. Consider making this configurable or documenting the rationale.


🧪 Testing Recommendations

High Priority

  1. Long-running stress test: Run for 6+ hours with heavy output (streaming logs, Claude responses)
  2. Rapid tab cycling: Create/close 100+ tabs to verify listener cleanup
  3. Exception injection: Manually trigger exceptions in redraw loop to verify auto-restart
  4. Memory profiling: Use VisualVM/YourKit to confirm no listener leaks over time

Medium Priority

  1. PTY disconnect scenarios: Test with unexpected process termination
  2. Debug panel verification: Confirm CONSOLE_LOG entries appear correctly
  3. Process kill timeout: Test with unresponsive processes to verify 5s timeout

🔒 Security Considerations

No security concerns identified. The changes are internal to the terminal rendering pipeline and don't affect user input, command execution, file system access, or network operations.


📊 Performance Impact

Positive

  • Eliminates memory leak: Prevents listener accumulation
  • Prevents permanent freeze: Auto-restart keeps UI responsive

Minimal Overhead

  • Exception handling in redraw loop: Negligible when no exceptions occur
  • Listener cleanup on dispose: One-time operation, insignificant cost

🐛 Potential Issues

Issue #179: Not Fully Addressed?

The PR description mentions fixing issue #179 (memory leak in BlockingTerminalDataStream), but I don't see any changes to that file in the diff. Issue #179 describes a StringBuilder buffer that grows indefinitely without compaction.

Question: Does the ModelListener cleanup indirectly fix this, or is there a separate fix needed for BlockingTerminalDataStream.kt?


✨ Suggestions for Future Work

  1. Metrics/Monitoring: Add metrics for redraw processor restarts, listener registration/deregistration, PTY read failures
  2. Configurable Timeouts: Extract hardcoded timeouts to settings
  3. Unit Tests: Add tests for auto-restart behavior, listener cleanup, exception handling

📝 Final Recommendation

✅ APPROVE with minor suggestions

This PR addresses critical production issues with well-reasoned fixes. The code quality is high, with proper exception handling, resource cleanup, and detailed comments. The only concerns are:

  1. Recursive auto-restart (easily addressed with loop-based approach)
  2. Clarification needed on issue Memory leak in BlockingTerminalDataStream causes terminal to stop accepting input after prolonged use #179 fix for BlockingTerminalDataStream

The PR is production-ready and should significantly improve stability for long-running sessions.


🎯 Summary Score

Category Rating Notes
Code Quality 9/10 Excellent exception handling and comments
Bug Fixes 10/10 Addresses root causes effectively
Testing 7/10 Needs manual stress testing
Documentation 9/10 Clear comments and PR description
Performance 9/10 Fixes leaks, minimal overhead

Overall: Strong PR that addresses critical stability issues. Recommend merge after addressing recursive restart concern.


🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented Dec 28, 2025

Pull Request Review - PR #193

Overview

This PR addresses critical UI freezing issues after extended terminal sessions. The analysis is thorough and the fixes are well-targeted. Overall, this is a solid PR with excellent documentation and defensive programming practices.


Strengths

1. Excellent Problem Analysis

  • Clear identification of 4 distinct root causes with proper prioritization
  • Detailed commit messages with rationale

2. Defensive Programming

  • Auto-restart capability for redraw processor prevents silent death
  • Proper exception handling with logging
  • Safe cleanup patterns with runCatching
  • Timeout protection on process kills (5 seconds)

3. Resource Management

  • ModelListener cleanup prevents memory leaks
  • Dedicated cleanupScope replaces GlobalScope
  • Proper coroutine cancellation

4. Observability Enhancement

  • Console log capture via CONSOLE_LOG source
  • Errors visible in both System.err and Debug Panel

Key Code Quality Observations

ComposeTerminalDisplay.kt - Auto-restart Logic (lines 299-350)

Concern: The auto-restart logic is recursive. If startRedrawProcessor crashes repeatedly, this could cause stack overflow.

Recommendation: Consider loop-based restart instead of recursion:

  • Use a while loop with shouldRestart flag
  • Prevents unbounded call stack growth if coroutine crashes frequently

TabController.kt - ModelListener Management

Excellent fix! The pattern is correct:

  1. Store listener reference in TerminalTab
  2. Register with textBuffer.addModelListener
  3. Remove in dispose with textBuffer.removeModelListener

This prevents the exact memory leak described (accumulated listeners over tab create/close cycles).

PTY Reader Error Handling (lines 995-1023, 1155-1187)

Great addition! Prevents silent reader death on IO exceptions.

Observation: Identical logic appears in both createTab and preConnect. Consider extracting to helper function to eliminate 60+ lines of duplication.


Test Coverage

Missing: The PR description has unchecked test plan items. Before merging, recommend:

  1. Rapid tab create/close (50 cycles) to verify listener cleanup
  2. Enable debug mode and verify CONSOLE_LOG appears in Debug Panel
  3. Trigger a redraw processor crash to verify auto-restart

Final Recommendation

APPROVE with minor suggestions

This PR fixes critical stability issues with well-designed solutions. Code quality is high, and defensive programming practices are exemplary.

Required Before Merge:

  1. Test listener cleanup with 50+ tab create/close cycles
  2. Verify console logging works in Debug Panel

Optional Improvements:

  1. Consider loop-based restart instead of recursion in startRedrawProcessor
  2. Extract duplicate PTY reader logic to helper function
  3. Add tab ID to error messages for multi-tab debugging
  4. Document console log capture feature in CLAUDE.md

Questions for Author:

  1. Has the auto-restart logic been tested with repeated failures?
  2. Does processToKill.kill need force kill fallback after timeout?
  3. Is TerminalTextBuffer.removeModelListener idempotent?

Great work on this PR! The thorough analysis and comprehensive fixes will significantly improve BossTerm stability.

Root cause analysis identified multiple failure modes that could cause
the terminal UI to stop updating while PTY continues working:

1. **Redraw processor exception handling** (CRITICAL)
   - Added try-catch to startRedrawProcessor() coroutine
   - Individual redraw failures now log and continue instead of killing loop
   - Auto-restart on fatal errors prevents permanent UI freeze
   - Added redrawJob tracking to cancel previous job on restart (prevents coroutine leak)

2. **ModelListener memory leak prevention** (HIGH)
   - Added modelListener field to TerminalTab
   - Listeners now properly removed in dispose() to prevent accumulation
   - Fixes memory leak over hours of tab create/close cycles

3. **PTY reader exception handling** (MEDIUM-HIGH)
   - Added try-catch for IOException in PTY reader coroutine
   - Graceful handling of PTY disconnection scenarios
   - Added runCatching for defensive double-close safety

4. **Coroutine lifecycle management** (MEDIUM)
   - Added dedicated cleanupScope for process kill operations
   - Replaced GlobalScope/ad-hoc scopes with lifecycle-aware scope
   - cleanupScope cancelled in disposeAll() for proper cleanup
   - Added 5-second timeout for process kill operations

5. **Console log capture for debug panel** (Enhancement)
   - Added CONSOLE_LOG source to ChunkSource enum
   - Errors/warnings now visible in debug panel alongside PTY output
   - Console logs displayed as plain text (not parsed for escape sequences)
   - Added logError() and logTabError() helpers for dual logging
   - Capture debugCollector reference before tab disposal for safe async logging

Files modified:
- ComposeTerminalDisplay.kt: Exception handling, redrawJob tracking, debug log callback
- TabController.kt: cleanupScope, listener management, PTY error handling, logging
- TerminalTab.kt: modelListener field, dispose cleanup
- DebugModels.kt: CONSOLE_LOG enum value
- DebugDataCollector.kt: LOG# prefix for console entries
- DebugControlSequenceVisualizer.kt: Plain text display for console logs

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@claude
Copy link
Copy Markdown

claude Bot commented Dec 28, 2025

Pull Request Review: Fix UI Freeze After Long Sessions

This is an excellent, production-ready fix for critical stability issues. The changes directly address the root causes of UI freezing after extended terminal sessions.


✅ Strengths

1. Critical Bug Fixes

The PR identifies and fixes 4 distinct failure modes, each of which could independently cause UI freeze:

  • Redraw processor silent death (CRITICAL): The coroutine exception handling is now robust with auto-restart capability
  • ModelListener memory leak (HIGH): Proper cleanup prevents listener accumulation over time
  • PTY reader crashes (MEDIUM-HIGH): IOException handling prevents silent failures
  • Resource leak in cleanup (LOW): Scoped coroutines with timeouts replace GlobalScope

2. Excellent Exception Handling Architecture

// ComposeTerminalDisplay.kt: Nested exception handling
while (shouldRestart && isActive) {
    try {
        for (request in redrawChannel) {
            try {
                // Individual redraw handling
            } catch (e: Exception) {
                logError("Redraw failed (continuing)", e)
            }
        }
    } catch (e: CancellationException) {
        throw e  // Proper coroutine cancellation
    } catch (e: Exception) {
        // Auto-restart with backoff
        delay(100)
        shouldRestart = true
    }
}

This two-level approach is textbook Kotlin coroutine error handling:

  • Inner try-catch: Individual redraw failures don't kill the loop
  • Outer try-catch: Channel errors trigger restart with 100ms backoff
  • Proper CancellationException propagation

3. Memory Leak Prevention

The ModelListener cleanup is essential:

// TerminalTab.kt - Store reference for cleanup
val modelListener = object : TerminalModelListener { ... }
textBuffer.addModelListener(modelListener)

// In dispose():
modelListener?.let {
    textBuffer.removeModelListener(it)
}

Impact: Prevents listener accumulation over hours of tab create/close cycles (reported in issue #173 with 82,000+ chunks).

4. Code Deduplication

The new startPtyReaderCoroutine() helper eliminates 100+ lines of duplication between createTab() and preConnect() paths.

5. Enhanced Observability

  • Console log capture via ChunkSource.CONSOLE_LOG
  • Errors now visible in Debug Panel alongside PTY output
  • Timestamped logs with stack traces
  • Makes diagnosing production issues much easier

🔍 Code Quality Observations

Minor: Loop-Based Restart Pattern

File: ComposeTerminalDisplay.kt:299-355

The shouldRestart loop is correct but could be simplified:

// Current implementation (lines 299-355)
var shouldRestart = true
while (shouldRestart && isActive) {
    shouldRestart = false
    try {
        // ... loop body ...
    } catch (e: Exception) {
        shouldRestart = true
        delay(100)
    }
}

Alternative (more idiomatic Kotlin):

while (isActive) {
    try {
        for (request in redrawChannel) { ... }
        break  // Exit loop on normal channel close
    } catch (e: CancellationException) {
        throw e
    } catch (e: Exception) {
        logError("Redraw processor crashed, restarting", e)
        delay(100)
        // Loop continues
    }
}

This achieves the same result with slightly cleaner logic. However, the current implementation is perfectly acceptable and the documentation is clear.

Minor: Error Message Consistency

File: TabController.kt:1309, 1317

} catch (e: kotlinx.coroutines.TimeoutCancellationException) {
    val message = "[timestamp] WARN: Process kill timed out after 5 seconds"
    debugCollectorForLogging?.recordChunk(message, ChunkSource.CONSOLE_LOG)
} catch (e: Exception) {
    val message = "[timestamp] WARN: Error killing process"
    debugCollectorForLogging?.recordChunk(message, ChunkSource.CONSOLE_LOG)
}

Suggestion: Use the centralized logError() pattern here for consistency (though the current approach works since we're outside the tab context).


🧪 Testing Recommendations

The test plan is solid but could be expanded:

Suggested Additional Tests

  1. Exception Injection Test

    • Temporarily inject exceptions in redraw loop
    • Verify auto-restart behavior
    • Confirm no coroutine leaks after 100+ restarts
  2. Memory Profiler Validation

    • Run with VisualVM or JProfiler
    • Monitor listener count in TerminalTextBuffer over 50+ tab cycles
    • Verify no growth in ComposeTerminalDisplay instances
  3. Stress Test

    • 100+ rapid tab create/close cycles
    • Verify no listener accumulation
    • Check debug panel for any ERROR logs
  4. Console Log Capture Test

    • Trigger an exception (e.g., invalid PTY write)
    • Verify error appears in Debug Panel under CONSOLE_LOG
    • Confirm proper formatting and stack traces

🔒 Security & Performance

Security

✅ No security concerns. All changes are internal error handling and resource management.

Performance

Positive impact:

  • Prevents GC thrashing from listener leaks
  • Scoped coroutines with timeouts prevent orphaned GlobalScope jobs
  • 100ms backoff prevents tight crash loops

📊 Impact Analysis

Issue Before After
Redraw processor crashes Silent death → permanent UI freeze Auto-restart with logging
ModelListener accumulation ~1 listener/tab/second over hours Cleaned up in dispose()
PTY reader exceptions Silent crash, no error visibility Graceful handling + logs
Process cleanup leaks GlobalScope orphans Scoped with 5s timeout

Related Issues: Closes #173 (82,000 chunks memory issue), Closes #179 (UI freeze after 2-3+ hours)


✅ Final Verdict

APPROVED - This is production-ready code that significantly improves stability.

Why This Gets My Approval:

  1. ✅ Fixes 4 independent failure modes
  2. ✅ Excellent error handling architecture
  3. ✅ No breaking changes to public API
  4. ✅ Enhanced debugging capabilities
  5. ✅ Comprehensive documentation in comments
  6. ✅ Code deduplication (DRY principle)

Recommended Before Merge:

  • Run extended stress test (4+ hours, high volume)
  • Verify no memory growth with profiler
  • Test rapid tab cycling (50+ tabs)
  • Confirm CONSOLE_LOG entries appear in Debug Panel

Post-Merge Follow-Up (optional):

  • Consider adding unit tests for exception handling paths
  • Monitor production logs for "Redraw processor crashed" messages
  • Add metrics for auto-restart frequency

Great work on root cause analysis and systematic fixes! The debug tooling from #10 proved invaluable for diagnosing these issues.

🤖 Generated with Claude Code

@kshivang kshivang merged commit 1c4edc1 into master Dec 28, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant