Fix circuit breaker false positives from JSON field names#6
Conversation
…etection Fixes circuit breaker opening prematurely due to naive error pattern matching that treated JSON field names like "is_error": false as actual errors. Changes: - ralph_loop.sh: Implement two-stage error detection with JSON filtering - lib/response_analyzer.sh: Apply same filtering to error counting - tests/test_error_detection.sh: Add comprehensive test suite (12 scenarios) Error detection now: - Filters out JSON field patterns before searching for errors - Uses context-specific patterns (^Error:, ]: error, Exception, Fatal) - Avoids type annotations (error: Error) and code identifiers - Includes debug logging when VERBOSE_PROGRESS=true Test coverage validates: ✓ JSON fields don't trigger false positives ✓ Real error messages are correctly detected ✓ Mixed content handled properly ✓ Code diffs and documentation excluded This prevents the consecutive_same_error counter from incrementing on false positives, eliminating unnecessary circuit breaker trips.
WalkthroughTwo-stage error detection replaces single grep: JSON-like fields are filtered out first, then real error phrases are matched. detect_stuck_loop now requires all current error lines to appear across recent history files. Verbose progress logging added; two new test suites validate behavior. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
ralph_loop.sh (1)
390-409: LGTM with minor optimization suggestion.The two-stage error detection implementation correctly filters JSON field patterns before scanning for actual errors, and the pattern is consistent with the test suite. The verbose debug logging is properly gated behind
VERBOSE_PROGRESSand limits output to 3 lines.💡 Optional: Avoid duplicate grep in verbose mode
Currently, verbose mode runs the same grep twice (lines 394-395 for detection, lines 401-402 for display). Consider capturing matching lines once:
- # Two-stage error detection to avoid JSON field false positives - # Stage 1: Filter out JSON field patterns like "is_error": false - # Stage 2: Look for actual error messages in specific contexts - # Avoid type annotations like "error: Error" by requiring lowercase after ": error" - if grep -v '"[^"]*error[^"]*":' "$output_file" 2>/dev/null | \ - grep -qE '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)'; then - has_errors="true" - - # Debug logging: show what triggered error detection - if [[ "$VERBOSE_PROGRESS" == "true" ]]; then - log_status "DEBUG" "Error patterns found:" - grep -v '"[^"]*error[^"]*":' "$output_file" 2>/dev/null | \ - grep -nE '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' | \ - head -3 | while IFS= read -r line; do - log_status "DEBUG" " $line" - done - fi - + # Two-stage error detection to avoid JSON field false positives + # Stage 1: Filter out JSON field patterns like "is_error": false + # Stage 2: Look for actual error messages in specific contexts + local error_matches=$(grep -v '"[^"]*error[^"]*":' "$output_file" 2>/dev/null | \ + grep -nE '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' 2>/dev/null) + + if [[ -n "$error_matches" ]]; then + has_errors="true" + + # Debug logging: show what triggered error detection + if [[ "$VERBOSE_PROGRESS" == "true" ]]; then + log_status "DEBUG" "Error patterns found:" + echo "$error_matches" | head -3 | while IFS= read -r line; do + log_status "DEBUG" " $line" + done + fi + log_status "WARN" "Errors detected in output, check: $output_file" fitests/test_error_detection.sh (1)
1-46: Missing test strategy comments.Per coding guidelines, complex test scenarios should include comments explaining the test strategy. Consider adding a header comment that explains:
- The two-stage filtering approach being validated
- Why specific test cases were chosen (JSON false positives, actual errors, edge cases)
- How the test pattern relates to the implementations in
ralph_loop.shandlib/response_analyzer.shBased on coding guidelines: "Include comments explaining test strategy for complex test scenarios"
📝 Suggested test strategy comment
#!/bin/bash # Test script for error detection fix # Validates that JSON field names don't trigger false positives # # TEST STRATEGY: # This suite validates the two-stage error detection implemented in ralph_loop.sh # to prevent circuit breaker false positives from JSON output. # # Stage 1: Filter out JSON field patterns (e.g., "is_error": false, "error": null) # Stage 2: Detect actual errors using context-specific patterns (Error:, Exception, Fatal) # # Coverage: # - JSON fields with "error" keyword (tests 1, 2, 8) # - Actual error messages with context (tests 3, 4, 7, 9) # - Mixed JSON + real errors (test 5) # - Benign content (tests 6, 11) # - Code/diffs with error keywords (test 12) # - Edge cases (test 10) # # Note: Currently validates ralph_loop.sh implementation only. # TODO: Add separate tests for lib/response_analyzer.sh which uses different patterns. set -e
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lib/response_analyzer.shralph_loop.shtests/test_error_detection.sh
🧰 Additional context used
📓 Path-based instructions (2)
**/*.sh
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately
Files:
lib/response_analyzer.shtests/test_error_detection.shralph_loop.sh
**/test*.sh
📄 CodeRabbit inference engine (CLAUDE.md)
**/test*.sh: Write integration tests for Ralph loop behavior
Write end-to-end tests for full development cycles
Include comments explaining test strategy for complex test scenarios
Files:
tests/test_error_detection.sh
🧠 Learnings (7)
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/*.sh : Write unit tests for bash script functions using test.sh
Applied to files:
tests/test_error_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Include comments explaining test strategy for complex test scenarios
Applied to files:
tests/test_error_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write end-to-end tests for full development cycles
Applied to files:
tests/test_error_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior
Applied to files:
tests/test_error_detection.shralph_loop.sh
📚 Learning: 2025-12-31T19:30:50.708Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.708Z
Learning: Test Ralph loop with new features before marking features complete using 'ralph --monitor --calls 5' or './test.sh --coverage'
Applied to files:
tests/test_error_detection.shralph_loop.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to CLAUDE.md : Keep CLAUDE.md synchronized with codebase - add new commands to 'Key Commands' section, update 'Exit Conditions and Thresholds' when logic changes, keep installation instructions accurate and tested, document new Ralph loop behaviors
Applied to files:
ralph_loop.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to PROMPT.md : Update PROMPT.md if Ralph's autonomous behavior needs modification
Applied to files:
ralph_loop.sh
Aligns error detection patterns across all implementations and improves test coverage based on CodeRabbit's critical and major findings. Changes: 1. CRITICAL: Align lib/response_analyzer.sh pattern with ralph_loop.sh - Changed Stage 1 filter from '"[^"]*\(error\|failed\)"[^"]*":' to '"[^"]*error[^"]*":' - Removed bare 'cannot' and 'unable to' from Stage 2 (prevent false positives in prose) - Both files now use identical patterns for consistency 2. MAJOR: Improved test coverage - Renamed test 10 from "Cannot/unable in error context" to "Error prefix with descriptive message" - Added test 10a to validate bare "cannot/unable" DON'T trigger false positives - Now testing 13 scenarios (was 12) 3. MINOR: Added comprehensive test strategy documentation - Header comments explain two-stage filtering approach - Documents pattern consistency requirement - Lists all 13 test scenarios and their purpose Test results: ✓ All 13 tests passing ✓ Pattern consistency validated across ralph_loop.sh and lib/response_analyzer.sh ✓ False positive scenarios properly excluded Addresses CodeRabbit review comments: - r2655862688 (Critical pattern inconsistency) - r2655862689 (Major test coverage gap) - r2655862690 (Minor misleading test name)
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
lib/response_analyzer.sh (3)
267-267: Critical: Old naive error pattern still present.Line 267 uses
grep -i "error\|failed"which will match JSON field names like"is_error": falseor"error": null, causing the same false positives this PR aims to eliminate. The detect_stuck_loop function must apply the same two-stage filtering as analyze_response to maintain consistency.🔎 Proposed fix to apply two-stage filtering
- # Extract key errors from current output - local current_errors=$(grep -i "error\|failed" "$current_output" 2>/dev/null | sort | uniq) + # Extract key errors from current output using two-stage filtering + # Stage 1: Filter out JSON field patterns, Stage 2: Match actual error messages + local current_errors=$(grep -v '"[^"]*error[^"]*":' "$current_output" 2>/dev/null | \ + grep -E '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' \ + 2>/dev/null | sort | uniq)
18-18: Remove unused configuration array.The
STUCK_INDICATORSarray is defined but never referenced in the code. The actual stuck detection at line 103 useserror_count > 5rather than these patterns. Per coding guidelines, dead code should be removed to reduce maintenance burden.🔎 Proposed fix to remove unused array
-STUCK_INDICATORS=("error" "failed" "cannot" "unable to" "blocked")
254-286: Add unit tests for detect_stuck_loop function to the test suite.The test file (tests/test_error_detection.sh) validates two-stage error filtering for
analyze_responseacross 13 scenarios, but does not testdetect_stuck_loop. Line 268 uses simplegrep -i "error\|failed"without the two-stage filtering thatanalyze_responseapplies (grep -v for JSON fields, then grep for actual error patterns). Tests should verifydetect_stuck_loopwith JSON-containing input to ensure it does not trigger false positives like"is_error": false.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/response_analyzer.shtests/test_error_detection.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_error_detection.sh
🧰 Additional context used
📓 Path-based instructions (1)
**/*.sh
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately
Files:
lib/response_analyzer.sh
🧠 Learnings (1)
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior
Applied to files:
lib/response_analyzer.sh
🔇 Additional comments (1)
lib/response_analyzer.sh (1)
92-98: Critical issue resolved: patterns now aligned correctly.The previously flagged critical inconsistency has been addressed:
- Stage 1 now uses the simplified
'"[^"]*error[^"]*":'pattern- Stage 2 has removed the bare
cannotandunable topatterns that caused false positives- Comments accurately document the two-stage approach and alignment goal
This implementation now correctly filters JSON field names while detecting real error messages.
Addresses additional CodeRabbit findings in lib/response_analyzer.sh: CRITICAL (line 267): - Fixed detect_stuck_loop() to use two-stage filtering - Was using naive grep -i "error\|failed" pattern - Now filters JSON fields before extracting errors - Pattern aligned with analyze_response() for consistency DEAD CODE (line 18): - Removed unused STUCK_INDICATORS array - Array was defined but never referenced in code - Reduces maintenance burden per coding guidelines TEST COVERAGE: - Added comprehensive test suite for detect_stuck_loop() - New file: tests/test_stuck_loop_detection.sh - 7 test scenarios validating: * JSON fields don't trigger false stuck detection * Actual repeated errors are correctly detected * Type annotations are properly excluded * Function returns appropriate exit codes Test results: ✓ Error detection tests: 13/13 passing ✓ Stuck loop tests: 7/7 passing ✓ Total: 20/20 tests passing This ensures both error detection functions (analyze_response and detect_stuck_loop) use identical filtering logic, preventing circuit breaker false positives across all code paths. Addresses CodeRabbit review comments: - Outside diff range comment: line 267 (Critical) - Outside diff range comment: line 18 (Dead code) - Outside diff range comment: lines 254-286 (Test coverage)
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/response_analyzer.sh (1)
268-283: Multi-line error matching bug in stuck detection.When
current_errorscontains multiple distinct error messages (multiple lines aftersort | uniq), thegrep -q "$current_errors"on Line 279 will only match the first line. This means stuck loops with multiple different errors won't be detected correctly.The test suite doesn't cover this scenario—all tests use outputs that extract to a single error line.
Example scenario that would fail:
# Current output has two distinct errors Error: Failed to compile src/main.ts Fatal: Database connection lost # Both errors appear in all 3 historical files # But grep -q would only check if the first error appears🔎 Proposed fix to check each error individually
- # Check if same errors appear in all recent outputs - local stuck_count=0 - while IFS= read -r output_file; do - if grep -q "$current_errors" "$output_file" 2>/dev/null; then - ((stuck_count++)) - fi - done <<< "$recent_outputs" - - if [[ $stuck_count -ge 3 ]]; then - return 0 # Stuck on same error - else - return 1 # Making progress or different errors - fi + # Check if same errors appear in all recent outputs + local all_files_match=true + while IFS= read -r output_file; do + local file_matches_all=true + while IFS= read -r error_line; do + if ! grep -qF "$error_line" "$output_file" 2>/dev/null; then + file_matches_all=false + break + fi + done <<< "$current_errors" + + if [[ "$file_matches_all" != "true" ]]; then + all_files_match=false + break + fi + done <<< "$recent_outputs" + + if [[ "$all_files_match" == "true" ]]; then + return 0 # Stuck on same error(s) + else + return 1 # Making progress or different errors + fiNote: Changed to
grep -qFfor literal fixed-string matching instead of regex pattern matching.
🧹 Nitpick comments (1)
tests/test_stuck_loop_detection.sh (1)
101-116: Consider adding a test case for multiple distinct errors.While the current test suite is comprehensive for single-error scenarios, there's no test case where the output contains multiple distinct errors that all match the Stage 2 error patterns.
Suggested test scenario:
# Current output with two distinct errors Error: Failed to compile src/main.ts Fatal: Database connection lost # Both errors repeated in all 3 history files # Should trigger stuck detection (exit code 0)This would validate that the stuck loop detection correctly handles outputs with multiple concurrent error types. Currently, Test 3 appears to have two lines but only the "Error:" line matches the extraction pattern.
🔎 Proposed additional test case
Add after Test 3 (Line 116):
# Test 3b: Multiple distinct errors repeated should trigger stuck detection cat > "$TEST_DIR/current_output.log" << 'EOF' Build started Error: Failed to compile src/main.ts Fatal: Database connection lost EOF # Create 3 history files with both errors for i in 1 2 3; do sleep 0.1 cat > "$HISTORY_DIR/claude_output_00${i}.log" << 'EOF' Build started Error: Failed to compile src/main.ts Fatal: Database connection lost EOF done run_test "Multiple distinct errors repeated trigger stuck detection" 0Based on learnings: Write integration tests for Ralph loop behavior.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/response_analyzer.shtests/test_stuck_loop_detection.sh
🧰 Additional context used
📓 Path-based instructions (2)
**/*.sh
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately
Files:
tests/test_stuck_loop_detection.shlib/response_analyzer.sh
**/test*.sh
📄 CodeRabbit inference engine (CLAUDE.md)
**/test*.sh: Write integration tests for Ralph loop behavior
Write end-to-end tests for full development cycles
Include comments explaining test strategy for complex test scenarios
Files:
tests/test_stuck_loop_detection.sh
🧠 Learnings (4)
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior
Applied to files:
tests/test_stuck_loop_detection.shlib/response_analyzer.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write end-to-end tests for full development cycles
Applied to files:
tests/test_stuck_loop_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/*.sh : Write unit tests for bash script functions using test.sh
Applied to files:
tests/test_stuck_loop_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Include comments explaining test strategy for complex test scenarios
Applied to files:
tests/test_stuck_loop_detection.sh
🧬 Code graph analysis (1)
tests/test_stuck_loop_detection.sh (1)
lib/response_analyzer.sh (1)
detect_stuck_loop(254-289)
🪛 Shellcheck (0.11.0)
lib/response_analyzer.sh
[warning] 268-268: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (2)
lib/response_analyzer.sh (1)
91-97: LGTM! Pattern consistency resolved.The two-stage filtering approach correctly addresses the JSON false positive issue. Stage 1 filters out JSON field patterns, and Stage 2 matches actual error messages in specific contexts. The pattern is now aligned across functions as noted in the comments.
Based on past review comments, this addresses the pattern inconsistency that was flagged previously.
tests/test_stuck_loop_detection.sh (1)
1-64: Excellent test structure and documentation.The test suite demonstrates best practices:
- Clear test strategy documented in the header (lines 6-13)
- Proper cleanup with trap on EXIT
- Well-organized run_test helper with colored output
- Comprehensive scenario coverage for JSON false positives
Based on coding guidelines: Test strategy for complex scenarios is well explained. ✓
Addresses CodeRabbit outside diff range comment on lines 268-283:
CRITICAL BUG:
The detect_stuck_loop function had a multi-line string handling bug where
only the first error line was checked against historical outputs when
multiple distinct errors were present.
Problem:
grep -q "$current_errors" file.log
# When $current_errors has multiple lines, grep only matches first line
Impact:
Stuck loops with multiple recurring errors would not be detected correctly,
potentially allowing Ralph to continue running despite being genuinely stuck.
Fix:
Changed from simple grep to nested loop checking:
- For each historical output file
- For each error line in current output
- ALL error lines must appear in that file
- Only returns "stuck" if ALL files contain ALL current errors
Used grep -qF for literal fixed-string matching (not regex) to avoid
edge cases with special characters in error messages.
Test Coverage:
Added 2 new test scenarios (7 → 9 total tests):
Test 8: Multiple distinct errors where ALL repeat across history
Current: Error: Build failed + Fatal: DB lost + Exception: NPE
History: All 3 files contain all 3 errors
Expected: Stuck detected ✅
Test 9: Multiple errors where only some repeat
Current: Error: Build failed + Fatal: DB lost
History: Only first error appears consistently
Expected: Not stuck ✅
All existing tests continue to pass, validating backward compatibility.
Test results:
✓ Error detection tests: 13/13 passing
✓ Stuck loop tests: 9/9 passing (was 7/7)
✓ Total: 22/22 tests passing
This ensures detect_stuck_loop correctly handles the real-world scenario
where Ralph gets stuck on multiple simultaneous recurring errors.
Addresses CodeRabbit review comment:
- Outside diff range (lines 268-283): Multi-line error matching bug
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/response_analyzer.sh (1)
268-268: Optional: Address Shellcheck warning about return value masking.Shellcheck warns that combining declaration and assignment can mask command return values. In this case, the masking is intentional since the code relies on checking whether
current_errorsis empty (Line 272) rather than examining exit codes. The current implementation works correctly.If you prefer to satisfy Shellcheck for pedantic correctness:
🔎 Optional refactor to separate declaration and assignment
- local current_errors=$(grep -v '"[^"]*error[^"]*":' "$current_output" 2>/dev/null | \ - grep -E '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' 2>/dev/null | \ - sort | uniq) + local current_errors + current_errors=$(grep -v '"[^"]*error[^"]*":' "$current_output" 2>/dev/null | \ + grep -E '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' 2>/dev/null | \ + sort | uniq)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/response_analyzer.shtests/test_stuck_loop_detection.sh
🧰 Additional context used
📓 Path-based instructions (2)
**/*.sh
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately
Files:
tests/test_stuck_loop_detection.shlib/response_analyzer.sh
**/test*.sh
📄 CodeRabbit inference engine (CLAUDE.md)
**/test*.sh: Write integration tests for Ralph loop behavior
Write end-to-end tests for full development cycles
Include comments explaining test strategy for complex test scenarios
Files:
tests/test_stuck_loop_detection.sh
🧠 Learnings (4)
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior
Applied to files:
tests/test_stuck_loop_detection.shlib/response_analyzer.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/*.sh : Write unit tests for bash script functions using test.sh
Applied to files:
tests/test_stuck_loop_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write end-to-end tests for full development cycles
Applied to files:
tests/test_stuck_loop_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Include comments explaining test strategy for complex test scenarios
Applied to files:
tests/test_stuck_loop_detection.sh
🧬 Code graph analysis (1)
tests/test_stuck_loop_detection.sh (1)
lib/response_analyzer.sh (1)
detect_stuck_loop(254-300)
🪛 Shellcheck (0.11.0)
lib/response_analyzer.sh
[warning] 268-268: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (4)
lib/response_analyzer.sh (2)
265-296: Improved stuck-loop detection logic is robust.The refactored logic correctly implements the requirement that ALL current error lines must appear in ALL recent history files to detect a stuck loop. The nested loop structure (Lines 279-293) is more complex than the previous implementation but appropriately handles multi-line error scenarios and prevents false positives from partial matches.
The use of
grep -qFfor literal fixed-string matching (Line 283) is the right choice to avoid unintended regex interpretation of error messages containing special characters.
91-97: Patterns are correctly aligned with ralph_loop.sh.The Stage 1 filtering (
grep -v '"[^"]*error[^"]*":') and Stage 2 error detection regex (^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL) are identical across both files, confirming consistent behavior as claimed.tests/test_stuck_loop_detection.sh (2)
6-14: Excellent test strategy documentation.The test strategy comment clearly explains what the detect_stuck_loop function does and what aspects are being validated. This follows the coding guideline to "include comments explaining test strategy for complex test scenarios."
As per coding guidelines, test strategy documentation is required for complex scenarios in test files.
1-241: Comprehensive test coverage validates two-stage filtering behavior.The test suite provides excellent coverage of the detect_stuck_loop function's behavior:
- False-positive prevention: Tests 2, 5, 7 validate that JSON fields, type annotations, and non-error content don't trigger stuck detection
- True-positive detection: Tests 3, 6, 8 validate that actual repeated errors are correctly detected
- Multi-error logic: Tests 8-9 validate the requirement that ALL current error lines must appear in ALL history files for stuck detection
- Edge cases: Tests 1, 4 cover no-history and differing-errors scenarios
The test structure with clear naming, colored output, and per-test result tracking makes it easy to diagnose failures.
Based on learnings, integration tests for Ralph loop behavior should be comprehensive.
Updated both documentation files to reflect recent enhancements and fixes: CLAUDE.md Changes: - Added comprehensive Core Architecture section detailing main scripts and lib/ components - Documented lib/circuit_breaker.sh and lib/response_analyzer.sh modular architecture - Added detailed Exit Conditions and Thresholds section with circuit breaker thresholds - Documented advanced two-stage error detection process to eliminate false positives - Documented multi-line error matching for accurate stuck loop detection - Added Recent Improvements section highlighting v0.9.0 circuit breaker enhancements - Updated Global Installation section to include lib/ directory and ralph-import command - Test coverage details: 13 error detection + 9 stuck loop tests README.md Changes: - Updated "What's Working Now" section with circuit breaker enhancements - Added "Recent Improvements" section highlighting v0.9.0 updates - Updated test count from 75 to 97 tests (75 core + 13 error detection + 9 stuck loop) - Enhanced circuit breaker description with two-stage filtering details - Added circuit breaker thresholds to Exit Thresholds section - Updated test commands to include error detection and stuck loop test scripts - Updated current test status with specialized test file counts Key improvements documented: - Multi-line error matching fix for detect_stuck_loop function - JSON field false positive elimination (e.g., "is_error": false) - Two-stage error filtering for accurate error detection - Installation fix for lib/ directory components - 22 new tests added for circuit breaker functionality These updates ensure documentation accurately reflects the current state of the codebase following PR #6 (circuit-breaker false positives fix) and PR #4 (installation lib/ directory fix).
- Make --dangerously-skip-permissions conditional: only added when CLAUDE_ALLOWED_TOOLS is empty (default). When tools are specified via --allowed-tools, use --allowedTools with granular permissions instead. Previously --dangerously-skip-permissions was unconditional, making --allowedTools dead code. - Change CLAUDE_ALLOWED_TOOLS default from "Write,Bash(git *),Read" to "" so default behavior is skip-all-permissions. - Add prd.json completion check as exit condition frankbria#6 in should_exit_gracefully(). Deterministically checks if all userStories have passes != false, preventing infinite loops when heuristic signals are absent. - Rename output_file to output_file_path in wait_for_daily_reset() for clarity — the variable holds a path read from a tracking file, not an output file handle. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ove pytest-timeout dep Three fixes for errors observed during Ralph Loop #6 against TheStudio: 1. XPLAT-2b: ralph_validate_hooks() now checks the target project's .claude/settings.json for bare 'powershell' hook commands and auto-patches them to 'powershell.exe' in WSL environments (eliminates SessionStart errors) 2. ralph-tester: Fixed file path references to use .ralph/AGENT.md and .ralph/fix_plan.md instead of bare names (eliminates worktree file-not-found) 3. ralph-tester: Changed deadline timeout from pytest --timeout=30 (requires pytest-timeout plugin) to shell timeout command (works everywhere) Also includes: FAILSPEC-3 killswitch, FAILSPEC-4 audit logging, ADAPTIVE-1 timeout latency recording from prior working tree. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Eliminates circuit breaker false positives caused by naive error pattern matching that treated JSON field names (like
"is_error": false) as actual errors. This was causing the circuit breaker to open prematurely after 5 consecutive loops with JSON output, halting valid development work.Root Cause
The original error detection used
grep -q "error|Error|ERROR"which matched:Error: Failed to compile"is_error": false,"error": null,"error_count": 0error: Errorin TypeScript codeThis triggered the circuit breaker's
consecutive_same_errorcounter inlib/circuit_breaker.sh, which opens the circuit after 5 consecutive false positives.Changes
1.
ralph_loop.sh(lines 390-409)^Error:,]: error,Exception,FatalVERBOSE_PROGRESS=true2.
lib/response_analyzer.sh(lines 91-97)3.
tests/test_error_detection.sh(new file)Test Results
Test Scenarios Covered
"is_error": false"error": nullError: Failed to compileException: NullPointerException]: error: undefined referenceERROR: Database failedunable to accesserror: ErrortypeImpact
Before: Circuit breaker opened after 5 loops with JSON output, halting development
After: Circuit breaker only opens on actual repeated errors, JSON output ignored
This fix maintains the protective functionality of the circuit breaker while eliminating false positives from structured data formats.
Validation
Run the test suite:
Test with actual Ralph loop:
Enable debug logging:
Summary by CodeRabbit
Bug Fixes
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.