Skip to content

Fix circuit breaker false positives from JSON field names#6

Merged
frankbria merged 4 commits into
mainfrom
fix/circuit-breaker-false-positives
Dec 31, 2025
Merged

Fix circuit breaker false positives from JSON field names#6
frankbria merged 4 commits into
mainfrom
fix/circuit-breaker-false-positives

Conversation

@frankbria
Copy link
Copy Markdown
Owner

@frankbria frankbria commented Dec 31, 2025

Summary

Eliminates circuit breaker false positives caused by naive error pattern matching that treated JSON field names (like "is_error": false) as actual errors. This was causing the circuit breaker to open prematurely after 5 consecutive loops with JSON output, halting valid development work.

Root Cause

The original error detection used grep -q "error|Error|ERROR" which matched:

  • ✅ Actual error messages: Error: Failed to compile
  • ❌ JSON field names: "is_error": false, "error": null, "error_count": 0
  • ❌ Type annotations: error: Error in TypeScript code
  • ❌ Documentation: Any mention of the word "error"

This triggered the circuit breaker's consecutive_same_error counter in lib/circuit_breaker.sh, which opens the circuit after 5 consecutive false positives.

Changes

1. ralph_loop.sh (lines 390-409)

  • Two-stage filtering: First removes JSON patterns, then searches for actual errors
  • Context-specific patterns: ^Error:, ]: error, Exception, Fatal
  • Debug logging: Shows matched error lines when VERBOSE_PROGRESS=true

2. lib/response_analyzer.sh (lines 91-97)

  • Applied same two-stage filtering to error counting
  • Prevents JSON fields from contributing to stuck loop detection

3. tests/test_error_detection.sh (new file)

  • Comprehensive test suite with 12 scenarios
  • Validates JSON false positives are excluded
  • Confirms real errors are still detected
  • Tests edge cases: git diffs, documentation, type annotations

Test Results

========================================
Test Summary
========================================
Passed: 12
Failed: 0
========================================
All tests passed!

Test Scenarios Covered

Scenario Input Expected Result
JSON with "is_error": false JSON field only No error ✅ Pass
JSON with "error": null JSON field only No error ✅ Pass
Actual error message Error: Failed to compile Error detected ✅ Pass
Exception in log Exception: NullPointerException Error detected ✅ Pass
Mixed JSON + error JSON + real error Error detected ✅ Pass
Normal output No errors No error ✅ Pass
Log format error ]: error: undefined reference Error detected ✅ Pass
Multiple JSON error fields Many JSON fields No error ✅ Pass
Uppercase ERROR ERROR: Database failed Error detected ✅ Pass
Cannot/unable context unable to access Error detected ✅ Pass
Documentation Docs mentioning "error" No error ✅ Pass
TypeScript code diff error: Error type No error ✅ Pass

Impact

Before: Circuit breaker opened after 5 loops with JSON output, halting development
After: Circuit breaker only opens on actual repeated errors, JSON output ignored

This fix maintains the protective functionality of the circuit breaker while eliminating false positives from structured data formats.

Validation

Run the test suite:

./tests/test_error_detection.sh

Test with actual Ralph loop:

ralph --monitor --calls 10

Enable debug logging:

VERBOSE_PROGRESS=true ralph --monitor

Summary by CodeRabbit

  • Bug Fixes

    • Reduced false positives by introducing two-stage filtering to ignore JSON-field noise and by requiring the same error lines to appear across recent outputs before flagging a stuck condition.
  • New Features

    • Optional verbose progress logging that prints detected error lines when enabled.
  • Tests

    • Added comprehensive test suites covering varied JSON/log scenarios and stuck-loop detection across history.

✏️ Tip: You can customize this high-level summary in your review settings.

…etection

Fixes circuit breaker opening prematurely due to naive error pattern matching
that treated JSON field names like "is_error": false as actual errors.

Changes:
- ralph_loop.sh: Implement two-stage error detection with JSON filtering
- lib/response_analyzer.sh: Apply same filtering to error counting
- tests/test_error_detection.sh: Add comprehensive test suite (12 scenarios)

Error detection now:
- Filters out JSON field patterns before searching for errors
- Uses context-specific patterns (^Error:, ]: error, Exception, Fatal)
- Avoids type annotations (error: Error) and code identifiers
- Includes debug logging when VERBOSE_PROGRESS=true

Test coverage validates:
✓ JSON fields don't trigger false positives
✓ Real error messages are correctly detected
✓ Mixed content handled properly
✓ Code diffs and documentation excluded

This prevents the consecutive_same_error counter from incrementing on
false positives, eliminating unnecessary circuit breaker trips.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Dec 31, 2025

Walkthrough

Two-stage error detection replaces single grep: JSON-like fields are filtered out first, then real error phrases are matched. detect_stuck_loop now requires all current error lines to appear across recent history files. Verbose progress logging added; two new test suites validate behavior.

Changes

Cohort / File(s) Summary
Response analyzer
lib/response_analyzer.sh
Removed global STUCK_INDICATORS. Replaced single-pass grep with two-stage filtering: first exclude JSON field/name patterns (e.g., "is_error": false, "error": false), then match explicit error indicators. detect_stuck_loop now requires every current error line to appear in all recent history files (fixed-string matching, tracks all_files_match).
Loop script
ralph_loop.sh
Adopted the same two-stage error-detection pipeline to avoid JSON false positives. Added optional verbose debug output when VERBOSE_PROGRESS="true" (prints up to three detected error lines). Maintains prior warning and has_errors behavior when errors found.
Error detection tests
tests/test_error_detection.sh
New test suite (13 scenarios) exercising two-stage filtering: JSON-only fields, actual errors, exceptions, mixed content, and benign text. Creates temp workspace, runs cases with helpers, colorized results, summarizes and exits non‑zero on failures.
Stuck-loop detection tests
tests/test_stuck_loop_detection.sh
New test suite (9 scenarios) sourcing lib/response_analyzer.sh and verifying detect_stuck_loop behavior with history files: JSON-only outputs, repeated real errors, differing history errors, mixed cases, and multi-error conditions. Uses temp history files, colorized reporting, and appropriate exit codes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble through logs with whiskers keen, 🐇
JSON ghosts I skip, real faults are seen,
Two filters I run and three lines I show,
I hop, I report, then softly go—
Small rabbit triumphs in each error's glow.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: fixing circuit breaker false positives caused by JSON field name matching. It directly corresponds to the primary objective of the PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/circuit-breaker-false-positives

Comment @coderabbitai help to get the list of available commands and usage tips.

@frankbria frankbria linked an issue Dec 31, 2025 that may be closed by this pull request
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
ralph_loop.sh (1)

390-409: LGTM with minor optimization suggestion.

The two-stage error detection implementation correctly filters JSON field patterns before scanning for actual errors, and the pattern is consistent with the test suite. The verbose debug logging is properly gated behind VERBOSE_PROGRESS and limits output to 3 lines.

💡 Optional: Avoid duplicate grep in verbose mode

Currently, verbose mode runs the same grep twice (lines 394-395 for detection, lines 401-402 for display). Consider capturing matching lines once:

-            # Two-stage error detection to avoid JSON field false positives
-            # Stage 1: Filter out JSON field patterns like "is_error": false
-            # Stage 2: Look for actual error messages in specific contexts
-            # Avoid type annotations like "error: Error" by requiring lowercase after ": error"
-            if grep -v '"[^"]*error[^"]*":' "$output_file" 2>/dev/null | \
-               grep -qE '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)'; then
-                has_errors="true"
-
-                # Debug logging: show what triggered error detection
-                if [[ "$VERBOSE_PROGRESS" == "true" ]]; then
-                    log_status "DEBUG" "Error patterns found:"
-                    grep -v '"[^"]*error[^"]*":' "$output_file" 2>/dev/null | \
-                        grep -nE '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' | \
-                        head -3 | while IFS= read -r line; do
-                        log_status "DEBUG" "  $line"
-                    done
-                fi
-
+            # Two-stage error detection to avoid JSON field false positives
+            # Stage 1: Filter out JSON field patterns like "is_error": false
+            # Stage 2: Look for actual error messages in specific contexts
+            local error_matches=$(grep -v '"[^"]*error[^"]*":' "$output_file" 2>/dev/null | \
+                grep -nE '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' 2>/dev/null)
+            
+            if [[ -n "$error_matches" ]]; then
+                has_errors="true"
+
+                # Debug logging: show what triggered error detection
+                if [[ "$VERBOSE_PROGRESS" == "true" ]]; then
+                    log_status "DEBUG" "Error patterns found:"
+                    echo "$error_matches" | head -3 | while IFS= read -r line; do
+                        log_status "DEBUG" "  $line"
+                    done
+                fi
+
                 log_status "WARN" "Errors detected in output, check: $output_file"
             fi
tests/test_error_detection.sh (1)

1-46: Missing test strategy comments.

Per coding guidelines, complex test scenarios should include comments explaining the test strategy. Consider adding a header comment that explains:

  1. The two-stage filtering approach being validated
  2. Why specific test cases were chosen (JSON false positives, actual errors, edge cases)
  3. How the test pattern relates to the implementations in ralph_loop.sh and lib/response_analyzer.sh

Based on coding guidelines: "Include comments explaining test strategy for complex test scenarios"

📝 Suggested test strategy comment
#!/bin/bash
# Test script for error detection fix
# Validates that JSON field names don't trigger false positives
#
# TEST STRATEGY:
# This suite validates the two-stage error detection implemented in ralph_loop.sh
# to prevent circuit breaker false positives from JSON output.
#
# Stage 1: Filter out JSON field patterns (e.g., "is_error": false, "error": null)
# Stage 2: Detect actual errors using context-specific patterns (Error:, Exception, Fatal)
#
# Coverage:
# - JSON fields with "error" keyword (tests 1, 2, 8)
# - Actual error messages with context (tests 3, 4, 7, 9)
# - Mixed JSON + real errors (test 5)
# - Benign content (tests 6, 11)
# - Code/diffs with error keywords (test 12)
# - Edge cases (test 10)
#
# Note: Currently validates ralph_loop.sh implementation only.
# TODO: Add separate tests for lib/response_analyzer.sh which uses different patterns.

set -e
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e370c1e and 8fc5375.

📒 Files selected for processing (3)
  • lib/response_analyzer.sh
  • ralph_loop.sh
  • tests/test_error_detection.sh
🧰 Additional context used
📓 Path-based instructions (2)
**/*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately

Files:

  • lib/response_analyzer.sh
  • tests/test_error_detection.sh
  • ralph_loop.sh
**/test*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/test*.sh: Write integration tests for Ralph loop behavior
Write end-to-end tests for full development cycles
Include comments explaining test strategy for complex test scenarios

Files:

  • tests/test_error_detection.sh
🧠 Learnings (7)
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/*.sh : Write unit tests for bash script functions using test.sh

Applied to files:

  • tests/test_error_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Include comments explaining test strategy for complex test scenarios

Applied to files:

  • tests/test_error_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write end-to-end tests for full development cycles

Applied to files:

  • tests/test_error_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior

Applied to files:

  • tests/test_error_detection.sh
  • ralph_loop.sh
📚 Learning: 2025-12-31T19:30:50.708Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.708Z
Learning: Test Ralph loop with new features before marking features complete using 'ralph --monitor --calls 5' or './test.sh --coverage'

Applied to files:

  • tests/test_error_detection.sh
  • ralph_loop.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to CLAUDE.md : Keep CLAUDE.md synchronized with codebase - add new commands to 'Key Commands' section, update 'Exit Conditions and Thresholds' when logic changes, keep installation instructions accurate and tested, document new Ralph loop behaviors

Applied to files:

  • ralph_loop.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to PROMPT.md : Update PROMPT.md if Ralph's autonomous behavior needs modification

Applied to files:

  • ralph_loop.sh

Comment thread lib/response_analyzer.sh
Comment thread tests/test_error_detection.sh
Comment thread tests/test_error_detection.sh Outdated
Aligns error detection patterns across all implementations and improves
test coverage based on CodeRabbit's critical and major findings.

Changes:
1. CRITICAL: Align lib/response_analyzer.sh pattern with ralph_loop.sh
   - Changed Stage 1 filter from '"[^"]*\(error\|failed\)"[^"]*":' to '"[^"]*error[^"]*":'
   - Removed bare 'cannot' and 'unable to' from Stage 2 (prevent false positives in prose)
   - Both files now use identical patterns for consistency

2. MAJOR: Improved test coverage
   - Renamed test 10 from "Cannot/unable in error context" to "Error prefix with descriptive message"
   - Added test 10a to validate bare "cannot/unable" DON'T trigger false positives
   - Now testing 13 scenarios (was 12)

3. MINOR: Added comprehensive test strategy documentation
   - Header comments explain two-stage filtering approach
   - Documents pattern consistency requirement
   - Lists all 13 test scenarios and their purpose

Test results:
✓ All 13 tests passing
✓ Pattern consistency validated across ralph_loop.sh and lib/response_analyzer.sh
✓ False positive scenarios properly excluded

Addresses CodeRabbit review comments:
- r2655862688 (Critical pattern inconsistency)
- r2655862689 (Major test coverage gap)
- r2655862690 (Minor misleading test name)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
lib/response_analyzer.sh (3)

267-267: Critical: Old naive error pattern still present.

Line 267 uses grep -i "error\|failed" which will match JSON field names like "is_error": false or "error": null, causing the same false positives this PR aims to eliminate. The detect_stuck_loop function must apply the same two-stage filtering as analyze_response to maintain consistency.

🔎 Proposed fix to apply two-stage filtering
-    # Extract key errors from current output
-    local current_errors=$(grep -i "error\|failed" "$current_output" 2>/dev/null | sort | uniq)
+    # Extract key errors from current output using two-stage filtering
+    # Stage 1: Filter out JSON field patterns, Stage 2: Match actual error messages
+    local current_errors=$(grep -v '"[^"]*error[^"]*":' "$current_output" 2>/dev/null | \
+                           grep -E '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' \
+                           2>/dev/null | sort | uniq)

18-18: Remove unused configuration array.

The STUCK_INDICATORS array is defined but never referenced in the code. The actual stuck detection at line 103 uses error_count > 5 rather than these patterns. Per coding guidelines, dead code should be removed to reduce maintenance burden.

🔎 Proposed fix to remove unused array
-STUCK_INDICATORS=("error" "failed" "cannot" "unable to" "blocked")

254-286: Add unit tests for detect_stuck_loop function to the test suite.

The test file (tests/test_error_detection.sh) validates two-stage error filtering for analyze_response across 13 scenarios, but does not test detect_stuck_loop. Line 268 uses simple grep -i "error\|failed" without the two-stage filtering that analyze_response applies (grep -v for JSON fields, then grep for actual error patterns). Tests should verify detect_stuck_loop with JSON-containing input to ensure it does not trigger false positives like "is_error": false.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8fc5375 and 90fb558.

📒 Files selected for processing (2)
  • lib/response_analyzer.sh
  • tests/test_error_detection.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_error_detection.sh
🧰 Additional context used
📓 Path-based instructions (1)
**/*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately

Files:

  • lib/response_analyzer.sh
🧠 Learnings (1)
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior

Applied to files:

  • lib/response_analyzer.sh
🔇 Additional comments (1)
lib/response_analyzer.sh (1)

92-98: Critical issue resolved: patterns now aligned correctly.

The previously flagged critical inconsistency has been addressed:

  • Stage 1 now uses the simplified '"[^"]*error[^"]*":' pattern
  • Stage 2 has removed the bare cannot and unable to patterns that caused false positives
  • Comments accurately document the two-stage approach and alignment goal

This implementation now correctly filters JSON field names while detecting real error messages.

Addresses additional CodeRabbit findings in lib/response_analyzer.sh:

CRITICAL (line 267):
- Fixed detect_stuck_loop() to use two-stage filtering
- Was using naive grep -i "error\|failed" pattern
- Now filters JSON fields before extracting errors
- Pattern aligned with analyze_response() for consistency

DEAD CODE (line 18):
- Removed unused STUCK_INDICATORS array
- Array was defined but never referenced in code
- Reduces maintenance burden per coding guidelines

TEST COVERAGE:
- Added comprehensive test suite for detect_stuck_loop()
- New file: tests/test_stuck_loop_detection.sh
- 7 test scenarios validating:
  * JSON fields don't trigger false stuck detection
  * Actual repeated errors are correctly detected
  * Type annotations are properly excluded
  * Function returns appropriate exit codes

Test results:
✓ Error detection tests: 13/13 passing
✓ Stuck loop tests: 7/7 passing
✓ Total: 20/20 tests passing

This ensures both error detection functions (analyze_response and
detect_stuck_loop) use identical filtering logic, preventing circuit
breaker false positives across all code paths.

Addresses CodeRabbit review comments:
- Outside diff range comment: line 267 (Critical)
- Outside diff range comment: line 18 (Dead code)
- Outside diff range comment: lines 254-286 (Test coverage)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/response_analyzer.sh (1)

268-283: Multi-line error matching bug in stuck detection.

When current_errors contains multiple distinct error messages (multiple lines after sort | uniq), the grep -q "$current_errors" on Line 279 will only match the first line. This means stuck loops with multiple different errors won't be detected correctly.

The test suite doesn't cover this scenario—all tests use outputs that extract to a single error line.

Example scenario that would fail:

# Current output has two distinct errors
Error: Failed to compile src/main.ts
Fatal: Database connection lost

# Both errors appear in all 3 historical files
# But grep -q would only check if the first error appears
🔎 Proposed fix to check each error individually
-    # Check if same errors appear in all recent outputs
-    local stuck_count=0
-    while IFS= read -r output_file; do
-        if grep -q "$current_errors" "$output_file" 2>/dev/null; then
-            ((stuck_count++))
-        fi
-    done <<< "$recent_outputs"
-
-    if [[ $stuck_count -ge 3 ]]; then
-        return 0  # Stuck on same error
-    else
-        return 1  # Making progress or different errors
-    fi
+    # Check if same errors appear in all recent outputs
+    local all_files_match=true
+    while IFS= read -r output_file; do
+        local file_matches_all=true
+        while IFS= read -r error_line; do
+            if ! grep -qF "$error_line" "$output_file" 2>/dev/null; then
+                file_matches_all=false
+                break
+            fi
+        done <<< "$current_errors"
+        
+        if [[ "$file_matches_all" != "true" ]]; then
+            all_files_match=false
+            break
+        fi
+    done <<< "$recent_outputs"
+
+    if [[ "$all_files_match" == "true" ]]; then
+        return 0  # Stuck on same error(s)
+    else
+        return 1  # Making progress or different errors
+    fi

Note: Changed to grep -qF for literal fixed-string matching instead of regex pattern matching.

🧹 Nitpick comments (1)
tests/test_stuck_loop_detection.sh (1)

101-116: Consider adding a test case for multiple distinct errors.

While the current test suite is comprehensive for single-error scenarios, there's no test case where the output contains multiple distinct errors that all match the Stage 2 error patterns.

Suggested test scenario:

# Current output with two distinct errors
Error: Failed to compile src/main.ts
Fatal: Database connection lost

# Both errors repeated in all 3 history files
# Should trigger stuck detection (exit code 0)

This would validate that the stuck loop detection correctly handles outputs with multiple concurrent error types. Currently, Test 3 appears to have two lines but only the "Error:" line matches the extraction pattern.

🔎 Proposed additional test case

Add after Test 3 (Line 116):

# Test 3b: Multiple distinct errors repeated should trigger stuck detection
cat > "$TEST_DIR/current_output.log" << 'EOF'
Build started
Error: Failed to compile src/main.ts
Fatal: Database connection lost
EOF
# Create 3 history files with both errors
for i in 1 2 3; do
    sleep 0.1
    cat > "$HISTORY_DIR/claude_output_00${i}.log" << 'EOF'
Build started
Error: Failed to compile src/main.ts
Fatal: Database connection lost
EOF
done
run_test "Multiple distinct errors repeated trigger stuck detection" 0

Based on learnings: Write integration tests for Ralph loop behavior.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90fb558 and 890720a.

📒 Files selected for processing (2)
  • lib/response_analyzer.sh
  • tests/test_stuck_loop_detection.sh
🧰 Additional context used
📓 Path-based instructions (2)
**/*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately

Files:

  • tests/test_stuck_loop_detection.sh
  • lib/response_analyzer.sh
**/test*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/test*.sh: Write integration tests for Ralph loop behavior
Write end-to-end tests for full development cycles
Include comments explaining test strategy for complex test scenarios

Files:

  • tests/test_stuck_loop_detection.sh
🧠 Learnings (4)
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior

Applied to files:

  • tests/test_stuck_loop_detection.sh
  • lib/response_analyzer.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write end-to-end tests for full development cycles

Applied to files:

  • tests/test_stuck_loop_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/*.sh : Write unit tests for bash script functions using test.sh

Applied to files:

  • tests/test_stuck_loop_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Include comments explaining test strategy for complex test scenarios

Applied to files:

  • tests/test_stuck_loop_detection.sh
🧬 Code graph analysis (1)
tests/test_stuck_loop_detection.sh (1)
lib/response_analyzer.sh (1)
  • detect_stuck_loop (254-289)
🪛 Shellcheck (0.11.0)
lib/response_analyzer.sh

[warning] 268-268: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (2)
lib/response_analyzer.sh (1)

91-97: LGTM! Pattern consistency resolved.

The two-stage filtering approach correctly addresses the JSON false positive issue. Stage 1 filters out JSON field patterns, and Stage 2 matches actual error messages in specific contexts. The pattern is now aligned across functions as noted in the comments.

Based on past review comments, this addresses the pattern inconsistency that was flagged previously.

tests/test_stuck_loop_detection.sh (1)

1-64: Excellent test structure and documentation.

The test suite demonstrates best practices:

  • Clear test strategy documented in the header (lines 6-13)
  • Proper cleanup with trap on EXIT
  • Well-organized run_test helper with colored output
  • Comprehensive scenario coverage for JSON false positives

Based on coding guidelines: Test strategy for complex scenarios is well explained. ✓

Addresses CodeRabbit outside diff range comment on lines 268-283:

CRITICAL BUG:
The detect_stuck_loop function had a multi-line string handling bug where
only the first error line was checked against historical outputs when
multiple distinct errors were present.

Problem:
  grep -q "$current_errors" file.log
  # When $current_errors has multiple lines, grep only matches first line

Impact:
Stuck loops with multiple recurring errors would not be detected correctly,
potentially allowing Ralph to continue running despite being genuinely stuck.

Fix:
Changed from simple grep to nested loop checking:
- For each historical output file
  - For each error line in current output
    - ALL error lines must appear in that file
- Only returns "stuck" if ALL files contain ALL current errors

Used grep -qF for literal fixed-string matching (not regex) to avoid
edge cases with special characters in error messages.

Test Coverage:
Added 2 new test scenarios (7 → 9 total tests):

Test 8: Multiple distinct errors where ALL repeat across history
  Current: Error: Build failed + Fatal: DB lost + Exception: NPE
  History: All 3 files contain all 3 errors
  Expected: Stuck detected ✅

Test 9: Multiple errors where only some repeat
  Current: Error: Build failed + Fatal: DB lost
  History: Only first error appears consistently
  Expected: Not stuck ✅

All existing tests continue to pass, validating backward compatibility.

Test results:
✓ Error detection tests: 13/13 passing
✓ Stuck loop tests: 9/9 passing (was 7/7)
✓ Total: 22/22 tests passing

This ensures detect_stuck_loop correctly handles the real-world scenario
where Ralph gets stuck on multiple simultaneous recurring errors.

Addresses CodeRabbit review comment:
- Outside diff range (lines 268-283): Multi-line error matching bug
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/response_analyzer.sh (1)

268-268: Optional: Address Shellcheck warning about return value masking.

Shellcheck warns that combining declaration and assignment can mask command return values. In this case, the masking is intentional since the code relies on checking whether current_errors is empty (Line 272) rather than examining exit codes. The current implementation works correctly.

If you prefer to satisfy Shellcheck for pedantic correctness:

🔎 Optional refactor to separate declaration and assignment
-    local current_errors=$(grep -v '"[^"]*error[^"]*":' "$current_output" 2>/dev/null | \
-                          grep -E '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' 2>/dev/null | \
-                          sort | uniq)
+    local current_errors
+    current_errors=$(grep -v '"[^"]*error[^"]*":' "$current_output" 2>/dev/null | \
+                    grep -E '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' 2>/dev/null | \
+                    sort | uniq)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 890720a and 63590b3.

📒 Files selected for processing (2)
  • lib/response_analyzer.sh
  • tests/test_stuck_loop_detection.sh
🧰 Additional context used
📓 Path-based instructions (2)
**/*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately

Files:

  • tests/test_stuck_loop_detection.sh
  • lib/response_analyzer.sh
**/test*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/test*.sh: Write integration tests for Ralph loop behavior
Write end-to-end tests for full development cycles
Include comments explaining test strategy for complex test scenarios

Files:

  • tests/test_stuck_loop_detection.sh
🧠 Learnings (4)
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior

Applied to files:

  • tests/test_stuck_loop_detection.sh
  • lib/response_analyzer.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/*.sh : Write unit tests for bash script functions using test.sh

Applied to files:

  • tests/test_stuck_loop_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write end-to-end tests for full development cycles

Applied to files:

  • tests/test_stuck_loop_detection.sh
📚 Learning: 2025-12-31T19:30:50.707Z
Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Include comments explaining test strategy for complex test scenarios

Applied to files:

  • tests/test_stuck_loop_detection.sh
🧬 Code graph analysis (1)
tests/test_stuck_loop_detection.sh (1)
lib/response_analyzer.sh (1)
  • detect_stuck_loop (254-300)
🪛 Shellcheck (0.11.0)
lib/response_analyzer.sh

[warning] 268-268: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (4)
lib/response_analyzer.sh (2)

265-296: Improved stuck-loop detection logic is robust.

The refactored logic correctly implements the requirement that ALL current error lines must appear in ALL recent history files to detect a stuck loop. The nested loop structure (Lines 279-293) is more complex than the previous implementation but appropriately handles multi-line error scenarios and prevents false positives from partial matches.

The use of grep -qF for literal fixed-string matching (Line 283) is the right choice to avoid unintended regex interpretation of error messages containing special characters.


91-97: Patterns are correctly aligned with ralph_loop.sh.

The Stage 1 filtering (grep -v '"[^"]*error[^"]*":') and Stage 2 error detection regex (^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL) are identical across both files, confirming consistent behavior as claimed.

tests/test_stuck_loop_detection.sh (2)

6-14: Excellent test strategy documentation.

The test strategy comment clearly explains what the detect_stuck_loop function does and what aspects are being validated. This follows the coding guideline to "include comments explaining test strategy for complex test scenarios."

As per coding guidelines, test strategy documentation is required for complex scenarios in test files.


1-241: Comprehensive test coverage validates two-stage filtering behavior.

The test suite provides excellent coverage of the detect_stuck_loop function's behavior:

  • False-positive prevention: Tests 2, 5, 7 validate that JSON fields, type annotations, and non-error content don't trigger stuck detection
  • True-positive detection: Tests 3, 6, 8 validate that actual repeated errors are correctly detected
  • Multi-error logic: Tests 8-9 validate the requirement that ALL current error lines must appear in ALL history files for stuck detection
  • Edge cases: Tests 1, 4 cover no-history and differing-errors scenarios

The test structure with clear naming, colored output, and per-test result tracking makes it easy to diagnose failures.

Based on learnings, integration tests for Ralph loop behavior should be comprehensive.

@frankbria frankbria merged commit 8158e38 into main Dec 31, 2025
3 checks passed
@frankbria frankbria deleted the fix/circuit-breaker-false-positives branch December 31, 2025 21:18
frankbria pushed a commit that referenced this pull request Dec 31, 2025
Updated both documentation files to reflect recent enhancements and fixes:

CLAUDE.md Changes:
- Added comprehensive Core Architecture section detailing main scripts and lib/ components
- Documented lib/circuit_breaker.sh and lib/response_analyzer.sh modular architecture
- Added detailed Exit Conditions and Thresholds section with circuit breaker thresholds
- Documented advanced two-stage error detection process to eliminate false positives
- Documented multi-line error matching for accurate stuck loop detection
- Added Recent Improvements section highlighting v0.9.0 circuit breaker enhancements
- Updated Global Installation section to include lib/ directory and ralph-import command
- Test coverage details: 13 error detection + 9 stuck loop tests

README.md Changes:
- Updated "What's Working Now" section with circuit breaker enhancements
- Added "Recent Improvements" section highlighting v0.9.0 updates
- Updated test count from 75 to 97 tests (75 core + 13 error detection + 9 stuck loop)
- Enhanced circuit breaker description with two-stage filtering details
- Added circuit breaker thresholds to Exit Thresholds section
- Updated test commands to include error detection and stuck loop test scripts
- Updated current test status with specialized test file counts

Key improvements documented:
- Multi-line error matching fix for detect_stuck_loop function
- JSON field false positive elimination (e.g., "is_error": false)
- Two-stage error filtering for accurate error detection
- Installation fix for lib/ directory components
- 22 new tests added for circuit breaker functionality

These updates ensure documentation accurately reflects the current state of the
codebase following PR #6 (circuit-breaker false positives fix) and PR #4
(installation lib/ directory fix).
lida-spaceport pushed a commit to lida-spaceport/ralph-claude-code that referenced this pull request Jan 28, 2026
- Make --dangerously-skip-permissions conditional: only added when
  CLAUDE_ALLOWED_TOOLS is empty (default). When tools are specified
  via --allowed-tools, use --allowedTools with granular permissions
  instead. Previously --dangerously-skip-permissions was unconditional,
  making --allowedTools dead code.
- Change CLAUDE_ALLOWED_TOOLS default from "Write,Bash(git *),Read"
  to "" so default behavior is skip-all-permissions.
- Add prd.json completion check as exit condition frankbria#6 in
  should_exit_gracefully(). Deterministically checks if all
  userStories have passes != false, preventing infinite loops
  when heuristic signals are absent.
- Rename output_file to output_file_path in wait_for_daily_reset()
  for clarity — the variable holds a path read from a tracking file,
  not an output file handle.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
wtthornton referenced this pull request in wtthornton/ralph-claude-code Mar 22, 2026
…ove pytest-timeout dep

Three fixes for errors observed during Ralph Loop #6 against TheStudio:

1. XPLAT-2b: ralph_validate_hooks() now checks the target project's
   .claude/settings.json for bare 'powershell' hook commands and auto-patches
   them to 'powershell.exe' in WSL environments (eliminates SessionStart errors)

2. ralph-tester: Fixed file path references to use .ralph/AGENT.md and
   .ralph/fix_plan.md instead of bare names (eliminates worktree file-not-found)

3. ralph-tester: Changed deadline timeout from pytest --timeout=30 (requires
   pytest-timeout plugin) to shell timeout command (works everywhere)

Also includes: FAILSPEC-3 killswitch, FAILSPEC-4 audit logging, ADAPTIVE-1
timeout latency recording from prior working tree.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

is_error: false, triggers "error" circuit breaker incorrectly

1 participant