Fix circuit breaker false positives from JSON field names by frankbria · Pull Request #6 · frankbria/ralph-claude-code

frankbria · 2025-12-31T20:25:53Z

Summary

Eliminates circuit breaker false positives caused by naive error pattern matching that treated JSON field names (like "is_error": false) as actual errors. This was causing the circuit breaker to open prematurely after 5 consecutive loops with JSON output, halting valid development work.

Root Cause

The original error detection used grep -q "error|Error|ERROR" which matched:

✅ Actual error messages: Error: Failed to compile
❌ JSON field names: "is_error": false, "error": null, "error_count": 0
❌ Type annotations: error: Error in TypeScript code
❌ Documentation: Any mention of the word "error"

This triggered the circuit breaker's consecutive_same_error counter in lib/circuit_breaker.sh, which opens the circuit after 5 consecutive false positives.

Changes

1. `ralph_loop.sh` (lines 390-409)

Two-stage filtering: First removes JSON patterns, then searches for actual errors
Context-specific patterns: ^Error:, ]: error, Exception, Fatal
Debug logging: Shows matched error lines when VERBOSE_PROGRESS=true

2. `lib/response_analyzer.sh` (lines 91-97)

Applied same two-stage filtering to error counting
Prevents JSON fields from contributing to stuck loop detection

3. `tests/test_error_detection.sh` (new file)

Comprehensive test suite with 12 scenarios
Validates JSON false positives are excluded
Confirms real errors are still detected
Tests edge cases: git diffs, documentation, type annotations

Test Results

========================================
Test Summary
========================================
Passed: 12
Failed: 0
========================================
All tests passed!

Test Scenarios Covered

Scenario	Input	Expected	Result
JSON with `"is_error": false`	JSON field only	No error	✅ Pass
JSON with `"error": null`	JSON field only	No error	✅ Pass
Actual error message	`Error: Failed to compile`	Error detected	✅ Pass
Exception in log	`Exception: NullPointerException`	Error detected	✅ Pass
Mixed JSON + error	JSON + real error	Error detected	✅ Pass
Normal output	No errors	No error	✅ Pass
Log format error	`]: error: undefined reference`	Error detected	✅ Pass
Multiple JSON error fields	Many JSON fields	No error	✅ Pass
Uppercase ERROR	`ERROR: Database failed`	Error detected	✅ Pass
Cannot/unable context	`unable to access`	Error detected	✅ Pass
Documentation	Docs mentioning "error"	No error	✅ Pass
TypeScript code diff	`error: Error` type	No error	✅ Pass

Impact

Before: Circuit breaker opened after 5 loops with JSON output, halting development
After: Circuit breaker only opens on actual repeated errors, JSON output ignored

This fix maintains the protective functionality of the circuit breaker while eliminating false positives from structured data formats.

Validation

Run the test suite:

./tests/test_error_detection.sh

Test with actual Ralph loop:

ralph --monitor --calls 10

Enable debug logging:

VERBOSE_PROGRESS=true ralph --monitor

Summary by CodeRabbit

Bug Fixes
- Reduced false positives by introducing two-stage filtering to ignore JSON-field noise and by requiring the same error lines to appear across recent outputs before flagging a stuck condition.
New Features
- Optional verbose progress logging that prints detected error lines when enabled.
Tests
- Added comprehensive test suites covering varied JSON/log scenarios and stuck-loop detection across history.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

…etection Fixes circuit breaker opening prematurely due to naive error pattern matching that treated JSON field names like "is_error": false as actual errors. Changes: - ralph_loop.sh: Implement two-stage error detection with JSON filtering - lib/response_analyzer.sh: Apply same filtering to error counting - tests/test_error_detection.sh: Add comprehensive test suite (12 scenarios) Error detection now: - Filters out JSON field patterns before searching for errors - Uses context-specific patterns (^Error:, ]: error, Exception, Fatal) - Avoids type annotations (error: Error) and code identifiers - Includes debug logging when VERBOSE_PROGRESS=true Test coverage validates: ✓ JSON fields don't trigger false positives ✓ Real error messages are correctly detected ✓ Mixed content handled properly ✓ Code diffs and documentation excluded This prevents the consecutive_same_error counter from incrementing on false positives, eliminating unnecessary circuit breaker trips.

coderabbitai · 2025-12-31T20:26:01Z

Walkthrough

Two-stage error detection replaces single grep: JSON-like fields are filtered out first, then real error phrases are matched. detect_stuck_loop now requires all current error lines to appear across recent history files. Verbose progress logging added; two new test suites validate behavior.

Changes

Cohort / File(s)	Summary
Response analyzer `lib/response_analyzer.sh`	Removed global `STUCK_INDICATORS`. Replaced single-pass grep with two-stage filtering: first exclude JSON field/name patterns (e.g., `"is_error": false`, `"error": false`), then match explicit error indicators. `detect_stuck_loop` now requires every current error line to appear in all recent history files (fixed-string matching, tracks `all_files_match`).
Loop script `ralph_loop.sh`	Adopted the same two-stage error-detection pipeline to avoid JSON false positives. Added optional verbose debug output when `VERBOSE_PROGRESS="true"` (prints up to three detected error lines). Maintains prior warning and `has_errors` behavior when errors found.
Error detection tests `tests/test_error_detection.sh`	New test suite (13 scenarios) exercising two-stage filtering: JSON-only fields, actual errors, exceptions, mixed content, and benign text. Creates temp workspace, runs cases with helpers, colorized results, summarizes and exits non‑zero on failures.
Stuck-loop detection tests `tests/test_stuck_loop_detection.sh`	New test suite (9 scenarios) sourcing `lib/response_analyzer.sh` and verifying `detect_stuck_loop` behavior with history files: JSON-only outputs, repeated real errors, differing history errors, mixed cases, and multi-error conditions. Uses temp history files, colorized reporting, and appropriate exit codes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble through logs with whiskers keen, 🐇
JSON ghosts I skip, real faults are seen,
Two filters I run and three lines I show,
I hop, I report, then softly go—
Small rabbit triumphs in each error's glow.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: fixing circuit breaker false positives caused by JSON field name matching. It directly corresponds to the primary objective of the PR.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/circuit-breaker-false-positives

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

ralph_loop.sh (1)

390-409: LGTM with minor optimization suggestion.

The two-stage error detection implementation correctly filters JSON field patterns before scanning for actual errors, and the pattern is consistent with the test suite. The verbose debug logging is properly gated behind VERBOSE_PROGRESS and limits output to 3 lines.

💡 Optional: Avoid duplicate grep in verbose mode

Currently, verbose mode runs the same grep twice (lines 394-395 for detection, lines 401-402 for display). Consider capturing matching lines once:

-            # Two-stage error detection to avoid JSON field false positives
-            # Stage 1: Filter out JSON field patterns like "is_error": false
-            # Stage 2: Look for actual error messages in specific contexts
-            # Avoid type annotations like "error: Error" by requiring lowercase after ": error"
-            if grep -v '"[^"]*error[^"]*":' "$output_file" 2>/dev/null | \
-               grep -qE '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)'; then
-                has_errors="true"
-
-                # Debug logging: show what triggered error detection
-                if [[ "$VERBOSE_PROGRESS" == "true" ]]; then
-                    log_status "DEBUG" "Error patterns found:"
-                    grep -v '"[^"]*error[^"]*":' "$output_file" 2>/dev/null | \
-                        grep -nE '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' | \
-                        head -3 | while IFS= read -r line; do
-                        log_status "DEBUG" "  $line"
-                    done
-                fi
-
+            # Two-stage error detection to avoid JSON field false positives
+            # Stage 1: Filter out JSON field patterns like "is_error": false
+            # Stage 2: Look for actual error messages in specific contexts
+            local error_matches=$(grep -v '"[^"]*error[^"]*":' "$output_file" 2>/dev/null | \
+                grep -nE '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' 2>/dev/null)
+            
+            if [[ -n "$error_matches" ]]; then
+                has_errors="true"
+
+                # Debug logging: show what triggered error detection
+                if [[ "$VERBOSE_PROGRESS" == "true" ]]; then
+                    log_status "DEBUG" "Error patterns found:"
+                    echo "$error_matches" | head -3 | while IFS= read -r line; do
+                        log_status "DEBUG" "  $line"
+                    done
+                fi
+
                 log_status "WARN" "Errors detected in output, check: $output_file"
             fi

tests/test_error_detection.sh (1)

1-46: Missing test strategy comments.

Per coding guidelines, complex test scenarios should include comments explaining the test strategy. Consider adding a header comment that explains:

The two-stage filtering approach being validated

Why specific test cases were chosen (JSON false positives, actual errors, edge cases)

How the test pattern relates to the implementations in ralph_loop.sh and lib/response_analyzer.sh

Based on coding guidelines: "Include comments explaining test strategy for complex test scenarios"
📝 Suggested test strategy comment
#!/bin/bash
# Test script for error detection fix
# Validates that JSON field names don't trigger false positives
#
# TEST STRATEGY:
# This suite validates the two-stage error detection implemented in ralph_loop.sh
# to prevent circuit breaker false positives from JSON output.
#
# Stage 1: Filter out JSON field patterns (e.g., "is_error": false, "error": null)
# Stage 2: Detect actual errors using context-specific patterns (Error:, Exception, Fatal)
#
# Coverage:
# - JSON fields with "error" keyword (tests 1, 2, 8)
# - Actual error messages with context (tests 3, 4, 7, 9)
# - Mixed JSON + real errors (test 5)
# - Benign content (tests 6, 11)
# - Code/diffs with error keywords (test 12)
# - Edge cases (test 10)
#
# Note: Currently validates ralph_loop.sh implementation only.
# TODO: Add separate tests for lib/response_analyzer.sh which uses different patterns.

set -e

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e370c1e and 8fc5375.

📒 Files selected for processing (3)

lib/response_analyzer.sh
ralph_loop.sh
tests/test_error_detection.sh

🧰 Additional context used

📓 Path-based instructions (2)

**/*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately

Files:

lib/response_analyzer.sh
tests/test_error_detection.sh
ralph_loop.sh

**/test*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/test*.sh: Write integration tests for Ralph loop behavior
Write end-to-end tests for full development cycles
Include comments explaining test strategy for complex test scenarios

Files:

tests/test_error_detection.sh

🧠 Learnings (7)

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/*.sh : Write unit tests for bash script functions using test.sh

Applied to files:

tests/test_error_detection.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Include comments explaining test strategy for complex test scenarios

Applied to files:

tests/test_error_detection.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write end-to-end tests for full development cycles

Applied to files:

tests/test_error_detection.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior

Applied to files:

tests/test_error_detection.sh
ralph_loop.sh

📚 Learning: 2025-12-31T19:30:50.708Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.708Z
Learning: Test Ralph loop with new features before marking features complete using 'ralph --monitor --calls 5' or './test.sh --coverage'

Applied to files:

tests/test_error_detection.sh
ralph_loop.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to CLAUDE.md : Keep CLAUDE.md synchronized with codebase - add new commands to 'Key Commands' section, update 'Exit Conditions and Thresholds' when logic changes, keep installation instructions accurate and tested, document new Ralph loop behaviors

Applied to files:

ralph_loop.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to PROMPT.md : Update PROMPT.md if Ralph's autonomous behavior needs modification

Applied to files:

ralph_loop.sh

Aligns error detection patterns across all implementations and improves test coverage based on CodeRabbit's critical and major findings. Changes: 1. CRITICAL: Align lib/response_analyzer.sh pattern with ralph_loop.sh - Changed Stage 1 filter from '"[^"]*$error\|failed$"[^"]*":' to '"[^"]*error[^"]*":' - Removed bare 'cannot' and 'unable to' from Stage 2 (prevent false positives in prose) - Both files now use identical patterns for consistency 2. MAJOR: Improved test coverage - Renamed test 10 from "Cannot/unable in error context" to "Error prefix with descriptive message" - Added test 10a to validate bare "cannot/unable" DON'T trigger false positives - Now testing 13 scenarios (was 12) 3. MINOR: Added comprehensive test strategy documentation - Header comments explain two-stage filtering approach - Documents pattern consistency requirement - Lists all 13 test scenarios and their purpose Test results: ✓ All 13 tests passing ✓ Pattern consistency validated across ralph_loop.sh and lib/response_analyzer.sh ✓ False positive scenarios properly excluded Addresses CodeRabbit review comments: - r2655862688 (Critical pattern inconsistency) - r2655862689 (Major test coverage gap) - r2655862690 (Minor misleading test name)

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

lib/response_analyzer.sh (3)
267-267: Critical: Old naive error pattern still present.

Line 267 uses grep -i "error\|failed" which will match JSON field names like "is_error": false or "error": null, causing the same false positives this PR aims to eliminate. The detect_stuck_loop function must apply the same two-stage filtering as analyze_response to maintain consistency.
🔎 Proposed fix to apply two-stage filtering
-    # Extract key errors from current output
-    local current_errors=$(grep -i "error\|failed" "$current_output" 2>/dev/null | sort | uniq)
+    # Extract key errors from current output using two-stage filtering
+    # Stage 1: Filter out JSON field patterns, Stage 2: Match actual error messages
+    local current_errors=$(grep -v '"[^"]*error[^"]*":' "$current_output" 2>/dev/null | \
+                           grep -E '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' \
+                           2>/dev/null | sort | uniq)
18-18: Remove unused configuration array.

The STUCK_INDICATORS array is defined but never referenced in the code. The actual stuck detection at line 103 uses error_count > 5 rather than these patterns. Per coding guidelines, dead code should be removed to reduce maintenance burden.
🔎 Proposed fix to remove unused array
-STUCK_INDICATORS=("error" "failed" "cannot" "unable to" "blocked")
254-286: Add unit tests for detect_stuck_loop function to the test suite.

The test file (tests/test_error_detection.sh) validates two-stage error filtering for analyze_response across 13 scenarios, but does not test detect_stuck_loop. Line 268 uses simple grep -i "error\|failed" without the two-stage filtering that analyze_response applies (grep -v for JSON fields, then grep for actual error patterns). Tests should verify detect_stuck_loop with JSON-containing input to ensure it does not trigger false positives like "is_error": false.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8fc5375 and 90fb558.

📒 Files selected for processing (2)

lib/response_analyzer.sh
tests/test_error_detection.sh

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_error_detection.sh

🧰 Additional context used

📓 Path-based instructions (1)

**/*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately

Files:

lib/response_analyzer.sh

🧠 Learnings (1)

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior

Applied to files:

lib/response_analyzer.sh

🔇 Additional comments (1)

lib/response_analyzer.sh (1)

92-98: Critical issue resolved: patterns now aligned correctly.

The previously flagged critical inconsistency has been addressed:

Stage 1 now uses the simplified '"[^"]*error[^"]*":' pattern

Stage 2 has removed the bare cannot and unable to patterns that caused false positives

Comments accurately document the two-stage approach and alignment goal

This implementation now correctly filters JSON field names while detecting real error messages.

Addresses additional CodeRabbit findings in lib/response_analyzer.sh: CRITICAL (line 267): - Fixed detect_stuck_loop() to use two-stage filtering - Was using naive grep -i "error\|failed" pattern - Now filters JSON fields before extracting errors - Pattern aligned with analyze_response() for consistency DEAD CODE (line 18): - Removed unused STUCK_INDICATORS array - Array was defined but never referenced in code - Reduces maintenance burden per coding guidelines TEST COVERAGE: - Added comprehensive test suite for detect_stuck_loop() - New file: tests/test_stuck_loop_detection.sh - 7 test scenarios validating: * JSON fields don't trigger false stuck detection * Actual repeated errors are correctly detected * Type annotations are properly excluded * Function returns appropriate exit codes Test results: ✓ Error detection tests: 13/13 passing ✓ Stuck loop tests: 7/7 passing ✓ Total: 20/20 tests passing This ensures both error detection functions (analyze_response and detect_stuck_loop) use identical filtering logic, preventing circuit breaker false positives across all code paths. Addresses CodeRabbit review comments: - Outside diff range comment: line 267 (Critical) - Outside diff range comment: line 18 (Dead code) - Outside diff range comment: lines 254-286 (Test coverage)

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/response_analyzer.sh (1)

268-283: Multi-line error matching bug in stuck detection.

When current_errors contains multiple distinct error messages (multiple lines after sort | uniq), the grep -q "$current_errors" on Line 279 will only match the first line. This means stuck loops with multiple different errors won't be detected correctly.

The test suite doesn't cover this scenario—all tests use outputs that extract to a single error line.

Example scenario that would fail:

# Current output has two distinct errors
Error: Failed to compile src/main.ts
Fatal: Database connection lost

# Both errors appear in all 3 historical files
# But grep -q would only check if the first error appears

🔎 Proposed fix to check each error individually

-    # Check if same errors appear in all recent outputs
-    local stuck_count=0
-    while IFS= read -r output_file; do
-        if grep -q "$current_errors" "$output_file" 2>/dev/null; then
-            ((stuck_count++))
-        fi
-    done <<< "$recent_outputs"
-
-    if [[ $stuck_count -ge 3 ]]; then
-        return 0  # Stuck on same error
-    else
-        return 1  # Making progress or different errors
-    fi
+    # Check if same errors appear in all recent outputs
+    local all_files_match=true
+    while IFS= read -r output_file; do
+        local file_matches_all=true
+        while IFS= read -r error_line; do
+            if ! grep -qF "$error_line" "$output_file" 2>/dev/null; then
+                file_matches_all=false
+                break
+            fi
+        done <<< "$current_errors"
+        
+        if [[ "$file_matches_all" != "true" ]]; then
+            all_files_match=false
+            break
+        fi
+    done <<< "$recent_outputs"
+
+    if [[ "$all_files_match" == "true" ]]; then
+        return 0  # Stuck on same error(s)
+    else
+        return 1  # Making progress or different errors
+    fi

Note: Changed to grep -qF for literal fixed-string matching instead of regex pattern matching.

🧹 Nitpick comments (1)

tests/test_stuck_loop_detection.sh (1)
101-116: Consider adding a test case for multiple distinct errors.

While the current test suite is comprehensive for single-error scenarios, there's no test case where the output contains multiple distinct errors that all match the Stage 2 error patterns.

Suggested test scenario:
# Current output with two distinct errors
Error: Failed to compile src/main.ts
Fatal: Database connection lost

# Both errors repeated in all 3 history files
# Should trigger stuck detection (exit code 0)
This would validate that the stuck loop detection correctly handles outputs with multiple concurrent error types. Currently, Test 3 appears to have two lines but only the "Error:" line matches the extraction pattern.
🔎 Proposed additional test case

Add after Test 3 (Line 116):
# Test 3b: Multiple distinct errors repeated should trigger stuck detection
cat > "$TEST_DIR/current_output.log" << 'EOF'
Build started
Error: Failed to compile src/main.ts
Fatal: Database connection lost
EOF
# Create 3 history files with both errors
for i in 1 2 3; do
    sleep 0.1
    cat > "$HISTORY_DIR/claude_output_00${i}.log" << 'EOF'
Build started
Error: Failed to compile src/main.ts
Fatal: Database connection lost
EOF
done
run_test "Multiple distinct errors repeated trigger stuck detection" 0
Based on learnings: Write integration tests for Ralph loop behavior.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90fb558 and 890720a.

📒 Files selected for processing (2)

lib/response_analyzer.sh
tests/test_stuck_loop_detection.sh

🧰 Additional context used

📓 Path-based instructions (2)

**/*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately

Files:

tests/test_stuck_loop_detection.sh
lib/response_analyzer.sh

**/test*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/test*.sh: Write integration tests for Ralph loop behavior
Write end-to-end tests for full development cycles
Include comments explaining test strategy for complex test scenarios

Files:

tests/test_stuck_loop_detection.sh

🧠 Learnings (4)

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior

Applied to files:

tests/test_stuck_loop_detection.sh
lib/response_analyzer.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write end-to-end tests for full development cycles

Applied to files:

tests/test_stuck_loop_detection.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/*.sh : Write unit tests for bash script functions using test.sh

Applied to files:

tests/test_stuck_loop_detection.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Include comments explaining test strategy for complex test scenarios

Applied to files:

tests/test_stuck_loop_detection.sh

🧬 Code graph analysis (1)

tests/test_stuck_loop_detection.sh (1)

lib/response_analyzer.sh (1)

detect_stuck_loop (254-289)

🪛 Shellcheck (0.11.0)

lib/response_analyzer.sh

[warning] 268-268: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (2)

lib/response_analyzer.sh (1)

91-97: LGTM! Pattern consistency resolved.

The two-stage filtering approach correctly addresses the JSON false positive issue. Stage 1 filters out JSON field patterns, and Stage 2 matches actual error messages in specific contexts. The pattern is now aligned across functions as noted in the comments.

Based on past review comments, this addresses the pattern inconsistency that was flagged previously.

tests/test_stuck_loop_detection.sh (1)

1-64: Excellent test structure and documentation.

The test suite demonstrates best practices:

Clear test strategy documented in the header (lines 6-13)

Proper cleanup with trap on EXIT

Well-organized run_test helper with colored output

Comprehensive scenario coverage for JSON false positives

Based on coding guidelines: Test strategy for complex scenarios is well explained. ✓

Addresses CodeRabbit outside diff range comment on lines 268-283: CRITICAL BUG: The detect_stuck_loop function had a multi-line string handling bug where only the first error line was checked against historical outputs when multiple distinct errors were present. Problem: grep -q "$current_errors" file.log # When $current_errors has multiple lines, grep only matches first line Impact: Stuck loops with multiple recurring errors would not be detected correctly, potentially allowing Ralph to continue running despite being genuinely stuck. Fix: Changed from simple grep to nested loop checking: - For each historical output file - For each error line in current output - ALL error lines must appear in that file - Only returns "stuck" if ALL files contain ALL current errors Used grep -qF for literal fixed-string matching (not regex) to avoid edge cases with special characters in error messages. Test Coverage: Added 2 new test scenarios (7 → 9 total tests): Test 8: Multiple distinct errors where ALL repeat across history Current: Error: Build failed + Fatal: DB lost + Exception: NPE History: All 3 files contain all 3 errors Expected: Stuck detected ✅ Test 9: Multiple errors where only some repeat Current: Error: Build failed + Fatal: DB lost History: Only first error appears consistently Expected: Not stuck ✅ All existing tests continue to pass, validating backward compatibility. Test results: ✓ Error detection tests: 13/13 passing ✓ Stuck loop tests: 9/9 passing (was 7/7) ✓ Total: 22/22 tests passing This ensures detect_stuck_loop correctly handles the real-world scenario where Ralph gets stuck on multiple simultaneous recurring errors. Addresses CodeRabbit review comment: - Outside diff range (lines 268-283): Multi-line error matching bug

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

lib/response_analyzer.sh (1)
268-268: Optional: Address Shellcheck warning about return value masking.

Shellcheck warns that combining declaration and assignment can mask command return values. In this case, the masking is intentional since the code relies on checking whether current_errors is empty (Line 272) rather than examining exit codes. The current implementation works correctly.

If you prefer to satisfy Shellcheck for pedantic correctness:
🔎 Optional refactor to separate declaration and assignment
-    local current_errors=$(grep -v '"[^"]*error[^"]*":' "$current_output" 2>/dev/null | \
-                          grep -E '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' 2>/dev/null | \
-                          sort | uniq)
+    local current_errors
+    current_errors=$(grep -v '"[^"]*error[^"]*":' "$current_output" 2>/dev/null | \
+                    grep -E '(^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL)' 2>/dev/null | \
+                    sort | uniq)

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 890720a and 63590b3.

📒 Files selected for processing (2)

lib/response_analyzer.sh
tests/test_stuck_loop_detection.sh

🧰 Additional context used

📓 Path-based instructions (2)

**/*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.sh: Write unit tests for bash script functions using test.sh
Update inline comments in bash scripts when implementation changes; keep all comments current and remove outdated comments immediately

Files:

tests/test_stuck_loop_detection.sh
lib/response_analyzer.sh

**/test*.sh

📄 CodeRabbit inference engine (CLAUDE.md)

**/test*.sh: Write integration tests for Ralph loop behavior
Write end-to-end tests for full development cycles
Include comments explaining test strategy for complex test scenarios

Files:

tests/test_stuck_loop_detection.sh

🧠 Learnings (4)

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write integration tests for Ralph loop behavior

Applied to files:

tests/test_stuck_loop_detection.sh
lib/response_analyzer.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/*.sh : Write unit tests for bash script functions using test.sh

Applied to files:

tests/test_stuck_loop_detection.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Write end-to-end tests for full development cycles

Applied to files:

tests/test_stuck_loop_detection.sh

📚 Learning: 2025-12-31T19:30:50.707Z

Learnt from: CR
Repo: frankbria/ralph-claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T19:30:50.707Z
Learning: Applies to **/test*.sh : Include comments explaining test strategy for complex test scenarios

Applied to files:

tests/test_stuck_loop_detection.sh

🧬 Code graph analysis (1)

tests/test_stuck_loop_detection.sh (1)

lib/response_analyzer.sh (1)

detect_stuck_loop (254-300)

🪛 Shellcheck (0.11.0)

lib/response_analyzer.sh

[warning] 268-268: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (4)

lib/response_analyzer.sh (2)

265-296: Improved stuck-loop detection logic is robust.

The refactored logic correctly implements the requirement that ALL current error lines must appear in ALL recent history files to detect a stuck loop. The nested loop structure (Lines 279-293) is more complex than the previous implementation but appropriately handles multi-line error scenarios and prevents false positives from partial matches.

The use of grep -qF for literal fixed-string matching (Line 283) is the right choice to avoid unintended regex interpretation of error messages containing special characters.

91-97: Patterns are correctly aligned with ralph_loop.sh.

The Stage 1 filtering (grep -v '"[^"]*error[^"]*":') and Stage 2 error detection regex (^Error:|^ERROR:|^error:|\]: error|Link: error|Error occurred|failed with error|[Ee]xception|Fatal|FATAL) are identical across both files, confirming consistent behavior as claimed.

tests/test_stuck_loop_detection.sh (2)

6-14: Excellent test strategy documentation.

The test strategy comment clearly explains what the detect_stuck_loop function does and what aspects are being validated. This follows the coding guideline to "include comments explaining test strategy for complex test scenarios."

As per coding guidelines, test strategy documentation is required for complex scenarios in test files.

1-241: Comprehensive test coverage validates two-stage filtering behavior.

The test suite provides excellent coverage of the detect_stuck_loop function's behavior:

False-positive prevention: Tests 2, 5, 7 validate that JSON fields, type annotations, and non-error content don't trigger stuck detection

True-positive detection: Tests 3, 6, 8 validate that actual repeated errors are correctly detected

Multi-error logic: Tests 8-9 validate the requirement that ALL current error lines must appear in ALL history files for stuck detection

Edge cases: Tests 1, 4 cover no-history and differing-errors scenarios

The test structure with clear naming, colored output, and per-test result tracking makes it easy to diagnose failures.

Based on learnings, integration tests for Ralph loop behavior should be comprehensive.

Updated both documentation files to reflect recent enhancements and fixes: CLAUDE.md Changes: - Added comprehensive Core Architecture section detailing main scripts and lib/ components - Documented lib/circuit_breaker.sh and lib/response_analyzer.sh modular architecture - Added detailed Exit Conditions and Thresholds section with circuit breaker thresholds - Documented advanced two-stage error detection process to eliminate false positives - Documented multi-line error matching for accurate stuck loop detection - Added Recent Improvements section highlighting v0.9.0 circuit breaker enhancements - Updated Global Installation section to include lib/ directory and ralph-import command - Test coverage details: 13 error detection + 9 stuck loop tests README.md Changes: - Updated "What's Working Now" section with circuit breaker enhancements - Added "Recent Improvements" section highlighting v0.9.0 updates - Updated test count from 75 to 97 tests (75 core + 13 error detection + 9 stuck loop) - Enhanced circuit breaker description with two-stage filtering details - Added circuit breaker thresholds to Exit Thresholds section - Updated test commands to include error detection and stuck loop test scripts - Updated current test status with specialized test file counts Key improvements documented: - Multi-line error matching fix for detect_stuck_loop function - JSON field false positive elimination (e.g., "is_error": false) - Two-stage error filtering for accurate error detection - Installation fix for lib/ directory components - 22 new tests added for circuit breaker functionality These updates ensure documentation accurately reflects the current state of the codebase following PR #6 (circuit-breaker false positives fix) and PR #4 (installation lib/ directory fix).

- Make --dangerously-skip-permissions conditional: only added when CLAUDE_ALLOWED_TOOLS is empty (default). When tools are specified via --allowed-tools, use --allowedTools with granular permissions instead. Previously --dangerously-skip-permissions was unconditional, making --allowedTools dead code. - Change CLAUDE_ALLOWED_TOOLS default from "Write,Bash(git *),Read" to "" so default behavior is skip-all-permissions. - Add prd.json completion check as exit condition frankbria#6 in should_exit_gracefully(). Deterministically checks if all userStories have passes != false, preventing infinite loops when heuristic signals are absent. - Rename output_file to output_file_path in wait_for_daily_reset() for clarity — the variable holds a path read from a tracking file, not an output file handle. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ove pytest-timeout dep Three fixes for errors observed during Ralph Loop #6 against TheStudio: 1. XPLAT-2b: ralph_validate_hooks() now checks the target project's .claude/settings.json for bare 'powershell' hook commands and auto-patches them to 'powershell.exe' in WSL environments (eliminates SessionStart errors) 2. ralph-tester: Fixed file path references to use .ralph/AGENT.md and .ralph/fix_plan.md instead of bare names (eliminates worktree file-not-found) 3. ralph-tester: Changed deadline timeout from pytest --timeout=30 (requires pytest-timeout plugin) to shell timeout command (works everywhere) Also includes: FAILSPEC-3 killswitch, FAILSPEC-4 audit logging, ADAPTIVE-1 timeout latency recording from prior working tree. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

frankbria linked an issue Dec 31, 2025 that may be closed by this pull request

is_error: false, triggers "error" circuit breaker incorrectly #2

Closed

coderabbitai Bot reviewed Dec 31, 2025

View reviewed changes

Comment thread lib/response_analyzer.sh

Comment thread tests/test_error_detection.sh

Comment thread tests/test_error_detection.sh Outdated

coderabbitai Bot reviewed Dec 31, 2025

View reviewed changes

frankbria merged commit 8158e38 into main Dec 31, 2025
3 checks passed

frankbria deleted the fix/circuit-breaker-false-positives branch December 31, 2025 21:18

frankbria mentioned this pull request Dec 31, 2025

docs: update CLAUDE.md and README.md with recent improvements #8

Merged

coderabbitai Bot mentioned this pull request Jan 12, 2026

Bug: completion_indicators exit check ignores Claude's explicit EXIT_SIGNAL: false #88

Closed

This was referenced Jan 21, 2026

parse_json_response fails with Claude Code CLI JSON array output format #112

Closed

Bug: EXIT_SIGNAL not detected in JSON output format, causing infinite loops #113

Closed

github-actions Bot mentioned this pull request Jan 24, 2026

feat(enable): add ralph-enable wizard for existing projects (v0.11.0) #124

Merged

6 tasks

coderabbitai Bot mentioned this pull request Jan 26, 2026

Bug: Safety circuit breaker triggers on every loop with JSON output mode, causing premature exit after 5 loops #128

Closed

This was referenced Jan 29, 2026

Circuit breaker false positive: "no progress" when Claude commits within the loop #141

Closed

Bug: Progress detection counts date entries as uncompleted checkboxes #144

Closed

coderabbitai Bot mentioned this pull request Feb 18, 2026

[BUG] Timeout (exit code 124) misidentified as API 5-hour usage limit #183

Closed

traycerai Bot mentioned this pull request Feb 20, 2026

[BUG] Loop persistence misidentified as active session despite "completed" status in progress.json #190

Closed

coderabbitai Bot mentioned this pull request Mar 15, 2026

fix(loop): detect is_error:true in JSON output despite exit code 0 #215

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix circuit breaker false positives from JSON field names#6

Fix circuit breaker false positives from JSON field names#6
frankbria merged 4 commits into
mainfrom
fix/circuit-breaker-false-positives

frankbria commented Dec 31, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Dec 31, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

frankbria commented Dec 31, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Changes

1. ralph_loop.sh (lines 390-409)

2. lib/response_analyzer.sh (lines 91-97)

3. tests/test_error_detection.sh (new file)

Test Results

Test Scenarios Covered

Impact

Validation

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Dec 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

frankbria commented Dec 31, 2025 •

edited by coderabbitai Bot

Loading

1. `ralph_loop.sh` (lines 390-409)

2. `lib/response_analyzer.sh` (lines 91-97)

3. `tests/test_error_detection.sh` (new file)

coderabbitai Bot commented Dec 31, 2025 •

edited

Loading