feat(circuit-breaker): add auto-recovery from OPEN state#165
Conversation
The OPEN state was terminal — once triggered, it persisted across restarts with no automatic recovery. This adds two recovery mechanisms: 1. Cooldown timer (default): OPEN → HALF_OPEN after CB_COOLDOWN_MINUTES (default 30). The existing HALF_OPEN logic handles recovery or re-trip. 2. Auto-reset option: CB_AUTO_RESET=true bypasses cooldown, resets to CLOSED on startup for fully unattended operation. Changes: - Add parse_iso_to_epoch() to lib/date_utils.sh (cross-platform) - Add cooldown + auto-reset logic to init_circuit_breaker() - Add opened_at field to state file when entering/staying OPEN - Add --auto-reset-circuit CLI flag and .ralphrc config vars - Add 19 tests in test_circuit_breaker_recovery.bats - Update CLAUDE.md and README.md documentation
WalkthroughThis PR adds automatic circuit-breaker recovery (cooldown-based OPEN→HALF_OPEN and optional startup auto-reset to CLOSED), persists an Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI/ralph_loop.sh
participant CBInit as CircuitBreaker (lib/circuit_breaker.sh)
participant StateFile as State File
participant DateUtil as Date Utils (parse_iso_to_epoch)
CLI->>CBInit: init_circuit_breaker()
CBInit->>StateFile: read persisted state JSON
StateFile-->>CBInit: { state: OPEN, opened_at: "ISO_ts" }
alt CB_AUTO_RESET = true
CBInit->>CBInit: set state = CLOSED
CBInit->>StateFile: persist CLOSED state
else CB_AUTO_RESET = false
CBInit->>DateUtil: parse_iso_to_epoch("ISO_ts")
DateUtil-->>CBInit: opened_at_epoch
CBInit->>CBInit: compute elapsed_minutes
alt elapsed_minutes >= CB_COOLDOWN_MINUTES
CBInit->>CBInit: transition OPEN → HALF_OPEN
CBInit->>StateFile: persist HALF_OPEN (with opened_at)
else
CBInit->>StateFile: retain OPEN (with opened_at)
end
end
CBInit-->>CLI: initialization complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
Add auto-recovery for the circuit breaker by updating
|
Code Review: feat(circuit-breaker): add auto-recovery from OPEN stateSummaryThis is a well-implemented feature that adds two complementary auto-recovery mechanisms to the circuit breaker system. The code is clean, well-tested (19 new tests, 484/484 passing), and maintains backward compatibility. Code Quality & Best Practices ✅Strengths:
Minor observations:
Potential Bugs or Issues ✅No bugs found. The implementation handles edge cases well:
Performance Considerations ✅No concerns:
Security Considerations ✅No security concerns:
Test Coverage ✅Excellent test coverage (19 tests):
Recommendations (Non-Blocking)
ConclusionLGTM with 1 minor documentation suggestion. The implementation is production-ready with excellent test coverage. The dual mechanism (cooldown + auto-reset) provides flexibility for both attended and unattended workflows. All 484 tests pass with zero regressions. |
|
Review posted to #165 (comment) |
Code Review - PR 165: Circuit Breaker Auto-RecoveryThis is an excellent implementation of auto-recovery from the OPEN state. The code is well-structured, thoroughly tested, and maintains backward compatibility. ✅ Strengths1. Robust Cross-Platform Date Parsing 2. Excellent Backward Compatibility
3. Comprehensive Test Coverage 4. Clock Skew Safety 🔍 Code Quality Observations1. JSON Generation (lib/circuit_breaker.sh:309-310) 2. Variable Shadowing in Tests 3. Error Handling 🎯 Security and PerformanceSecurity: No concerns. All inputs validated, no command injection risks. Performance: parse_iso_to_epoch() uses early returns, negligible overhead added. 📚 Documentation QualityExcellent coverage in CLAUDE.md and README.md with clear explanations, configuration examples, and CLI reference. Optional: add troubleshooting section in future. ✨ Test QualityExceptional test design with helpers, edge case coverage, and isolated CLI flag testing. 🚀 RecommendationAPPROVE with confidence. This PR:
Minor observations are stylistic suggestions, not blockers. Ready to merge. 🎉 |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@CLAUDE.md`:
- Around line 447-452: The README/CLAUDE.md test count is inconsistent: update
the earlier summary line that currently reads "Run all tests (420 tests)" to
match the new total "Run all tests (484 tests)" so the doc is consistent with
the updated table under the "Test Files (484 tests)" heading; locate and replace
the exact phrase "Run all tests (420 tests)" in CLAUDE.md with "Run all tests
(484 tests)" and verify no other stray 420 references remain.
In `@lib/circuit_breaker.sh`:
- Around line 60-113: The transition logging is called before the history file
is guaranteed to exist which can create/overwrite an invalid history and break
callers; fix by ensuring the history file is validated/created before any call
to log_circuit_transition (or add defensive validation inside
log_circuit_transition itself). Concretely, add a preflight that
creates/initializes CB_HISTORY_FILE (or call an existing
init_history/ensure_history function) before the two shown calls to
log_circuit_transition in the Auto-reset and Cooldown branches (and the similar
calls around lines 115-125), or modify log_circuit_transition to first
validate/create CB_HISTORY_FILE and return early on I/O errors.
In `@lib/date_utils.sh`:
- Around line 68-75: The BSD branch currently strips timezone info (variable
stripped) and parses with date -j -f which treats the timestamp as local time;
instead preserve/normalize the timezone and parse it so UTC is respected: modify
the logic that sets stripped to convert a trailing "Z" to "+0000" (or leave
existing +HH:MM/+-HH:MM as-is) and call date -j -f with a format that includes
the zone, e.g. "%Y-%m-%dT%H:%M:%S%z" when invoking date -j -f (the code path
using date -j -f in this file), so the parsed epoch uses the correct UTC offset
rather than local time.
In `@README.md`:
- Around line 652-654: Update the inconsistent test count references in README:
find the "Test Coverage" header and any test-count badge or occurrences showing
"465" (and the related badge text/alt/title) and change them to "484" so they
match the updated footer lines that show "484 tests" and "100% pass rate"; also
update the other occurrence noted (around line 855) to the same "484" and ensure
the badge markdown (alt/title/text) and any accompanying header text reflect
"484 tests" and "100% pass rate".
- Move history file init before auto-recovery logic to prevent log_circuit_transition from writing to nonexistent history file - Fix BSD date -j timezone handling: normalize tz (Z→+0000, ±HH:MM→±HHMM) and parse with %z format so UTC timestamps aren't misinterpreted as local - Update stale test counts in CLAUDE.md (420→484) and README.md (465→484) across badge, header, and inline comments
Code ReviewThis is a well-implemented feature with excellent test coverage. The dual recovery mechanism (cooldown + auto-reset) provides flexibility for different use cases. Code QualityStrengths:
Issues AssessmentI've reviewed the CodeRabbit comments and found:
Security & Performance
Minor Observations
ConclusionLGTM - Approve for merge. The implementation is production-ready with comprehensive testing. All 484 tests pass with zero regressions. The CodeRabbit "major" issues appear to be false positives based on incorrect analysis of the code flow. Great work on Issue 160! |
|
Code review comment posted to #165 (comment) |
Code Review - PR 165: Circuit Breaker Auto-RecoveryThis is an exceptionally well-implemented feature that adds auto-recovery mechanisms to the circuit breaker. The implementation is thorough, well-tested, and maintains backward compatibility. All 484 tests passing with zero regressions demonstrates excellent quality control. ✅ Strengths1. Robust Cross-Platform Date Parsing
2. Excellent Backward Compatibility
3. Comprehensive Test Coverage
4. Clock Skew Safety 5. Clear Documentation |
🔍 Minor Observations (Non-Blocking)1. JSON Generation Pattern 2. Documentation Enhancement Suggested addition after line 50 in CLAUDE.md:
3. Theoretical Edge Case 🎯 Security, Performance & ArchitectureSecurity: ✅ No concerns
Performance: ✅ No concerns
Architecture: ✅ Excellent
📚 Test QualityExceptional test design:
|
🚀 RecommendationAPPROVE with minor documentation suggestion This PR:
The minor observations above are stylistic suggestions for future consideration, not blockers. This is production-ready code. Excellent work! 🎉 Reviewed by: Claude Sonnet 4.5 (AI Code Review) |


Summary
CB_COOLDOWN_MINUTES, then existing HALF_OPEN logic handles recovery (progress → CLOSED) or re-trip (no progress → OPEN)CB_AUTO_RESET=trueor--auto-reset-circuitflag bypasses cooldown, resets to CLOSED on startup for fully unattended operationparse_iso_to_epoch()cross-platform ISO-to-epoch converter inlib/date_utils.shopened_atfield to circuit breaker state file for cooldown tracking (backward compatible with old state files)Closes #160
Changes
lib/date_utils.shparse_iso_to_epoch()function (GNU → BSD → manual fallback)lib/circuit_breaker.shinit_circuit_breaker(),opened_atin state fileralph_loop.sh--auto-reset-circuitCLI flag, env var capture/restore for new configtemplates/ralphrc.templateCB_COOLDOWN_MINUTESandCB_AUTO_RESETconfig optionstests/unit/test_circuit_breaker_recovery.batsCLAUDE.mdREADME.mdTest plan
npm test)bats tests/unit/test_circuit_breaker_recovery.batsCB_AUTO_RESET=truein.ralphrc, force OPEN → resets to CLOSEDralph --auto-reset-circuitwith OPEN state → resets and runsSummary by CodeRabbit
New Features
Documentation
Tests