feat(watch): port ralph-watch.ps1 resilience features (#743) #744
tamirdresher merged 14 commits into bradygaster:insider
Add concurrency blocks with cancel-in-progress to squad-ci, squad-heartbeat, squad-triage, squad-label-enforce, and squad-issue-assign workflows.
Scope: .github/workflows/ only (squad repo CI). Template workflows for customer repos are a separate product concern.
Test: 20 assertions covering all 5 workflows.
Refs: diberry#122
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Audited all contributor-facing files on dev branch and identified 8 gaps in external contributor experience. Proposes 7 deliverables prioritized by maintainer time savings:
P1: Issue templates, good-first-issue curation, .squad/ explainer
P2: CODE_OF_CONDUCT.md, contributor FAQ, README contributing section
P3: SECURITY.md typo fix
Goal: contributors self-serve from docs instead of asking Brady/Tamir the same questions repeatedly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-guide-proposal docs: contributor guide improvements proposal
…e3-item-3-1 devops(ci): add concurrency controls to 5 workflows (Phase 3 item A1)
Fixes off-by-N error in nap command's decision archival where newline separators between entries weren't counted in the byte budget, causing archives to exceed the target size.
Closes bradygaster#123
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
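For reviewers who want to see the shape of the fix, here is a minimal sketch of budgeted selection that charges for separators. It is not the nap command's actual code: `takeWithinBudget` and its parameters are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper, not the real nap implementation.
const encoder = new TextEncoder();

// UTF-8 byte length, since the budget is in bytes rather than characters.
function byteLength(text: string): number {
  return encoder.encode(text).length;
}

// Take entries in order until adding the next one (plus its separator)
// would push the joined output over the byte budget.
function takeWithinBudget(
  entries: string[],
  budgetBytes: number,
  separator: string = "\n",
): string[] {
  const separatorBytes = byteLength(separator);
  const taken: string[] = [];
  let used = 0;
  for (const entry of entries) {
    // Every entry after the first costs one separator in the joined output.
    const cost = byteLength(entry) + (taken.length > 0 ? separatorBytes : 0);
    if (used + cost > budgetBytes) break;
    taken.push(entry);
    used += cost;
  }
  return taken;
}
```

With entries of 4 and 4 bytes and a budget of 8, a count that ignores the separator would accept both and write 9 bytes; the version above stops after the first entry.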
…-size-calc fix(nap): account for separator newlines in decision archival budget
Port battle-tested features from ralph-watch.ps1 into squad-cli watch. All features are org-agnostic, config-driven, and backward compatible.
P0 Reliability:
- ModelCircuitBreaker: model-level fallback with cooldown + state persistence
- Rate limit detection from API headers, predictive circuit opening
- Pre-round health check (auth, disk space, branch drift, CB validation)
- Post-failure remediation with tiered self-healing (reset CB, re-auth, git pull)
P1 Execution Quality:
- Issue priority scoring (P0-P3 labels, age, staleness, size, bug bonus)
- Machine capability checking (needs:* labels vs local probes)
- Stale work reclaim (unassign issues idle >24h)
- Budget check (max issues per round)
P2 Observability:
- Heartbeat file (.squad/ralph-heartbeat.json) written every round
- Structured log (.squad/ralph-watch.log) with rotation
- Per-repo lockfile with PID, stale detection
- Webhook alerts on consecutive failures (--webhook-url, --alert-threshold)
CLI flags: --webhook-url, --alert-threshold, --max-budget, --capabilities
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
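As a rough illustration of the "model-level fallback with cooldown" idea, the sketch below keeps a per-model failure count and skips a model while its circuit is open. It is an assumption-laden stand-in, not the ModelCircuitBreaker from this PR: the class name, the failure threshold, and the method names are hypothetical, and state persistence is left out.

```typescript
// Hypothetical sketch; the real ModelCircuitBreaker may differ in every detail.
type BreakerState = { failures: number; openedAt: number | null };

class ModelCircuitBreakerSketch {
  private readonly states = new Map<string, BreakerState>();
  private readonly models: string[];
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(models: string[], failureThreshold = 3, cooldownMs = 5 * 60 * 1000) {
    this.models = models;
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Count a failure; open the circuit once the threshold is reached.
  recordFailure(model: string, nowMs: number): void {
    const state = this.stateOf(model);
    state.failures += 1;
    if (state.failures >= this.failureThreshold) state.openedAt = nowMs;
  }

  // A success closes the circuit and clears the failure count.
  recordSuccess(model: string): void {
    this.states.set(model, { failures: 0, openedAt: null });
  }

  // Return the first model (in preference order) whose circuit is closed
  // or whose cooldown has elapsed; undefined if every circuit is open.
  pickModel(nowMs: number): string | undefined {
    return this.models.find((model) => this.isAvailable(model, nowMs));
  }

  private isAvailable(model: string, nowMs: number): boolean {
    const state = this.stateOf(model);
    if (state.openedAt === null) return true;
    if (nowMs - state.openedAt >= this.cooldownMs) {
      this.states.set(model, { failures: 0, openedAt: null });
      return true;
    }
    return false;
  }

  private stateOf(model: string): BreakerState {
    let state = this.states.get(model);
    if (!state) {
      state = { failures: 0, openedAt: null };
      this.states.set(model, state);
    }
    return state;
  }
}
```

Passing the clock in as `nowMs` is a choice made for this sketch so the cooldown can be exercised in tests without waiting.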
bradygaster#743)
Ported from ralph-watch.ps1 New-RatePool / Read-RatePool / Write-RatePool / Update-RatePool budget coordination logic.
- New rate-pool.ts: tracks API call budget per interval window with file-based advisory locking (atomic temp+rename writes, retry on contention). Multiple Ralph instances share .squad/ralph-rate-pool.json.
- execute.ts: acquireSlot() gates each issue before agent spawn; releaseSlot() fires in finally block. Budget-exhausted issues are skipped with a log line, not failed.
- config.ts: watch.ratePool.maxCallsPerInterval (default 50) and watch.ratePool.intervalSeconds (default 600) wired through the three-tier merge (defaults < file < CLI).
- index.ts: re-exports RatePool and types from the capability barrel.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
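The gating described in this commit can be sketched in memory as follows. This is only an approximation under stated assumptions: the state shape, the fixed-window reset, and the `runWithSlot` wrapper are hypothetical, and the file locking and shared JSON file are omitted entirely. Only the function names `acquireSlot` / `releaseSlot` and the two defaults come from the commit message.

```typescript
// Hypothetical in-memory sketch; rate-pool.ts persists state to a shared file.
interface RatePoolConfig { maxCallsPerInterval: number; intervalSeconds: number }
interface RatePoolState { windowStartMs: number; used: number; active: number }

// Defaults as stated in the commit message.
const RATE_POOL_DEFAULTS: RatePoolConfig = { maxCallsPerInterval: 50, intervalSeconds: 600 };

// Try to consume one call from the current window's budget.
function acquireSlot(state: RatePoolState, config: RatePoolConfig, nowMs: number): boolean {
  // Assumption: the budget resets when a full interval has passed.
  if (nowMs - state.windowStartMs >= config.intervalSeconds * 1000) {
    state.windowStartMs = nowMs;
    state.used = 0;
  }
  if (state.used >= config.maxCallsPerInterval) return false;
  state.used += 1;
  state.active += 1;
  return true;
}

// Assumption: releasing only marks the slot as no longer in flight;
// the consumed budget is not refunded.
function releaseSlot(state: RatePoolState): void {
  state.active = Math.max(0, state.active - 1);
}

// Gate a task on the pool: skip (rather than fail) when the budget is spent,
// and always release in a finally block.
async function runWithSlot<T>(
  state: RatePoolState,
  config: RatePoolConfig,
  nowMs: number,
  task: () => Promise<T>,
): Promise<T | "skipped"> {
  if (!acquireSlot(state, config, nowMs)) return "skipped";
  try {
    return await task();
  } finally {
    releaseSlot(state);
  }
}
```

Returning `"skipped"` instead of throwing mirrors the commit's note that budget-exhausted issues are skipped with a log line, not failed.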
The stale package-lock.json resolved @bradygaster/squad-sdk from the npm registry instead of the workspace link, causing workspace-integrity and test (rollup native binary) CI failures. Delete and regenerate the lockfile so npm resolves squad-sdk via the workspace symlink.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bug Found During E2E Testing (Issue tamirdresher/tamresearch1#2034)
Circuit Breaker: The constructor uses
// Current (buggy):
this.cooldownMinutes = options.cooldownMinutes || 5;
// Fix:
this.cooldownMinutes = options.cooldownMinutes ?? 5;
Same pattern likely affects other numeric config values that accept
Impact: Users who explicitly set
Severity: Low. Edge case, but a correctness bug. One-line fix per affected field.
Found by B'Elanna (Tamir's Squad); 47/47 other resilience tests pass
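The difference between the two operators is easy to reproduce in isolation. In JavaScript and TypeScript, `||` falls back on any falsy left operand (including `0`), while `??` falls back only on `null` or `undefined`. The snippet below is a standalone demonstration; `BreakerOptions` and the two function names are made up for the example and are not the PR's types.

```typescript
// Hypothetical option shape for the demonstration.
interface BreakerOptions { cooldownMinutes?: number }

// Buggy pattern: an explicit 0 is falsy, so the default replaces it.
function cooldownWithOr(options: BreakerOptions): number {
  return options.cooldownMinutes || 5;
}

// Fixed pattern: only null/undefined trigger the default.
function cooldownWithNullish(options: BreakerOptions): number {
  return options.cooldownMinutes ?? 5;
}

console.log(cooldownWithOr({ cooldownMinutes: 0 }));      // 5
console.log(cooldownWithNullish({ cooldownMinutes: 0 })); // 0
console.log(cooldownWithNullish({}));                     // 5
```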
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…inks
- Regenerated package-lock.json from clean state to fix two CI failures:
  1. workspace-integrity: stale registry entry for @bradygaster/squad-sdk resolved to npmjs.org instead of local workspace link
  2. test: lockfile missing @rollup/rollup-linux-x64-gnu (only had win32 platform entries)
- Fresh npm install produces lockfile with all platform optional deps and correct workspace symlinks.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gaster#739)
The watch/triage command in cli-entry.ts previously only parsed --interval, silently ignoring all other flags (--monitor-teams, --execute, --board, etc.). Our bradygaster#743 watch-parity work already added parsing for all registered capability flags and new resilience flags. This commit completes the fix by adding unknown flag detection: any --flag not in the known set now prints a warning instead of being silently dropped.
Closes bradygaster#739
Refs bradygaster#743
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
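One way unknown flag detection of this kind can work is sketched below. It is not the code in cli-entry.ts: `findUnknownFlags` and the warning wording are hypothetical, and the known set lists only the flags named in this conversation, so the real set is likely larger.

```typescript
// Flags mentioned in this PR conversation; the real known set may contain more.
const KNOWN_WATCH_FLAGS: ReadonlySet<string> = new Set([
  "--interval", "--monitor-teams", "--execute", "--board",
  "--webhook-url", "--alert-threshold", "--max-budget", "--capabilities",
]);

// Hypothetical helper: collect every --flag that is not in the known set.
function findUnknownFlags(argv: string[], known: ReadonlySet<string>): string[] {
  const unknown: string[] = [];
  for (const arg of argv) {
    // Skip values, positionals, and the bare "--" separator.
    if (!arg.startsWith("--") || arg === "--") continue;
    // Support both "--flag value" and "--flag=value".
    const equalsAt = arg.indexOf("=");
    const name = equalsAt === -1 ? arg : arg.slice(0, equalsAt);
    if (!known.has(name)) unknown.push(name);
  }
  return unknown;
}

// Warn instead of silently dropping; the message text is illustrative only.
for (const flag of findUnknownFlags(["--interval", "5", "--bogus"], KNOWN_WATCH_FLAGS)) {
  console.warn(`Warning: unknown flag ${flag} ignored`);
}
```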
Status Update
CI: 10/10 green ✅
All feedback from @diberry addressed:
Ready for review. @diberry @bradygaster please approve when ready and we will do an insider release.
-- Tamir's Squad 🤖
Closes #743
Summary
Ports battle-tested resilience, observability, and execution quality features from ralph-watch.ps1 into the squad-cli watch TypeScript command. All features are org-agnostic, config-driven, and opt-in — without new flags, behavior is identical to current.
New Modules (10 files in capabilities/)
P0 — Reliability
P1 — Execution Quality
rankIssues() returns sorted list.
needs:* labels to local machine. Auto-detect GPU, Docker, Playwright, etc.
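To make the priority-scoring idea concrete, here is a small sketch that scores issues from P0-P3 labels, a bug bonus, and age, then sorts them. Every weight, the `IssueLike` shape, and the function names `scoreIssue` / `rankByScore` are assumptions for illustration; the PR's actual scoring (including its staleness and size factors, omitted here) is not shown in this conversation.

```typescript
// Hypothetical issue shape; timestamps are epoch milliseconds.
interface IssueLike { number: number; labels: string[]; createdAtMs: number }

// Illustrative weights only; the real values are not given in the PR text.
const PRIORITY_POINTS: Record<string, number> = { P0: 100, P1: 60, P2: 30, P3: 10 };
const BUG_BONUS = 15;
const MAX_AGE_POINTS = 30;
const DAY_MS = 24 * 60 * 60 * 1000;

function scoreIssue(issue: IssueLike, nowMs: number): number {
  let score = 0;
  for (const label of issue.labels) score += PRIORITY_POINTS[label] ?? 0;
  if (issue.labels.includes("bug")) score += BUG_BONUS;
  // One point per day of age, capped so old issues cannot outrank a P0.
  const ageDays = Math.max(0, (nowMs - issue.createdAtMs) / DAY_MS);
  score += Math.min(ageDays, MAX_AGE_POINTS);
  return score;
}

// Highest score first; copy the array so the caller's list is not mutated.
function rankByScore(issues: IssueLike[], nowMs: number): IssueLike[] {
  return [...issues].sort((a, b) => scoreIssue(b, nowMs) - scoreIssue(a, nowMs));
}
```

Capping the age contribution is a design choice of this sketch: it lets older issues drift upward within a priority band without overtaking a higher band.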
P2 — Observability
New CLI Flags
Backward Compatibility