feat(cli): upgrade squad watch to full work monitor with --execute mode (#708) #709
tamirdresher merged 15 commits into dev
…de (#708)

Add execute mode to the watch/triage loop so Ralph can autonomously spawn Copilot sessions to work on eligible issues.

New CLI flags:
- --execute: enable work-execution mode (default: triage only)
- --copilot-flags: extra flags forwarded to the copilot CLI
- --agent-cmd: hidden override for the full agent command
- --max-concurrent: parallel issue limit per round (default: 1)
- --timeout: max minutes per issue execution (default: 30)

New functions in watch.ts:
- buildAgentCommand: construct cmd+args for the agent subprocess
- executeIssue: claim issue, comment, spawn agent with timeout
- findExecutableIssues: filter board for ready-to-work issues
- selfPull: best-effort git fetch+pull between rounds

Updated BoardState and reportBoard to include executed count. When --execute is NOT passed, behaviour is 100% identical to before.

Part of #708

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
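A minimal sketch of how these flags might map onto a typed options bag with the defaults listed above. The field names `intervalMinutes` and `timeoutMinutes` and the helper `withDefaults` are hypothetical; `execute`, `copilotFlags`, `agentCmd`, and `maxConcurrent` follow names used elsewhere in this PR, but the real `WatchOptions` may differ.

```typescript
// Hypothetical shape of the options bag passed to runWatch.
interface WatchOptions {
  intervalMinutes: number; // polling interval (the pre-existing parameter)
  execute: boolean; // --execute: work-execution mode (false = triage only)
  copilotFlags?: string; // --copilot-flags: forwarded to the copilot CLI
  agentCmd?: string; // --agent-cmd (hidden): full agent command override
  maxConcurrent: number; // --max-concurrent: issues executed per round
  timeoutMinutes: number; // --timeout: max minutes per issue
}

type WatchOptionsInput = Partial<WatchOptions> & Pick<WatchOptions, "intervalMinutes">;

// Apply the documented defaults without letting an explicit `undefined`
// overwrite them (which a plain object spread would do).
function withDefaults(input: WatchOptionsInput): WatchOptions {
  return {
    ...input,
    execute: input.execute ?? false,
    maxConcurrent: input.maxConcurrent ?? 1,
    timeoutMinutes: input.timeoutMinutes ?? 30,
  };
}
```

With `--execute` absent, `execute` stays `false`, which is what keeps the triage-only path unchanged.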
Pull request overview
Adds an “execute mode” to squad watch so Ralph can claim and run an agent subprocess against eligible issues, turning the existing triage loop into a full work monitor.
Changes:
- Replace `runWatch(dest, intervalMinutes)` with `runWatch(dest, WatchOptions)` and add execute-mode config (`--execute`, `--copilot-flags`, `--max-concurrent`, `--timeout`, hidden `--agent-cmd`).
- Implement execute-mode helpers in `watch.ts` (agent command building, executable-issue filtering, per-issue execution, best-effort `git fetch`/`pull`).
- Extend board reporting with an `executed` counter for execute-mode rounds.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| packages/squad-cli/src/cli/commands/watch.ts | Adds WatchOptions, execute-mode helpers (executeIssue, buildAgentCommand, findExecutableIssues, selfPull), and reports per-round executed count. |
| packages/squad-cli/src/cli-entry.ts | Parses new CLI flags for triage/watch and passes them into runWatch via WatchOptions. |
```typescript
  changesRequested: number;
  ciFailures: number;
  readyToMerge: number;
  executed: number;
}
```
BoardState now requires an executed field, which will break existing TypeScript call sites that construct a board-state object literal (e.g. test/cli/watch.test.ts passes objects without executed). Either update those callers/tests to include executed: 0 or make executed optional with a default of 0 inside reportBoard/emptyBoardState to keep the type change non-breaking.
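The optional-field route suggested above could look like this minimal sketch. Only the three counters visible in the diff are modelled; `normalizeBoard`, `formatExecutedLine`, and the exact 🚀 line text are hypothetical.

```typescript
// Hypothetical, trimmed BoardState: only the fields visible in this diff.
interface BoardState {
  changesRequested: number;
  ciFailures: number;
  readyToMerge: number;
  executed?: number; // optional, so existing object literals still type-check
}

// Normalize once at the reporting boundary so callers never see undefined.
function normalizeBoard(state: BoardState): Required<BoardState> {
  return { ...state, executed: state.executed ?? 0 };
}

// Only print the execute-mode line when something actually ran this round.
function formatExecutedLine(state: BoardState): string | null {
  const { executed } = normalizeBoard(state);
  return executed > 0 ? `🚀 Executed: ${executed}` : null;
}
```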
```typescript
const prompt = `Work on issue #${issue.number}: ${issue.title}. Read the issue body for full details.`;

if (options.agentCmd) {
  const parts = options.agentCmd.trim().split(/\s+/);
```
buildAgentCommand treats any truthy options.agentCmd as valid, but a whitespace-only value (e.g. ' ') becomes an empty cmd after trim() and will cause execFile to fail with ENOENT. Consider validating agentCmd.trim().length > 0 (or throwing a clear error) before splitting/using it.
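A self-contained sketch of the suggested validation, factored into a hypothetical `parseAgentCmd` helper that throws instead of calling `fatal()` so it can be unit tested directly.

```typescript
interface AgentCommand {
  cmd: string;
  args: string[];
}

// Split an --agent-cmd override into an executable and its arguments.
// Plain whitespace splitting does not honour shell quoting, so an argument
// that itself contains spaces cannot be expressed this way.
function parseAgentCmd(agentCmd: string): AgentCommand {
  const raw = agentCmd.trim();
  if (raw.length === 0) {
    throw new Error("Invalid --agent-cmd: command is empty after trimming whitespace.");
  }
  const [cmd, ...args] = raw.split(/\s+/);
  return { cmd: cmd!, args };
}
```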
Suggested change:

```diff
-  const parts = options.agentCmd.trim().split(/\s+/);
+  const raw = options.agentCmd.trim();
+  if (raw.length === 0) {
+    fatal('Invalid --agent-cmd: command is empty after trimming whitespace.');
+  }
+  const parts = raw.split(/\s+/);
```
```typescript
export function findExecutableIssues(
  roster: ReturnType<typeof parseRoster>,
  capabilities: MachineCapabilities | null,
  issues: GhIssue[],
): GhIssue[] {
```
findExecutableIssues receives capabilities but never uses it, so execute mode may pick up issues that runCheck would skip due to missing needs:* capabilities. Consider applying the same filterByCapabilities() gating here (or remove the unused param and pass only already-filtered issues) to avoid executing non-actionable work.
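A minimal sketch of applying the capability gate inside `findExecutableIssues`, under several assumptions: issues carry `needs:<capability>` labels, machine capabilities are a set of names, a `null` capability set means issues with requirements are skipped, and the roster is reduced to a set of member labels (for example a hypothetical `squad:data`). Only three of the nine blocking labels are named in the PR description, so only those appear here.

```typescript
// Hypothetical, trimmed issue shape.
interface GhIssue {
  number: number;
  title: string;
  labels: { name: string }[];
  assignees: { login: string }[];
}

// Assumption: capabilities are names matched against `needs:<name>` labels.
type MachineCapabilities = ReadonlySet<string>;

// Three of the nine blocking labels named in the PR; the other six are not listed.
const BLOCKING_LABELS = new Set(["status:blocked", "pending-user", "do-not-merge"]);

function hasCapabilities(issue: GhIssue, caps: MachineCapabilities | null): boolean {
  const needs = issue.labels
    .map((l) => l.name)
    .filter((n) => n.startsWith("needs:"))
    .map((n) => n.slice("needs:".length));
  if (needs.length === 0) return true; // no requirements at all
  if (caps === null) return false; // requirements but unknown machine: skip (assumption)
  return needs.every((n) => caps.has(n));
}

function findExecutableIssues(
  memberLabels: ReadonlySet<string>,
  capabilities: MachineCapabilities | null,
  issues: GhIssue[],
): GhIssue[] {
  return issues.filter((issue) => {
    const names = issue.labels.map((l) => l.name);
    return (
      names.some((n) => memberLabels.has(n)) && // routed to a squad member
      issue.assignees.length === 0 && // nobody has claimed it yet
      !names.some((n) => BLOCKING_LABELS.has(n)) && // not blocked
      hasCapabilities(issue, capabilities) // same gate triage applies
    );
  });
}
```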
```typescript
const maxConcurrentIdx = args.indexOf('--max-concurrent');
const maxConcurrent = (maxConcurrentIdx !== -1 && args[maxConcurrentIdx + 1])
  ? parseInt(args[maxConcurrentIdx + 1]!, 10)
  : 1;
```
--max-concurrent is parsed with parseInt but never validated. If the user passes a non-number/0/negative, slice(0, maxConcurrent) can silently execute zero issues or behave unexpectedly. Consider validating it is a finite integer >= 1 and failing fast with a clear usage error.
Suggested change:

```diff
   : 1;
+if (!Number.isFinite(maxConcurrent) || !Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
+  console.error(
+    `Error: --max-concurrent must be a positive integer (>= 1). Received: ${args[maxConcurrentIdx + 1] ?? maxConcurrent}`,
+  );
+  process.exit(1);
+}
```
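One caveat with the suggested check: `parseInt` accepts a numeric prefix, so `--max-concurrent 2abc` yields `2` and passes. A stricter sketch (the helper name `parsePositiveIntFlag` is hypothetical) validates the raw string and could serve both `--max-concurrent` and `--timeout`. It throws instead of exiting so the CLI layer can print the message and call `process.exit(1)`; unlike the original parsing, it also treats a flag with no value as an error rather than silently using the default.

```typescript
// Read `<name> <value>` from argv and require a positive integer.
// The regex rejects inputs such as "2abc" or "1.5", which parseInt would
// silently truncate to 2 and 1.
function parsePositiveIntFlag(args: string[], name: string, fallback: number): number {
  const idx = args.indexOf(name);
  if (idx === -1) return fallback; // flag absent: use the documented default
  const raw = args[idx + 1];
  if (raw === undefined || !/^[1-9]\d*$/.test(raw)) {
    throw new Error(`${name} must be a positive integer (>= 1). Received: ${raw ?? "(missing)"}`);
  }
  return Number(raw);
}
```

In cli-entry.ts a caller would wrap `parsePositiveIntFlag(args, "--max-concurrent", 1)` and `parsePositiveIntFlag(args, "--timeout", 30)` in a `try`/`catch` that prints the error and exits with status 1.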
```typescript
const issueTimeoutMinutes = (timeoutIdx !== -1 && args[timeoutIdx + 1])
  ? parseInt(args[timeoutIdx + 1]!, 10)
  : 30;
```
--timeout is parsed with parseInt but never validated. If it becomes NaN or <= 0, timeoutMs in executeIssue becomes NaN/non-sensical and the child-process timeout behavior will be unreliable. Consider validating it is a finite integer >= 1 and reporting a usage error.
Suggested change:

```diff
-const issueTimeoutMinutes = (timeoutIdx !== -1 && args[timeoutIdx + 1])
-  ? parseInt(args[timeoutIdx + 1]!, 10)
-  : 30;
+let issueTimeoutMinutes = (timeoutIdx !== -1 && args[timeoutIdx + 1])
+  ? parseInt(args[timeoutIdx + 1]!, 10)
+  : 30;
+if (!Number.isFinite(issueTimeoutMinutes) || !Number.isInteger(issueTimeoutMinutes) || issueTimeoutMinutes <= 0) {
+  console.error('Error: --timeout must be a positive integer number of minutes.');
+  process.exit(1);
+}
```
```typescript
export function buildAgentCommand(
  issue: GhIssue,
  teamRoot: string,
  options: WatchOptions,
): { cmd: string; args: string[] } {
```
New pure helpers (buildAgentCommand, findExecutableIssues) introduce non-trivial parsing/filtering logic, but the existing watch tests only cover reportBoard. Adding unit tests for these helpers would help prevent regressions (e.g., handling copilotFlags splitting, blocked labels, assignee filtering, and custom agentCmd parsing).
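As a starting point for such tests, here is a minimal sketch of the default-versus-override behaviour described in this PR (default `gh copilot --message <prompt>`, hidden `--agent-cmd` override), with the prompt text taken from the diff above. Assumptions: `copilotFlags` is split on whitespace and appended after the prompt, an override receives the prompt as its last argument, the `teamRoot` parameter is omitted, and `IssueRef`/`AgentOptions` are hypothetical stand-ins for `GhIssue`/`WatchOptions`.

```typescript
interface IssueRef {
  number: number;
  title: string;
}

interface AgentOptions {
  copilotFlags?: string;
  agentCmd?: string;
}

interface AgentInvocation {
  cmd: string;
  args: string[];
}

function buildAgentCommand(issue: IssueRef, options: AgentOptions): AgentInvocation {
  // Prompt text as it appears in the diff.
  const prompt = `Work on issue #${issue.number}: ${issue.title}. Read the issue body for full details.`;

  // Hidden override: first token is the executable, the rest are leading args.
  const override = options.agentCmd?.trim() ?? "";
  if (override.length > 0) {
    const [cmd, ...args] = override.split(/\s+/);
    return { cmd: cmd!, args: [...args, prompt] };
  }

  // Default: gh copilot --message <prompt>, plus any --copilot-flags tokens.
  const flags = options.copilotFlags?.trim() ?? "";
  const extra = flags.length > 0 ? flags.split(/\s+/) : [];
  return { cmd: "gh", args: ["copilot", "--message", prompt, ...extra] };
}
```

Treating a blank override as absent is one design choice; the review comment on `agentCmd` instead suggests failing fast with a clear error.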
…ave dispatch, retro, hygiene (#708)

Add 9 new opt-in flags to squad watch / triage command:
- --monitor-teams: scan Teams via WorkIQ for actionable messages
- --monitor-email: scan email for CI failures, Dependabot, security alerts
- --board / --board-project N: project board lifecycle (reconcile, archive)
- --two-pass: lightweight list then hydrate actionable issues only
- --wave-dispatch: dependency-aware parallel sub-task execution
- --retro: enforce retrospective checks (Fridays or >7 days)
- --decision-hygiene: auto-merge decision inbox when >5 files
- --channel-routing: route notifications to Teams channels

All features are disabled by default; existing behavior is unchanged. Includes subsquad discovery, buildAgentCommandFromPrompt helper, and spawnWithTimeout utility. Every new function has try/catch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
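The `--retro` trigger rule can be read as a small pure function, which also makes it easy to unit test. This is a sketch of one interpretation of "Fridays or >7 days"; `isRetroDue` is a hypothetical name, and treating a missing last-retro date as due is an assumption.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// A retro is due on any Friday (local time), or on any other day once more
// than seven days have passed since the last one. `lastRetro === null`
// means no retrospective has been recorded yet (treated as due).
function isRetroDue(now: Date, lastRetro: Date | null): boolean {
  if (now.getDay() === 5) return true; // 5 = Friday
  if (lastRetro === null) return true;
  return now.getTime() - lastRetro.getTime() > 7 * MS_PER_DAY;
}
```

As written, every round on a Friday reports a retro as due; a real implementation would likely also check whether one already ran that day.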
- CHANGELOG entry for all new watch flags
- Ralph feature docs rewritten with full watch mode guide
- CLI reference updated with all new flags
- Unit tests for new watch functions
- Blog post: 'From Triage Bot to Work Monitor'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix TS2339: cast execFile error to `Error & { killed?: boolean }` instead of NodeJS.ErrnoException (which lacks a .killed property)
- Fix cspell: replace "vulns" with "vulnerabilities" in blog post

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
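A sketch of why the intersection type matters: Node's `execFile` sets `killed` on the error when Node itself terminated the child (for example via the `timeout` option), reports an oversized output with code `ERR_CHILD_PROCESS_STDIO_MAXBUFFER`, and reports a missing executable with `code === "ENOENT"`. `describeAgentFailure` and its messages are hypothetical.

```typescript
// execFile errors carry extra fields that NodeJS.ErrnoException does not declare.
type ExecError = Error & { killed?: boolean; code?: number | string | null };

function describeAgentFailure(err: unknown, timeoutMinutes: number): string {
  if (err instanceof Error) {
    const e = err as ExecError;
    // Checked first: exceeding maxBuffer also kills the child.
    if (e.code === "ERR_CHILD_PROCESS_STDIO_MAXBUFFER") return "agent output exceeded maxBuffer";
    if (e.killed) return `agent killed after exceeding the ${timeoutMinutes}-minute timeout`;
    if (e.code === "ENOENT") return "agent command not found";
    return `agent failed: ${e.message}`;
  }
  return `agent failed: ${String(err)}`;
}
```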
… wrapper (#708)

EECOM blocking issues resolved:
1. updateBoard: replaced $(gh repo view ...) shell substitution with a pre-resolved execFileSync call, because execFile doesn't expand shell syntax
2. twoPassScan: replaced raw execFileAsync('gh', ...) with the existing ghIssueList wrapper for consistent error handling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
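The `updateBoard` fix hinges on `execFile` never invoking a shell. This small demonstration (the helper `echoViaExecFile` is hypothetical) runs Node itself as the child and shows a `$(...)` string arriving as literal text. The remedy in the commit follows the same idea: resolve the value with its own `execFileSync` call first, then pass the result as an ordinary argument.

```typescript
import { execFileSync } from "node:child_process";

// execFileSync passes each argument verbatim to the child; no shell runs,
// so "$(...)" is never expanded. The child (Node itself) writes its first
// extra argument back to stdout.
function echoViaExecFile(arg: string): string {
  return execFileSync(process.execPath, ["-e", "process.stdout.write(process.argv[1])", arg], {
    encoding: "utf8",
  });
}
```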
New watch flags (--execute, --board, --monitor-teams, etc.) added help text lines, bumping output from 110 to 113 lines. Increase threshold from 110 to 125 to accommodate current and future flag additions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests were importing from @bradygaster/squad-cli/commands/watch which requires a built dist/. Changed to source path import so tests work without a full build. All 30 watch tests pass (6 original + 8 new + 16 circuit-breaker). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rt (#708)

Refactor watch.ts to use a WatchPlatform interface that abstracts platform-specific operations (list work items, edit, PRs, comments).
- Add WatchPlatform interface with GitHubWatchPlatform and AdoWatchPlatform
- GitHubWatchPlatform wraps existing gh-cli calls (zero behavior change)
- AdoWatchPlatform wraps AzureDevOpsAdapter from squad-sdk (az CLI)
- Add createWatchPlatform factory with auto-detection from git remote
- Add --platform github|ado flag to cli-entry.ts watch handler
- Replace all direct gh-cli calls in watch.ts with platform calls
- Add tests for createWatchPlatform factory and platform options
- Update CLI reference and Ralph docs with --platform flag and ADO section
- Update CHANGELOG with ADO platform support entry

All existing tests pass; backward compatible (GitHub is the default).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the custom WatchPlatform interface, GitHubWatchPlatform and AdoWatchPlatform classes, and the createWatchPlatform factory in favor of the SDK's createPlatformAdapter(). Add thin mapping helpers (toWatchWorkItem, toWatchPullRequest, listWatchWorkItems, listWatchPullRequests, editWorkItem) to bridge SDK types to the internal WatchWorkItem/WatchPullRequest interfaces.
- Remove --platform CLI flag from cli-entry.ts and WatchOptions
- Replace platform.checkAvailable/checkAuthenticated with direct CLI checks based on adapter.type
- Scope rate-limit circuit breaker to GitHub only (adapter.type)
- Update CHANGELOG, CLI reference docs, and Ralph feature docs
- Remove createWatchPlatform and WatchPlatform test blocks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
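Scoping the rate-limit circuit breaker to GitHub amounts to consulting it only when `adapter.type` identifies GitHub. The breaker below is a generic consecutive-failure sketch with a half-open retry after a cooldown; the PR does not describe its breaker's thresholds or states, and the `AdapterType` values, `RateLimitBreaker`, and `shouldRunRound` are hypothetical.

```typescript
// Hypothetical discriminator; the real values of adapter.type may differ.
type AdapterType = "github" | "azure-devops";

// Opens after `threshold` consecutive rate-limit failures; rounds are
// skipped until `cooldownMs` has elapsed, then one attempt is allowed.
class RateLimitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold = 3, cooldownMs = 15 * 60 * 1000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  canRun(now: number): boolean {
    if (this.openedAt === null) return true;
    if (now - this.openedAt >= this.cooldownMs) {
      // Half-open: allow one attempt; a single further failure reopens.
      this.openedAt = null;
      this.failures = this.threshold - 1;
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordRateLimit(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}

// Only GitHub rounds consult the breaker; other adapters always run.
function shouldRunRound(type: AdapterType, breaker: RateLimitBreaker, now: number): boolean {
  return type !== "github" || breaker.canRun(now);
}
```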
The removal of createWatchPlatform and WatchPlatform test blocks left the outer describe block unclosed, causing a parse error. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace monolithic watch.ts (1562 lines) with a plugin architecture
where each opt-in feature is a self-contained WatchCapability.
Architecture:
- watch/types.ts: WatchCapability interface, WatchContext, WatchPhase
- watch/registry.ts: CapabilityRegistry (register, getByPhase, all)
- watch/config.ts: loadWatchConfig merges .squad/config.json + CLI flags
- watch/capabilities/: 9 capability plugins extracted from watch.ts
Capabilities extracted:
- self-pull: git fetch/pull at round start (pre-scan phase)
- execute: spawn Copilot sessions for eligible issues (post-execute)
- board: project board lifecycle + reconciliation (post-execute)
- monitor-teams: Teams scanning via WorkIQ (housekeeping)
- monitor-email: email scanning + GitHub alerts (housekeeping)
- two-pass: lightweight scan then hydrate actionable (post-triage)
- wave-dispatch: parallel sub-task execution (post-execute)
- retro: retrospective enforcement (housekeeping)
- decision-hygiene: merge inbox cleanup (housekeeping)
Config system:
- .squad/config.json 'watch' section for persistent config
- Priority: CLI flag > config.json > default (off)
- --no-{capability} disables even if config enables it
- Auto-generates --help from registered capability descriptions
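A minimal sketch of the precedence rule above for a single capability. It assumes the `watch` section maps capability names (as used in the flags, e.g. `two-pass`) to booleans; `isCapabilityEnabled` and that key format are hypothetical. If both `--x` and `--no-x` are given, this sketch lets the negative flag win.

```typescript
// Hypothetical shape of the `watch` section in .squad/config.json.
type WatchConfigSection = Record<string, boolean | undefined>;

// Precedence: --no-<name> / --<name> on the CLI, then config.json, then off.
function isCapabilityEnabled(name: string, argv: string[], config: WatchConfigSection): boolean {
  if (argv.includes(`--no-${name}`)) return false; // explicit opt-out always wins
  if (argv.includes(`--${name}`)) return true;
  return config[name] ?? false; // default: off
}
```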
Backward compatibility:
- Legacy WatchOptions type still accepted by runWatch
- 'squad watch' with no flags = triage only (unchanged)
- 'squad watch --execute' still works as CLI flag
- All existing exports preserved via re-exports
The main loop is now a thin 4-phase orchestrator:
1. pre-scan → 2. core triage → 3. post-execute → 4. housekeeping
Each capability has try/catch in execute() — failures are warnings,
never crash the round. Capabilities are independently testable.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
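The plugin shape described above can be sketched as follows. The phase names come from the capability list in this commit message, and `WatchContext` is reduced to a round number and a warning sink; these definitions are illustrative, not the actual contents of `watch/types.ts` or `watch/registry.ts`.

```typescript
// Illustrative phase names taken from the capability list above.
type WatchPhase = "pre-scan" | "post-triage" | "post-execute" | "housekeeping";

interface WatchContext {
  round: number;
  warn: (msg: string) => void;
}

interface WatchCapability {
  name: string;
  phase: WatchPhase;
  description: string; // feeds the auto-generated --help text
  execute(ctx: WatchContext): Promise<void>;
}

class CapabilityRegistry {
  private readonly caps: WatchCapability[] = [];

  register(cap: WatchCapability): void {
    if (this.caps.some((c) => c.name === cap.name)) {
      throw new Error(`Capability already registered: ${cap.name}`);
    }
    this.caps.push(cap);
  }

  getByPhase(phase: WatchPhase): WatchCapability[] {
    return this.caps.filter((c) => c.phase === phase);
  }

  all(): readonly WatchCapability[] {
    return this.caps;
  }
}

// Run one phase in registration order; a failing capability becomes a
// warning and the remaining capabilities still run.
async function runPhase(registry: CapabilityRegistry, phase: WatchPhase, ctx: WatchContext): Promise<void> {
  for (const cap of registry.getByPhase(phase)) {
    try {
      await cap.execute(ctx);
    } catch (err) {
      ctx.warn(`${cap.name} failed: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
}
```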
watch.ts moved to watch/index.ts — update test import path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
watch.ts → watch/index.ts moved the export. Update package.json exports map to point to dist/cli/commands/watch/index.js. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ralph-board.test.ts and persistent-ralph.test.ts still imported from the old watch.js path. Updated to watch/index.js. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The cli-command-wiring test only scanned for top-level .ts files in commands/, but the watch refactor moved watch.ts into watch/index.ts. Now the test also discovers subdirectory commands with index.ts and includes them in the existingFiles set. Fixes test failure: watch/index.ts not found in commands/ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
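The discovery rule in this fix can be sketched with `readdirSync`. `discoverCommandFiles` is a hypothetical name, and the sketch does not attempt to exclude test or declaration files, which the real test may handle differently.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Collect command modules: top-level *.ts files plus subdirectories that
// contain an index.ts (e.g. watch/index.ts after the refactor).
function discoverCommandFiles(commandsDir: string): Set<string> {
  const found = new Set<string>();
  for (const entry of readdirSync(commandsDir, { withFileTypes: true })) {
    if (entry.isFile() && entry.name.endsWith(".ts")) {
      found.add(entry.name);
    } else if (entry.isDirectory() && existsSync(join(commandsDir, entry.name, "index.ts"))) {
      found.add(`${entry.name}/index.ts`);
    }
  }
  return found;
}
```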
🔄 Ralph PR status
Large feature PR: upgrades squad watch to a full work monitor with --execute mode. Ready for review, but the scope is significant (3,825 additions). Consider squashing before merge.
Add 51 tests covering 5 watch capabilities (execute, cleanup, decision-hygiene, self-pull, board) that shipped with zero test coverage in PR #709.

Test coverage:
- ExecuteCapability: buildAgentPrompt, findExecutableIssues edge cases, preflight, execute flow (adapter mock, agent dispatch, errors, timeouts)
- CleanupCapability: preflight, round-skipping, file pruning by date, stale inbox warnings, config validation
- DecisionHygieneCapability: preflight, threshold logic, merge trigger, timeout handling
- SelfPullCapability: preflight, git stash/fetch/pull flow, stash pop failure, source change detection
- BoardCapability: preflight checks

Closes #709

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What
Upgrade `squad watch` (Ralph's polling loop) from triage-only to a full work monitor that can autonomously execute issues via `--execute` mode.

Why
Today `squad watch` only triages (labels issues + reports PR status). With `--execute`, Ralph can spawn Copilot sessions to actually work on eligible issues, closing the loop from triage to execution without human intervention.

Closes #708
How
- cli-entry.ts: Parse five new flags (`--execute`, `--copilot-flags`, `--agent-cmd` (hidden), `--max-concurrent`, `--timeout`) and pass a `WatchOptions` object to `runWatch`.
- watch.ts: Core changes:
  - `WatchOptions` interface: typed options bag replacing the bare `intervalMinutes` parameter
  - `buildAgentCommand()`: constructs `{cmd, args}` for the agent subprocess (default: `gh copilot --message ...`; hidden `--agent-cmd` override for custom agents)
  - `executeIssue()`: claims the issue (`@me`), posts a comment, and spawns the agent via `execFile` with a 50 MB `maxBuffer` and a configurable timeout
  - `findExecutableIssues()`: filters the board for issues that are labelled for a squad member, unassigned, and not blocked (filters 9 blocking labels including `status:blocked`, `pending-user`, `do-not-merge`)
  - `selfPull()`: best-effort `git fetch` + `pull --ff-only` between rounds (never blocks)
  - `executeRound()`: when execute mode is on, runs `selfPull` first, then after triage finds and executes eligible issues up to `maxConcurrent`
  - `BoardState`: added `executed` count; `reportBoard` shows a 🚀 line
  - `(Execute)` mode indicator, copilot flags, and concurrency settings

Testing
- When `--execute` is NOT passed, behaviour is 100% identical to current (no new code paths execute)
- `options.execute` guard
- `squad watch --execute --copilot-flags "--yolo" --max-concurrent 2 --timeout 15`

Docs
Help text updated in cli-entry.ts for the triage command. The hidden `--agent-cmd` flag is intentionally omitted from help.

Exports
No new public exports from `package.json`. `WatchOptions` and the helper functions are exported from the module for testability but not from the package entry point.

Breaking Changes
None. The `runWatch` signature changed from `(dest, intervalMinutes)` to `(dest, WatchOptions)`, but the only call site (cli-entry.ts) is updated in this PR.

Waivers