From 050898d6a0e369d1d8a9d35ba46e79cb6b9518df Mon Sep 17 00:00:00 2001 From: zbigniew sobiecki Date: Fri, 24 Apr 2026 11:27:03 +0200 Subject: [PATCH 1/4] chore: spec 013 + plan for subprocess output streaming MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Captures the /specify + /plan artifacts for the observable subprocess helper that fixes cascade-tools' silent hang during slow pre-push hooks (seen in ucho MNG-287 and MNG-290). Single-plan decomposition — implementation lands in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../1-observable-subprocess-helper.md | 298 ++++++++++++++++++ .../_coverage.md | 32 ++ docs/specs/013-subprocess-output-streaming.md | 135 ++++++++ 3 files changed, 465 insertions(+) create mode 100644 docs/plans/013-subprocess-output-streaming/1-observable-subprocess-helper.md create mode 100644 docs/plans/013-subprocess-output-streaming/_coverage.md create mode 100644 docs/specs/013-subprocess-output-streaming.md diff --git a/docs/plans/013-subprocess-output-streaming/1-observable-subprocess-helper.md b/docs/plans/013-subprocess-output-streaming/1-observable-subprocess-helper.md new file mode 100644 index 00000000..8b169f87 --- /dev/null +++ b/docs/plans/013-subprocess-output-streaming/1-observable-subprocess-helper.md @@ -0,0 +1,298 @@ +--- +id: 013 +slug: subprocess-output-streaming +plan: 1 +plan_slug: observable-subprocess-helper +level: plan +parent_spec: docs/specs/013-subprocess-output-streaming.md +depends_on: [] +status: pending +--- + +# 013/1: Observable subprocess helper + +> Part 1 of 1 in the 013-subprocess-output-streaming plan. See [parent spec](../../specs/013-subprocess-output-streaming.md). + +## Summary + +This plan is the full execution of spec 013. One cohesive change: replace the hand-rolled `spawn`-based `runCommand()` in `src/utils/repo.ts` with an `execa`-backed implementation that streams child output to the parent's stderr as it arrives, emits a heartbeat line every 30s during child silence, kills the child (and its descendants) on either an idle-silence timeout or a wall-clock timeout via a SIGTERM→SIGKILL ladder, and preserves captured stdout/stderr in the returned result on both success and failure. A second, trivially-adjacent change updates `bin/cascade-tools.js` to exclude `bootstrap.js` from the oclif command-loader glob, silencing the `command bootstrap not found` warning. + +**Value ships:** agents watching cascade-tools output stop perceiving long `git push`/hook runs as hangs; cascade-tools emits live progress and enforces clean termination for actually-stuck children; operator log output is cleaner. All nine spec ACs land here (AC 9 is `[manual]`). + +**What this plan does NOT change:** the `runCommand()` exported signature's existing fields (`{ stdout, stderr, exitCode }`) — they continue to appear in the result object (we only add an optional `reason` field for termination cause). No caller is required to pass the new `options` arg; defaults cover every existing callsite. Gadget-level 240s timeout and the dashboard/router/worker paths are untouched. + +**Components delivered:** +- `package.json` — add `execa` and `tree-kill` to `dependencies` +- `src/utils/repo.ts` — full rewrite of `runCommand()` on top of execa; new exported `RunCommandOptions` type; new optional `reason` field on result +- `src/gadgets/github/core/createPR.ts` — `pushBranch()` and `stageAndCommit()` pass explicit tighter timeouts (push waits up to the gadget's 240s ceiling); success path returns captured hook output instead of dropping it +- `bin/cascade-tools.js` — append `!bootstrap.js` to oclif `globPatterns` +- `tests/unit/utils/repo.test.ts` — retool existing mocks (spawn → execa), add tests for streaming, heartbeat, idle timeout, wall timeout, tree-kill, kept-output-on-success +- `tests/unit/gadgets/github/core/createPR.test.ts` — add assertions that pushBranch/stageAndCommit pass explicit timeout options and that success-path result exposes hook output + +**Deferred to later plans in this spec:** +- None — 1-plan spec. + +--- + +## Spec ACs satisfied by this plan + +- Spec AC #1 (live stderr streaming during hook runs) — **full** +- Spec AC #2 (30s heartbeat on silence) — **full** +- Spec AC #3 (idle-timeout kill with SIGTERM→SIGKILL) — **full** +- Spec AC #4 (wall-clock kill with SIGTERM→SIGKILL) — **full** +- Spec AC #5 (descendant / process-group kill) — **full** +- Spec AC #6 (captured output preserved on success + failure) — **full** +- Spec AC #7 (oclif `command bootstrap not found` warning gone) — **full** +- Spec AC #8 (backward-compat result shape for all existing callers) — **full** +- Spec AC #9 (agent end-to-end sees hook progress ≤30s, no retry loop) — **full `[manual]`** — see Manual Verification section + +--- + +## Depends On + +None. Plan depends only on cascade's existing test + build toolchain and a working network for `npm install`. + +--- + +## Detailed Task List (TDD) + +### 1. Add `execa` and `tree-kill` dependencies + +**No tests for this step directly** — dependency presence is verified indirectly by the helper tests below. + +**Implementation** (`package.json`): +- Add `"execa": "^9.6.1"` to `dependencies` +- Add `"tree-kill": "^1.2.2"` to `dependencies` +- Add `"@types/tree-kill": "^1.2.2"` to `devDependencies` (tree-kill ships CJS; types live under `@types`) +- Run `npm install` and commit the updated `package-lock.json` + +--- + +### 2. Rewrite `runCommand()` on execa + +**Tests first** (`tests/unit/utils/repo.test.ts`): + +Retool the existing `vi.mock('node:child_process', ...)` block: keep it for residual callsites but remove the `spawn` mock and switch the `runCommand` tests to mock `execa` and `tree-kill` directly. Use `vi.mock('execa', ...)` returning a fake subprocess whose `stdout` / `stderr` are `Readable` streams the test can push to, with a `pid` number and a `then(...)` resolver representing the await of the subprocess. Use `vi.mock('tree-kill', ...)` returning a mock function. + +For each test specify name, type, setup, expected outcome, AND expected red. + +- `streams child stdout to parent stderr line-by-line as it arrives` — unit — mock execa subprocess pushes `"line1\n"`, then `"line2\n"` to `stdout`; spy on `process.stderr.write`; await helper call → both lines appear on the spy in order, BEFORE the subprocess resolves. Expected red: `AssertionError: expected process.stderr.write to have been called with "line1\n" but it was called [] (0 times)`. +- `streams child stderr to parent stderr line-by-line` — unit — same shape as above but push to child `stderr` → parent stderr receives. Expected red: `AssertionError: expected process.stderr.write to have been called with "err1\n" but it was called [] (0 times)`. +- `emits a heartbeat to parent stderr after N ms of child silence, citing elapsed time and command label` — unit — `vi.useFakeTimers()`; call helper with `heartbeatMs: 1000` and a label; subprocess emits no data; advance timers by 1000ms → a line matching `/\[git-push\] still running \(1s\)/` appears on parent stderr. Expected red: `AssertionError: expected process.stderr.write to have been called with match(/\[git-push\] still running/) but it was not`. +- `resets the heartbeat timer when child emits output` — unit — fake timers; `heartbeatMs: 1000`; advance 900ms (no heartbeat yet); subprocess pushes `"tick\n"`; advance 1000ms more → only one heartbeat fires total (at 1900ms cumulative), not two. Expected red: `AssertionError: expected 1 heartbeat, got 2 (heartbeat at 1000ms was not cancelled by child output at 900ms)`. +- `does not emit heartbeat when child exits before heartbeatMs elapses` — unit — `heartbeatMs: 10_000`; subprocess resolves after 100ms → zero heartbeat lines. Expected red: `AssertionError: expected 0 heartbeats, got 1`. +- `kills the child via tree-kill with SIGTERM when idleTimeoutMs elapses with no output` — unit — fake timers; `idleTimeoutMs: 5000`; no child output; advance timers by 5000ms → tree-kill mock called with `(pid, 'SIGTERM')`; result `reason` is `'idle-timeout'`; `exitCode` is non-zero. Expected red: `AssertionError: expected tree-kill to have been called with ["", "SIGTERM"] but was not called`. +- `escalates to SIGKILL after forceKillAfterMs if the child did not exit on SIGTERM` — unit — fake timers; idle-timeout fires SIGTERM; advance another 5000ms without the child exiting → tree-kill called a second time with `(pid, 'SIGKILL')`. Expected red: `AssertionError: expected tree-kill to be called 2 times (SIGTERM then SIGKILL), was called 1 time`. +- `kills the child via tree-kill with SIGTERM when wallTimeoutMs elapses even with ongoing output` — unit — fake timers; `wallTimeoutMs: 5000`, `idleTimeoutMs: 100_000`; subprocess pushes data every 500ms (resets idle timer); advance 5000ms → tree-kill called with SIGTERM; result `reason` is `'wall-timeout'`. Expected red: `AssertionError: expected reason: "wall-timeout", got undefined (wall-clock timer not armed or not firing under fake-timer advance)`. +- `returns captured stdout and stderr in the result on success` — unit — subprocess pushes `"ok\n"` on stdout then resolves with exit 0 → result is `{ stdout: "ok\n", stderr: "", exitCode: 0 }`. Expected red: `AssertionError: expected result.stdout to equal "ok\n", got "" (capture discarded)`. +- `returns captured stdout and stderr in the result on non-zero exit` — unit — subprocess pushes `"failed\n"` on stderr then resolves with exit 1 → result exposes stderr contents AND exitCode 1. Expected red: `AssertionError: expected result.exitCode to equal 1, got 0` (or capture-discard variant). +- `does not stream when options.silent is true` — unit — subprocess pushes data; spy on `process.stderr.write` → zero forwarded lines (heartbeat-suppression is a separate option but silent also suppresses streaming). Expected red: `AssertionError: expected process.stderr.write 0 calls, got 1`. +- `backward-compatible signature: runCommand(cmd, args, cwd) returns { stdout, stderr, exitCode }` — unit — no options arg; mock subprocess; result has exactly the three-field shape plus optional `reason` undefined. Expected red: `TypeError: options is not defined` or shape-mismatch assertion. + +**Implementation** (`src/utils/repo.ts`): + +Replace the current `runCommand()` body (lines 70–103) with an execa-based implementation. Keep the existing exported signature; extend with a fifth optional `options` arg. + +- New exported type: + +```ts +export type RunCommandOptions = { + /** Emit a heartbeat on parent stderr every N ms of child silence. Default 30_000. Set to 0 to disable. */ + heartbeatMs?: number; + /** Kill child if no output for N ms. Default 120_000. Set to 0 to disable. */ + idleTimeoutMs?: number; + /** Kill child after N ms of total runtime. Default 600_000. Set to 0 to disable. */ + wallTimeoutMs?: number; + /** After SIGTERM, wait N ms before SIGKILL. Default 5_000. */ + forceKillAfterMs?: number; + /** Short label emitted in heartbeat lines. Defaults to `command`. */ + label?: string; + /** Suppress streaming and heartbeats. Capture-only. Default false. */ + silent?: boolean; +}; + +export type RunCommandResult = { + stdout: string; + stderr: string; + exitCode: number; + /** Set when the child was killed by the helper's timeouts. Undefined on natural exit. */ + reason?: 'idle-timeout' | 'wall-timeout'; +}; +``` + +- Signature: + +```ts +export async function runCommand( + command: string, + args: string[], + cwd: string, + env?: Record, + options?: RunCommandOptions, +): Promise +``` + +- Internals: + - Spawn via `execa(command, args, { cwd, env: { ...process.env, ...env }, all: false, reject: false })`. `reject: false` makes execa return a `SubprocessResult` even on non-zero exits, matching the current contract. + - If `!silent`, attach listeners on `subprocess.stdout` and `subprocess.stderr` that (a) append chunks to capture buffers AND (b) write them to `process.stderr` as they arrive, and (c) reset the idle-timer. + - If `silent`, capture only; don't stream; don't heartbeat. + - Heartbeat: `setInterval` with `heartbeatMs` (default 30_000). Each tick checks "was there child output since the last tick?"; if not, write `[