feat(subprocess): observable subprocess helper — streaming + heartbeat + dual timeouts (spec 013)#1177
Merged
zbigniewsobiecki merged 4 commits intodevfrom Apr 24, 2026
Conversation
Captures the /specify + /plan artifacts for the observable subprocess helper that fixes cascade-tools' silent hang during slow pre-push hooks (seen in ucho MNG-287 and MNG-290). Single-plan decomposition — implementation lands in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the spawn-based runCommand() in src/utils/repo.ts with an
execa + tree-kill implementation that:
- forwards child stdout/stderr to parent stderr line-by-line as it arrives
- emits a [label] still running (Ns) heartbeat on stderr every 30s of
child silence (configurable)
- enforces idle-silence timeout (default 120s) and wall-clock timeout
(default 600s) with SIGTERM→SIGKILL escalation via tree-kill
- preserves captured stdout/stderr in the result on success AND failure
- keeps the { stdout, stderr, exitCode } contract for all existing
callers (new optional `reason` field for timeout-caused kills)
createPR gadget now passes tighter explicit timeouts on the two calls
that trigger user hooks: git push gets wallTimeoutMs=230_000 (just under
the gadget's 240s ceiling) and idleTimeoutMs=90_000; git commit gets
120_000/60_000. Both return captured hook output to the caller via new
optional CreatePRResult.pushOutput and CreatePRResult.commitOutput
fields; previously the successful-push branch discarded the output.
bin/cascade-tools.js globPatterns now excludes bootstrap.js, silencing
the oclif 'command bootstrap not found' warning that appeared on every
invocation.
8338/8338 unit tests pass; 0 lint errors; 0 type errors. Spec AC #9
(end-to-end agent verification) is tagged [manual] and deferred to a
live agent run post-merge — see plan's Manual Verification section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md untouched by this spec — nothing to audit.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
CASCADE agent runs on
uchoand elsewhere have been timing out silently whencascade-tools scm create-prtriggers slow pre-push hooks. Root cause traced in two failed runs (MNG-287 timed out at 29m59s, MNG-290 hung 7+ min on README PR):runCommand()insrc/utils/repo.tsusesspawn()withstdio:['pipe','pipe','pipe']and buffers all child output into in-memory strings, emitting nothing until the subprocess exits. When the target repo'spre-pushhook runspnpm typecheck+pnpm test:run(~60s), the LLM-driven agent sees an empty output file, assumes hang, and burns 5–10 minutes retrying. Bonus issue: on successful push, captured output was discarded — agents never saw what the hook did. And no per-subprocess timeout (gadget-level 240s doesn't kill the child).What ships
Replaces the hand-rolled
spawn()wrapper with anexeca+tree-killimplementation (industry default, ~140M dl/mo). New behavior:[label] still running (Ns)on parent stderr every 30s of child silence (configurable, re-armed on each child output chunk)tree-killso hook grandchildren (test runners, subshells) die toocommand bootstrap not foundwarning silenced — oclif command-loader glob now excludesbootstrap.js(it's a side-effect import, not a command)Backward-compatible: existing
{ stdout, stderr, exitCode }result shape preserved; new optionalreason: 'idle-timeout' | 'wall-timeout'surfaces when the helper killed the child.Callers of
runCommandinherit the new defaults for free. The two slow callsites increatePR(git push,git commit) pass explicit tighter timeouts (push: wall=230s idle=90s — just under the gadget's 240s ceiling; commit: wall=120s idle=60s) and now return captured hook output via new optionalCreatePRResult.pushOutputandCreatePRResult.commitOutputfields.Artifacts
docs/specs/013-subprocess-output-streaming.md.donedocs/plans/013-subprocess-output-streaming/1-observable-subprocess-helper.md.donedocs/plans/013-subprocess-output-streaming/_coverage.mdTests
16 new tests (12 in
tests/unit/utils/repo.test.ts, 4 intests/unit/gadgets/github/core/createPR.test.ts):wallTimeoutMs ≤ 230_000+ finiteidleTimeoutMs4 existing tests aligned to the new call-shape (
github.test.tsassertions on push/commit invocations).Total: 8338/8338 unit tests pass (up from 8333, +5 net; 11 buffered-spawn tests retired, 16 new).
Verification
npm run buildnpm run typechecknpm run lintnpm testPre-existing warnings / errors found: none. Lint + typecheck were already clean pre-change.
Spec AC #9 —
[manual], deferredSpec AC #9 ("a CASCADE agent sees hook progress within ~30s and does not retry") requires a live agent run. Cannot be exercised inside the unit/integration suite. The verification protocol lives in the plan's Manual Verification section — run it after this lands and a worker picks up the updated image.
Scope
All changes in
src/utils/repo.ts,src/gadgets/github/core/createPR.ts,bin/cascade-tools.js, the two test files, + docs (README.md,docs/cascade-directory.md,CHANGELOG.md). No schema changes, no transport changes, no UI. Dashboard / router / worker untouched.🤖 Generated with Claude Code