Skip to content

chore(createPR): remove subprocess + gadget timeouts so long pre-push hooks survive#1185

Merged
zbigniewsobiecki merged 1 commit intodevfrom
chore/remove-createpr-timeouts
Apr 24, 2026
Merged

chore(createPR): remove subprocess + gadget timeouts so long pre-push hooks survive#1185
zbigniewsobiecki merged 1 commit intodevfrom
chore/remove-createpr-timeouts

Conversation

@zbigniewsobiecki
Copy link
Copy Markdown
Member

Summary

  • Drop the three stacked timeouts on CreatePR so pre-push hooks running full test suites (5+ min) stop getting SIGTERM'd. Harness already budgets long tool calls — no value in a second shorter cap here.
  • git commit + git push now call runCommand with wallTimeoutMs: 0, idleTimeoutMs: 0 (disabled, per runCommand's docstring convention).
  • The CreatePR gadget's outer timeoutMs: 240000 is now 0, which llmist treats as "no outer timer" (if (timeoutMs && timeoutMs > 0) in its dispatch path).
  • Heartbeat stays default (30 s stderr pulse) so operators still see [git-push] still running (Ns) ticks.

Why

Three wall-clocks used to stack on one code path, any of which could fire first:

Layer Old cap Result
git commit subprocess wall 120 s / idle 60 s COMMIT FAILED at 1–2 min if pre-commit hook silent
git push subprocess wall 230 s / idle 90 s PUSH FAILED at 1.5 min if pre-push hook silent
CreatePR gadget 240 s outer Harness-invoked tool died at 4 min regardless

A recent agent session lost 10+ minutes looping on the 120 s push timeout when lefthook ran the full test suite. This PR lifts all three.

Scope (explicit)

In scope, per approved plan:

  • src/gadgets/github/core/createPR.ts — drop 4 timeout constants, pass 0/0 on both runCommand calls
  • src/gadgets/github/definitions.ts — set outer timeoutMs: 0

Out of scope (deliberately not touched):

  • runCommand DEFAULT_IDLE_TIMEOUT_MS / DEFAULT_WALL_TIMEOUT_MS — other callers (agent worker repo fetch/checkout/reset, setup.sh, npm install, tsc) still want defaults to catch genuine hangs
  • Every other timeout in the inventory: HTTP clients, watchdogs, VerifyChanges, diagnostics tsc/biome, tmux, Sleep, router workerTimeoutMs. Each guards a real hang case.

Test coverage

Added:

  • 2 tests in `tests/unit/utils/repo.test.ts` pinning that `wallTimeoutMs: 0` and `idleTimeoutMs: 0` actually disable the kill path (10 min of silence / 10 min of continuous output → no SIGTERM). Previously the disabled-path was undocumented by tests; only the enabled path was covered.
  • 3 regression tests in `tests/unit/gadgets/github/core/createPR.test.ts` pinning that `git commit` + `git push` pass `wallTimeoutMs: 0, idleTimeoutMs: 0`. A future author who "just adds a safety cap" hits a red test with context.

Updated:

  • `definitions.test.ts` "every definition has a timeoutMs > 0" → "every definition has a non-negative timeoutMs (0 is the disabled sentinel)", with rationale for when 0 is appropriate.
  • `definitions.test.ts` "CreatePR has a 4-minute timeout" → "CreatePR has timeoutMs disabled (0) — hooks may run for minutes".
  • Deleted the three outdated spec-013 contract tests that asserted the old per-caller timeouts were finite. Kept the captured-output assertions from that block — those still pin that pre-commit / pre-push hook output is surfaced.

Totals: 6400 unit tests pass (up from 6398), typecheck clean, lint clean (0 warnings).

Test plan

  • `npm run typecheck` — clean
  • `npm run lint` — clean, 0 warnings
  • `npx vitest run --project unit-core --project unit-api` — 6400 pass, 23 skipped
  • Manual smoke: rig `lefthook.yml` pre-push to `sleep 180`, run `cascade-tools scm create-pr`, observe heartbeat continues past 90 s / 230 s and push completes cleanly. (Owner can verify on satellite after merge.)

Rollback

Single commit. Revert restores the four subprocess timeout constants and the gadget `timeoutMs: 240000`.

🤖 Generated with Claude Code

… hooks survive

Three stacked wall-clocks were killing `cascade-tools scm create-pr` whenever a
project's pre-push hook ran a real test suite:

1. `git commit` runCommand — wallTimeoutMs 120s, idleTimeoutMs 60s
2. `git push`   runCommand — wallTimeoutMs 230s, idleTimeoutMs 90s
3. CreatePR gadget — outer timeoutMs 240s

A 5-minute pre-push test suite is a legitimate use case. The agent harness that
wraps this gadget already budgets long tool calls; the second shorter cap here
was just re-creating the "PUSH FAILED at 2 min" failure mode of spec 013.

Fix: pass `wallTimeoutMs: 0, idleTimeoutMs: 0` to both runCommand calls
(runCommand treats 0 as "disabled", per its docstring at utils/repo.ts:75-86
and the enforced branches at :153,164,169). Set the CreatePR gadget's outer
`timeoutMs: 0`, which llmist treats as "no outer timer" (see
`if (timeoutMs && timeoutMs > 0)` in its dispatch path).

Heartbeat stays on (default 30s stderr pulse) so operators still see
`[git-push] still running (Ns)` ticks during slow hooks — observability
without killing.

Test coverage:
 - Two new `runCommand` tests pin the disabled-idle and disabled-wall paths
   (tests/unit/utils/repo.test.ts)
 - Three new regression tests pin that createPR passes 0/0 for both git
   commands (tests/unit/gadgets/github/core/createPR.test.ts)
 - The definitions-level assertion was flipped from "240s" to "0" and now
   documents why 0 is the correct value
 - Outdated spec-013 contract tests that asserted the old "finite" timeouts
   were deleted; the captured-output assertions from that block were kept

Not touched: runCommand DEFAULT_IDLE_TIMEOUT_MS / DEFAULT_WALL_TIMEOUT_MS, and
every other timeout in the inventory (HTTP clients, watchdogs, VerifyChanges,
diagnostics tsc/biome, tmux, Sleep). Those guard genuine hang cases and were
explicitly out of scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zbigniewsobiecki zbigniewsobiecki merged commit 4732477 into dev Apr 24, 2026
8 checks passed
@zbigniewsobiecki zbigniewsobiecki deleted the chore/remove-createpr-timeouts branch April 24, 2026 12:52
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant