Summary
runTrackedJob in plugins/opencode/scripts/lib/tracked-jobs.mjs:64 wraps the runner promise with no wall-clock guard:
const result = await runner({ report, log });
If the runner stalls (dropped SSE, unresponsive provider after the HTTP call returns, an exception inside getSessionDiff, a wedged subsequent API call), the await never resolves and nothing writes a terminal status for the job. The job file stays at status: "running" until SessionEnd reaps it. However, SessionEnd only reaps jobs whose PID is already dead, so a job whose process is still alive but stalled stays "running" indefinitely.
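The failure mode can be reproduced in isolation. The sketch below uses a hypothetical, stripped-down stand-in for runTrackedJob (not the plugin's real code) that, like the line quoted above, only writes a terminal status after `await runner(...)` settles. A runner whose promise never settles leaves the job at "running".

```javascript
// Hypothetical, stripped-down stand-in for runTrackedJob: the terminal status
// is only written after `await runner(...)` settles.
async function runTrackedJobUnguarded(job, runner) {
  job.status = "running";
  const result = await runner({ report: () => {}, log: () => {} });
  job.status = "completed"; // never reached if the runner stalls
  return result;
}

// A runner whose promise never settles, standing in for a dropped SSE stream
// or a wedged API call.
const stalledRunner = () => new Promise(() => {});

const job = { status: "pending" };
runTrackedJobUnguarded(job, stalledRunner);

// Check the job a moment later: it is still "running", and nothing will
// ever move it to a terminal state.
setTimeout(() => {
  console.log(job.status); // prints "running"
}, 100);
```

A pending promise on its own holds no event-loop handle, so this toy script exits once the timer fires; in a long-lived process the job would simply sit at "running".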
Partial mitigation already exists: sendPrompt in lib/opencode-server.mjs:195 wraps its fetch in AbortSignal.timeout(600_000), so the big inference call has a 10-minute cap. But everything outside that single fetch (the diff fetch in handleTask, JSON parsing, result-file writes) is unguarded.
Suggested fix
- Add a taskTimeoutMs option to runTrackedJob (default ~15 min, configurable via env or setup state). On expiry, write status: "failed" with phase: "timeout" to the job file, then reject.
- Optional: add an idle watchdog on report() calls; if no progress is reported for N seconds, fail the job.
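A minimal sketch of both suggestions, assuming a simplified runTrackedJob(job, runner, options) signature rather than the plugin's real one. It races the runner against a timer promise, so a terminal status is written even when the runner never settles. The OPENCODE_TASK_TIMEOUT_MS env var (standing in for "configurable via env or setup state"), the writeJob callback (standing in for the real job-file write), the TaskTimeoutError class, and the `signal` field passed to the runner are all hypothetical.

```javascript
// Hypothetical sketch of the suggested fix, not the plugin's real code.
// Signature, OPENCODE_TASK_TIMEOUT_MS, `signal` and `writeJob` are assumptions.
const DEFAULT_TASK_TIMEOUT_MS = 15 * 60 * 1000; // ~15 min, as suggested

class TaskTimeoutError extends Error {
  constructor(kind, ms) {
    super(`job exceeded its ${kind} limit of ${ms} ms`);
    this.name = "TaskTimeoutError";
    this.kind = kind; // "wall-clock" or "idle"
  }
}

async function runTrackedJob(job, runner, options = {}) {
  const {
    taskTimeoutMs = Number(process.env.OPENCODE_TASK_TIMEOUT_MS) || DEFAULT_TASK_TIMEOUT_MS,
    idleTimeoutMs = 0, // 0 disables the optional idle watchdog
    writeJob = async (patch) => Object.assign(job, patch), // stand-in for the job-file write
  } = options;

  await writeJob({ status: "running" });

  // Aborted on timeout so a cooperative runner can cancel in-flight work.
  const controller = new AbortController();
  let rejectTimeout;
  const timeout = new Promise((_, reject) => { rejectTimeout = reject; });
  const fail = (err) => {
    controller.abort(err);
    rejectTimeout(err);
  };

  // Hard wall-clock cap on the whole runner, not just one fetch.
  const hardTimer = setTimeout(
    () => fail(new TaskTimeoutError("wall-clock", taskTimeoutMs)),
    taskTimeoutMs,
  );

  // Optional idle watchdog: re-armed on every report() call.
  let idleTimer;
  const armIdle = () => {
    if (idleTimeoutMs <= 0) return;
    clearTimeout(idleTimer);
    idleTimer = setTimeout(
      () => fail(new TaskTimeoutError("idle", idleTimeoutMs)),
      idleTimeoutMs,
    );
  };
  armIdle();

  const report = (progress) => {
    job.progress = progress; // placeholder for the real progress write
    armIdle();
  };
  const log = () => {}; // placeholder for the real logger

  try {
    const result = await Promise.race([
      runner({ report, log, signal: controller.signal }),
      timeout,
    ]);
    await writeJob({ status: "completed" });
    return result;
  } catch (err) {
    // Timeouts get phase: "timeout"; ordinary runner errors keep no phase here.
    await writeJob({
      status: "failed",
      ...(err instanceof TaskTimeoutError ? { phase: "timeout" } : {}),
      error: err instanceof Error ? err.message : String(err),
    });
    throw err;
  } finally {
    clearTimeout(hardTimer);
    clearTimeout(idleTimer);
  }
}
```

Promise.race does not cancel the losing promise: a stalled runner that ignores `signal` keeps running in the background, but the job file still reaches status: "failed". A runner that forwards `signal` to its own fetch calls (possibly combined with sendPrompt's existing AbortSignal.timeout(600_000), e.g. via AbortSignal.any on Node versions that provide it) would also stop the in-flight work.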
Upstream reference
Derived from openai/codex-plugin-cc#183, which has the same root-cause pattern.