Related
Reference implementation for the timeout half of #13 (runTrackedJob can hang indefinitely). Filed separately because the upstream PR has a clean, test-backed diff we can almost directly transplant.
Problem (restated)
plugins/opencode/scripts/lib/tracked-jobs.mjs:64 awaits the runner with no wall-clock guard. If the runner promise never settles (SSE stream dropped, fetch hangs after it started receiving bytes, post-response handlers wedge), the job file stays status: "running" until SessionEnd — and the lock persists until the user manually wipes state.
Fix from upstream (two-part)
Part 1 — hard timeout in runTrackedJob:
// lib/tracked-jobs.mjs additions

// Hard ceiling for any single tracked job. 30 minutes is generous for long
// OpenCode runs but bounded so a hung runner cannot keep the companion
// process alive forever. Override via OPENCODE_COMPANION_JOB_TIMEOUT_MS.
const DEFAULT_JOB_TIMEOUT_MS = 30 * 60 * 1000;

function resolveJobTimeoutMs(options = {}) {
  if (Number.isFinite(options.timeoutMs) && options.timeoutMs > 0) {
    return options.timeoutMs;
  }
  const fromEnv = Number(process.env.OPENCODE_COMPANION_JOB_TIMEOUT_MS);
  if (Number.isFinite(fromEnv) && fromEnv > 0) {
    return fromEnv;
  }
  return DEFAULT_JOB_TIMEOUT_MS;
}

export async function runTrackedJob(workspacePath, job, runner, options = {}) {
  // ... existing setup ...
  const timeoutMs = resolveJobTimeoutMs(options);
  let timeoutHandle = null;
  const timeoutPromise = new Promise((_resolve, reject) => {
    timeoutHandle = setTimeout(() => {
      reject(
        new Error(
          `Tracked job ${job.id} exceeded the ${Math.round(timeoutMs / 1000)}s hard timeout. ` +
            "The runner did not produce a terminal status. " +
            "Set OPENCODE_COMPANION_JOB_TIMEOUT_MS to adjust."
        )
      );
    }, timeoutMs);
    timeoutHandle.unref?.();
  });
  try {
    const result = await Promise.race([runner({ report, log }), timeoutPromise]);
    if (timeoutHandle) { clearTimeout(timeoutHandle); timeoutHandle = null; }
    // ... existing completion path ...
    return result;
  } catch (err) {
    if (timeoutHandle) { clearTimeout(timeoutHandle); timeoutHandle = null; }
    // ... existing failure path ...
    throw err;
  }
}
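To make the precedence in resolveJobTimeoutMs concrete, here is a small sketch that copies the resolver from the snippet above and probes it with a few inputs. The resolveWithEnv helper is hypothetical and exists only for this demo; it is not part of the upstream PR.

```javascript
// Copy of the resolver from the snippet above, so this block runs on its own.
const DEFAULT_JOB_TIMEOUT_MS = 30 * 60 * 1000;

function resolveJobTimeoutMs(options = {}) {
  if (Number.isFinite(options.timeoutMs) && options.timeoutMs > 0) {
    return options.timeoutMs;
  }
  const fromEnv = Number(process.env.OPENCODE_COMPANION_JOB_TIMEOUT_MS);
  if (Number.isFinite(fromEnv) && fromEnv > 0) {
    return fromEnv;
  }
  return DEFAULT_JOB_TIMEOUT_MS;
}

// Hypothetical demo helper: run the resolver with a temporary env value,
// then restore whatever was set before.
function resolveWithEnv(envValue, options) {
  const key = "OPENCODE_COMPANION_JOB_TIMEOUT_MS";
  const previous = process.env[key];
  if (envValue === undefined) delete process.env[key];
  else process.env[key] = envValue;
  try {
    return resolveJobTimeoutMs(options);
  } finally {
    if (previous === undefined) delete process.env[key];
    else process.env[key] = previous;
  }
}

const resolved = {
  // A valid numeric option wins over the env variable.
  explicitOption: resolveWithEnv("500", { timeoutMs: 2000 }),
  // Without an option, the env variable applies.
  envOnly: resolveWithEnv("500", {}),
  // A string option is not a finite number, so it is skipped.
  stringOption: resolveWithEnv(undefined, { timeoutMs: "2000" }),
  // A non-positive option falls through to the env variable.
  zeroOption: resolveWithEnv("500", { timeoutMs: 0 }),
  // An unparseable env value falls through to the default.
  garbageEnv: resolveWithEnv("ten minutes", {}),
};

console.log(resolved);
```

One detail worth keeping in mind when wiring callers: Number.isFinite does not coerce its argument, so a string such as "2000" passed as options.timeoutMs is ignored, whereas the env value is converted with Number() before the check.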
Part 2 — already partially addressed: Upstream also needed to race captureTurn against the broker client's exitPromise because codex's broker could die silently mid-turn. Opencode's HTTP layer already caps the inference fetch at 10 minutes via AbortSignal.timeout(600_000) in lib/opencode-server.mjs:195, which handles the equivalent "upstream died" case for the body of the call. The risk remaining is anything outside that fetch — getSessionDiff in handleTask, result-file writes, JSON parsing of unexpectedly large bodies. The Part 1 hard timeout is the correct blanket guard for those.
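The split in coverage can be illustrated with stand-ins, assuming nothing about the real lib/opencode-server.mjs beyond the AbortSignal.timeout call mentioned above. hungRequest, wedgedPostProcessing and hardTimeout are hypothetical names for this sketch, and the short durations are only there to keep the demo quick.

```javascript
// Stand-in for a request that never answers. Like a fetch given a signal,
// it settles only when that signal aborts.
function hungRequest(signal) {
  return new Promise((_resolve, reject) => {
    signal.addEventListener("abort", () => reject(signal.reason), { once: true });
  });
}

// Stand-in for post-response work (diffing, file writes, parsing) that wedges.
// No signal reaches it, so the fetch-level cap cannot interrupt it.
function wedgedPostProcessing() {
  return new Promise(() => {});
}

// Outer guard in the spirit of Part 1. The timer is deliberately left ref'd
// here so the demo process stays alive until the guard fires or is cancelled.
function hardTimeout(ms) {
  let handle;
  const promise = new Promise((_resolve, reject) => {
    handle = setTimeout(() => reject(new Error(`hard timeout after ${ms}ms`)), ms);
  });
  return { promise, cancel: () => clearTimeout(handle) };
}

async function demo() {
  const outcome = {};

  // Case 1: the request hangs. The inner cap (50ms here, 10 minutes in the
  // real code) fires well before the outer guard.
  const guard1 = hardTimeout(500);
  try {
    await Promise.race([hungRequest(AbortSignal.timeout(50)), guard1.promise]);
  } catch (err) {
    outcome.hungRequest = err.name;
  } finally {
    guard1.cancel();
  }

  // Case 2: the work after the request wedges. Only the outer guard ends it.
  const guard2 = hardTimeout(100);
  try {
    await Promise.race([wedgedPostProcessing(), guard2.promise]);
  } catch (err) {
    outcome.wedgedPostProcessing = err.message;
  } finally {
    guard2.cancel();
  }

  return outcome;
}

const demoResult = demo();
demoResult.then((outcome) => console.log(outcome));
```

In the first case the rejection comes from the abort signal; in the second it can only come from the outer guard, which is the gap the Part 1 timeout is meant to close.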
Test plan (from upstream, adapted)
Create tests/tracked-jobs-timeout.test.mjs:
- A runner that never resolves is aborted after timeoutMs and transitions the job to status: "failed" with an error message containing the timeout figure.
- A runner that resolves quickly completes normally and does not race the timeout (no stray setTimeout handle keeps the event loop alive; assert the test process exits promptly).
- Env override: OPENCODE_COMPANION_JOB_TIMEOUT_MS=500 makes the timeout fire at ~500ms.
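A rough sketch of how the first two cases could be exercised, written against a stand-in rather than the real module. runWithHardTimeout is a hypothetical miniature of the race in runTrackedJob, and the in-memory job object stands in for the job file; the actual tests would import runTrackedJob and read the persisted job state instead.

```javascript
// Hypothetical miniature of the race in runTrackedJob. The job object is an
// in-memory stand-in for the job file the real implementation writes.
async function runWithHardTimeout(job, runner, timeoutMs) {
  let timeoutHandle = null;
  const timeoutPromise = new Promise((_resolve, reject) => {
    timeoutHandle = setTimeout(() => {
      reject(
        new Error(
          `Tracked job ${job.id} exceeded the ${Math.round(timeoutMs / 1000)}s hard timeout.`
        )
      );
    }, timeoutMs);
    // Same as the upstream snippet: the timer alone must not keep Node alive.
    timeoutHandle.unref?.();
  });
  try {
    const result = await Promise.race([runner(), timeoutPromise]);
    job.status = "completed";
    return result;
  } catch (err) {
    job.status = "failed";
    job.error = err.message;
    throw err;
  } finally {
    clearTimeout(timeoutHandle);
  }
}

// Case 1: a runner that never settles ends up failed with the timeout message.
async function neverSettlingRunnerFails() {
  const job = { id: "job-1", status: "running" };
  const startedAt = Date.now();
  let caught = null;
  try {
    await runWithHardTimeout(job, () => new Promise(() => {}), 1000);
  } catch (err) {
    caught = err;
  }
  return {
    status: job.status,
    message: caught ? caught.message : null,
    elapsedMs: Date.now() - startedAt,
  };
}

// Case 2: a fast runner completes and leaves no pending timer behind.
async function fastRunnerCompletes() {
  const job = { id: "job-2", status: "running" };
  const result = await runWithHardTimeout(job, async () => "ok", 60_000);
  return { status: job.status, result };
}

// Because the timeout timer is unref'd, something else has to keep the event
// loop alive while case 1 waits; a ref'd interval does that here.
const keepAlive = setInterval(() => {}, 50);
const checks = (async () => {
  try {
    return {
      hung: await neverSettlingRunnerFails(),
      fast: await fastRunnerCompletes(),
    };
  } finally {
    clearInterval(keepAlive);
  }
})();

checks.then((results) => console.log(results));
```

Two things this sketch surfaces for the real tests. The unref'd timer means a test with a never-settling runner needs some other live handle, or the event loop may drain before the timeout fires. The message also rounds to whole seconds, so a sub-second timeoutMs such as 200 would be reported as 0s; assertions on the timeout figure are simpler with whole-second values.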
Upstream reference
openai/codex-plugin-cc#184 (open) — closes their #183, refs their #176, #164. Tests included: tests/tracked-jobs-timeout.test.mjs, tests/dead-pid-reconcile.test.mjs, tests/process.test.mjs (3+6+2 = 11 new cases).
Port of openai/codex-plugin-cc#184 (open, high-quality reference impl)