Port: tracked-job hard timeout + exit-promise race (ref impl for #13)

## Port of openai/codex-plugin-cc#184 (open, high-quality reference impl)

### Related

Reference implementation for the timeout half of #13 (runTrackedJob can hang indefinitely). Filed separately because the upstream PR has a clean, test-backed diff we can transplant almost directly.

### Problem (restated)

`plugins/opencode/scripts/lib/tracked-jobs.mjs:64` awaits the runner with no wall-clock guard. If the runner promise never settles (the SSE stream drops, a fetch hangs after it has started receiving bytes, or post-response handlers wedge), the job file stays `status: "running"` until `SessionEnd`, and the lock persists until the user manually wipes state.
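
For illustration only, the failure mode can be reproduced in a few lines. This is a minimal sketch, not the code in `tracked-jobs.mjs`: `jobRecord`, `hungRunner`, `runUnguarded`, and `statusAfter` are hypothetical names, and an in-memory object stands in for the job file.

```js
// Hypothetical stand-in for the job record that tracked-jobs.mjs persists to disk.
const jobRecord = { id: "job-1", status: "running" };

// A runner whose promise never settles, e.g. a stream that stalls mid-response.
const hungRunner = () => new Promise(() => {});

// Mirrors an unguarded await: the lines after `await` are never reached.
async function runUnguarded(record, runner) {
  const result = await runner();
  record.status = "completed";
  return result;
}

// Resolves with the status observed after `ms` milliseconds.
function statusAfter(record, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(record.status), ms));
}

runUnguarded(jobRecord, hungRunner);
statusAfter(jobRecord, 50).then((status) => {
  console.log(status); // prints "running": nothing ever moved the job to a terminal state
});
```

No error is thrown and nothing is logged by the hung call itself, which is why the stuck state is only noticed through the stale job file and lock.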

### Fix from upstream (two-part)

**Part 1 — hard timeout in `runTrackedJob`:**

```js
// lib/tracked-jobs.mjs additions

// Hard ceiling for any single tracked job. 30 minutes is generous for long
// OpenCode runs but bounded so a hung runner cannot keep the companion
// process alive forever. Override via OPENCODE_COMPANION_JOB_TIMEOUT_MS.
const DEFAULT_JOB_TIMEOUT_MS = 30 * 60 * 1000;

function resolveJobTimeoutMs(options = {}) {
  if (Number.isFinite(options.timeoutMs) && options.timeoutMs > 0) {
    return options.timeoutMs;
  }
  const fromEnv = Number(process.env.OPENCODE_COMPANION_JOB_TIMEOUT_MS);
  if (Number.isFinite(fromEnv) && fromEnv > 0) {
    return fromEnv;
  }
  return DEFAULT_JOB_TIMEOUT_MS;
}

export async function runTrackedJob(workspacePath, job, runner, options = {}) {
  // ... existing setup ...

  const timeoutMs = resolveJobTimeoutMs(options);
  let timeoutHandle = null;
  const timeoutPromise = new Promise((_resolve, reject) => {
    timeoutHandle = setTimeout(() => {
      reject(
        new Error(
          `Tracked job ${job.id} exceeded the ${Math.round(timeoutMs / 1000)}s hard timeout. ` +
            "The runner did not produce a terminal status. " +
            "Set OPENCODE_COMPANION_JOB_TIMEOUT_MS to adjust."
        )
      );
    }, timeoutMs);
    timeoutHandle.unref?.();
  });

  try {
    const result = await Promise.race([runner({ report, log }), timeoutPromise]);
    if (timeoutHandle) { clearTimeout(timeoutHandle); timeoutHandle = null; }
    // ... existing completion path ...
    return result;
  } catch (err) {
    if (timeoutHandle) { clearTimeout(timeoutHandle); timeoutHandle = null; }
    // ... existing failure path ...
    throw err;
  }
}
```

**Part 2 (already partially addressed):** Upstream also needed to race `captureTurn` against the broker client's `exitPromise` because Codex's broker could die silently mid-turn. OpenCode's HTTP layer already caps the inference fetch at 10 minutes via `AbortSignal.timeout(600_000)` in `lib/opencode-server.mjs:195`, which covers the equivalent "upstream died" case for the body of the call. The remaining risk is anything **outside** that fetch: `getSessionDiff` in `handleTask`, result-file writes, and JSON parsing of unexpectedly large bodies. The Part 1 hard timeout is the correct blanket guard for those.
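
One caveat worth keeping in mind when porting: `Promise.race` only stops *waiting* for the losing promise, it does not cancel it. The upstream diff as quoted does not attempt cancellation. If we later want the abandoned work to stop, one option is to hand the runner an `AbortSignal` that fires when the timer wins. The sketch below is an assumption about how that could look, not part of the upstream PR; `raceWithAbort` and `makePoller` are hypothetical names.

```js
// Hypothetical helper, not part of the upstream diff: races `work(signal)` against
// a timer and aborts the signal when the timer wins, so cooperative work can stop.
function raceWithAbort(work, timeoutMs) {
  const controller = new AbortController();
  let handle = null;
  const timeout = new Promise((_resolve, reject) => {
    handle = setTimeout(() => {
      controller.abort();
      reject(new Error(`timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });
  // Clear the timer on both outcomes so a fast runner leaves no pending handle.
  return Promise.race([work(controller.signal), timeout]).finally(() => {
    clearTimeout(handle);
  });
}

// Example work that never settles on its own: it polls until the signal is aborted.
function makePoller() {
  const state = { ticks: 0, stopped: false };
  const work = (signal) =>
    new Promise(() => {
      const interval = setInterval(() => {
        if (signal.aborted) {
          state.stopped = true;
          clearInterval(interval);
          return;
        }
        state.ticks += 1;
      }, 5);
    });
  return { state, work };
}

// Usage: the race rejects at ~30ms and the poller notices the abort on its next tick.
const demo = makePoller();
raceWithAbort(demo.work, 30).catch((err) => {
  console.log(err.message);
});
```

Without the abort, the poller in this sketch would keep ticking after the race rejected, which is the same shape as a wedged post-response handler continuing to hold resources after the job has been marked failed.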

### Test plan (from upstream, adapted)

Create `tests/tracked-jobs-timeout.test.mjs`:

1. A runner that never settles makes `runTrackedJob` reject after `timeoutMs`, and the job transitions to `status: "failed"` with an error message containing the timeout figure.
2. A runner that resolves quickly completes normally and leaves no stray `setTimeout` handle keeping the event loop alive (assert that the test process exits promptly).
3. Env override: `OPENCODE_COMPANION_JOB_TIMEOUT_MS=500` makes the timeout fire at ~500ms.
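
The three cases could be exercised roughly as follows. This is a sketch under stated assumptions rather than the upstream test file: `runWithHardTimeout` is a hypothetical in-memory stand-in for `runTrackedJob` (it mutates a plain object instead of the job file), the `env` parameter is added only so the override can be tested without touching `process.env`, and the `check*` functions would become test cases in whichever runner the repo uses.

```js
// Stand-in for the Part 1 logic so the sketch runs on its own; the real tests
// would import runTrackedJob from lib/tracked-jobs.mjs instead.
const DEFAULT_JOB_TIMEOUT_MS = 30 * 60 * 1000;

function resolveJobTimeoutMs(options = {}, env = process.env) {
  if (Number.isFinite(options.timeoutMs) && options.timeoutMs > 0) return options.timeoutMs;
  const fromEnv = Number(env.OPENCODE_COMPANION_JOB_TIMEOUT_MS);
  if (Number.isFinite(fromEnv) && fromEnv > 0) return fromEnv;
  return DEFAULT_JOB_TIMEOUT_MS;
}

// Hypothetical mini version of runTrackedJob. The timer is deliberately NOT
// unref'd here so the event loop stays alive while a hung runner is pending.
async function runWithHardTimeout(job, runner, options = {}, env = process.env) {
  const timeoutMs = resolveJobTimeoutMs(options, env);
  let handle = null;
  const timeout = new Promise((_resolve, reject) => {
    handle = setTimeout(
      () => reject(new Error(`Tracked job ${job.id} exceeded the ${timeoutMs}ms hard timeout.`)),
      timeoutMs
    );
  });
  try {
    const result = await Promise.race([runner(), timeout]);
    job.status = "completed";
    return result;
  } catch (err) {
    job.status = "failed";
    job.error = err.message;
    throw err;
  } finally {
    clearTimeout(handle);
  }
}

// Case 1: a runner that never settles ends up "failed" with the figure in the message.
async function checkHungRunner() {
  const job = { id: "hung", status: "running" };
  await runWithHardTimeout(job, () => new Promise(() => {}), { timeoutMs: 20 }).catch(() => {});
  return job;
}

// Case 2: a fast runner completes and returns its result; the timer is cleared.
async function checkFastRunner() {
  const job = { id: "fast", status: "running" };
  const result = await runWithHardTimeout(job, async () => "ok", { timeoutMs: 1000 });
  return { job, result };
}

// Case 3: the env override applies when no explicit option is passed.
async function checkEnvOverride() {
  const job = { id: "env", status: "running" };
  const started = Date.now();
  const env = { OPENCODE_COMPANION_JOB_TIMEOUT_MS: "50" };
  await runWithHardTimeout(job, () => new Promise(() => {}), {}, env).catch(() => {});
  return { job, elapsedMs: Date.now() - started };
}
```

Because the Part 1 code calls `unref()` on its timer, a test that awaits a never-settling runner against the real `runTrackedJob` may need to hold its own ref'd handle (for example a longer `setTimeout` that is cleared at the end of the test); otherwise the process can exit before the timeout fires.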

### Upstream reference

openai/codex-plugin-cc#184 (open) — closes their #183, refs their #176, #164. Tests included: `tests/tracked-jobs-timeout.test.mjs`, `tests/dead-pid-reconcile.test.mjs`, `tests/process.test.mjs` (3+6+2 = 11 new cases).
