Dead-PID and zombie job reconciliation #14

@JohnnyVicious

Summary

When a tracked job's worker process dies between SessionStart and SessionEnd (crash, OOM, SIGKILL, parent detached), the job record keeps status: "running" for the rest of the session. buildStatusSnapshot and enrichJob in plugins/opencode/scripts/lib/job-control.mjs never probe the PID with process.kill(pid, 0); they only read the stored status field.

Today, cleanup happens only in plugins/opencode/scripts/session-lifecycle-hook.mjs:27-44, which runs on SessionEnd. During the session itself, /opencode:status and /opencode:result can display rows for jobs whose worker processes are long dead.

Repro

  1. Start a background task via /opencode:rescue --background.
  2. kill -9 the background worker PID.
  3. /opencode:status still shows the job as running.

Suggested fix

In enrichJob (or buildStatusSnapshot), for any job with job.status === "running" and a job.pid, probe the process with process.kill(pid, 0). If the probe throws ESRCH, set status: "failed" with errorMessage: "Worker process exited without updating job status." and persist the change via upsertJob.
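
A minimal sketch of that probe, to make the proposal concrete. The helper names isProcessAlive and reconcileJob are hypothetical, and the upsertJob signature (a single job object) is an assumption, since the real one in job-control.mjs is not shown here. Treating EPERM as "alive" is also an assumption: it means the process exists but cannot be signalled by this user.

```javascript
// Hypothetical helper: true if a process with this PID appears to exist.
function isProcessAlive(pid) {
  try {
    // Signal 0 sends nothing; it only runs the existence/permission checks.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    if (err.code === "ESRCH") return false; // no such process
    if (err.code === "EPERM") return true;  // exists, but we may not signal it
    throw err;
  }
}

// Hypothetical reconciliation step, meant to be called from enrichJob or
// buildStatusSnapshot. `upsertJob` is assumed to accept the updated record;
// `probe` is injectable so the logic can be tested without real processes.
function reconcileJob(job, upsertJob, probe = isProcessAlive) {
  // Only running jobs with a usable PID are probed. PIDs <= 0 are skipped
  // because they address process groups rather than a single process.
  if (job.status !== "running" || !Number.isInteger(job.pid) || job.pid <= 0) {
    return job;
  }
  if (probe(job.pid)) {
    return job;
  }
  const updated = {
    ...job,
    status: "failed",
    errorMessage: "Worker process exited without updating job status.",
  };
  upsertJob(updated);
  return updated;
}
```

One caveat for the "zombie" half of the title: on POSIX systems, signal 0 still succeeds for a process that has exited but has not been reaped by its parent, so this probe alone would likely not flag that case. PID reuse is a second limitation, since a recycled PID would make a dead worker look alive.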

Upstream reference

Derived from openai/codex-plugin-cc#164.
