Codex.app GUI: MCP child processes not reaped after task completion — 1300+ zombies, 37GB memory leak #12491

@rolldav

Description

What version of Codex CLI is running?

codex-cli 0.98.0 (Codex.app GUI wrapper)

What subscription do you have?

ChatGPT Pro

Which model were you using?

o3 (via Codex.app GUI)

What platform is your computer?

macOS 26.3 (25D125), Apple M4 Max, 64GB RAM, arm64

What issue are you seeing?

Codex.app spawns a codex exec --full-auto worker process for each worktree but never terminates or reaps these workers, or their MCP child processes, when tasks complete. Over multiple audit sessions this accumulated 1,319 leftover Codex processes and 1,537 node processes. They are sleeping but still resident, so they are leaked processes rather than zombies in the strict (defunct) sense. Together they consumed ~37GB RSS plus 39.4GB swap on a 64GB machine, causing severe system-wide slowdown.

Root cause analysis

Each Codex.app task creates:

  • 1× codex exec --full-auto --skip-git-repo-check --json - (CLI worker)
  • 4× oh-my-codex MCP servers (state, memory, trace, code-intel)
  • 1× Serena MCP server (via uv tool uvx)
  • 1× uv process for Serena

≈ 7 processes per task. None are killed when the task completes.

With 12 git worktrees created by Codex.app (detached HEAD, pointing at old commits), this produced:

  • 12 worktrees × ~110 child processes per worktree = ~1,320 sleeping processes
  • Combined RSS: ~37GB (mostly npm/node/uv MCP servers)
  • Swap usage: 39.4GB (on 64GB physical RAM)

When Codex.app (PID 19947) was killed, 2 additional codex exec workers (launched Friday and Saturday) were reparented to launchd (PPID=1), confirming they were never properly cleaned up.
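Orphans like these can be found by listing processes whose parent is PID 1 and whose command line matches the worker. The sketch below is not part of Codex; `find_orphans` and `list_processes` are hypothetical helper names, and the sample lines simply reuse the PIDs quoted in this report.

```python
import subprocess

# Sample in "pid ppid command" form, using PIDs from this report.
SAMPLE = (
    "19970 19947 codex exec --full-auto\n"
    "59326     1 codex exec --full-auto\n"
    "59747     1 codex exec --full-auto\n"
    "20018 19947 Codex Helper Renderer\n"
)


def find_orphans(ps_output, needle="codex exec"):
    """Return (pid, command) for matching processes whose parent is PID 1."""
    orphans = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, ppid, command = parts
        # Skip header lines or anything that is not "number number command".
        if not (pid.isdigit() and ppid.isdigit()):
            continue
        if int(ppid) == 1 and needle in command:
            orphans.append((int(pid), command))
    return orphans


def list_processes():
    """Ask ps for every process; the trailing '=' suppresses column headers."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "ppid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for pid, command in find_orphans(list_processes()):
        print(pid, command)
```

A parent of PID 1 only shows that the original parent is gone. It does not by itself prove that the task behind the worker has finished.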

Process tree snapshot (before cleanup)

Codex.app (PID 19947, 279MB)
├── codex exec --full-auto (PID 19970, 123MB)
│   └── 880× node/npm/uv children (27GB total)
│       ├── npm (236MB each, ×2)
│       ├── npm (140MB each, ×4)
│       ├── uv (91MB each, ×2)
│       └── node (77MB each, ×dozens)
├── Codex Helper Renderer (PID 20018, 602MB)
└── Codex Helper (PID 19971, 101MB)

Orphaned (PPID=1, reparented from dead Codex.app sessions):
├── codex exec --full-auto (PID 59326, Fri 23:59)
│   ├── oh-my-codex/state-server.js
│   ├── oh-my-codex/memory-server.js
│   ├── oh-my-codex/trace-server.js
│   ├── oh-my-codex/code-intel-server.js
│   └── serena start-mcp-server
└── codex exec --full-auto (PID 59747, Sat 22:05)
    └── (same MCP stack)

Memory impact

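| Metric          | Before cleanup | After cleanup | Delta  |
|-----------------|----------------|---------------|--------|
| Swap used       | 39.4 GB        | 8.0 GB        | -31 GB |
| Codex processes | 1,319          | 0             | -100%  |
| Node processes  | 1,537          | 17            | -99%   |
| Free pages      | 82K            | 2.3M          | ×28    |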

Worktree state at time of discovery

All 12 worktrees were detached HEAD on old commits (2+ phases behind current main), confirming tasks had completed long ago:

/Users/drg/.codex/worktrees/0585/medical-ai  627d44e (detached HEAD)
/Users/drg/.codex/worktrees/0b7b/medical-ai  627d44e (detached HEAD)
/Users/drg/.codex/worktrees/2a60/medical-ai  4d53e7d (detached HEAD)
... (12 total, all on 2 old commits)

Current HEAD: 71aefff — none of the worktrees were anywhere near current.

What steps can reproduce the bug?

  1. Use Codex.app GUI to run multiple tasks on a git repository (especially with MCP servers configured via oh-my-codex)
  2. Let tasks complete normally
  3. Wait a few hours / run more tasks
  4. Check process count: ps aux | grep -c '[c]odex' (the [c] bracket keeps grep from counting its own process)
  5. Check memory: ps aux | grep '[c]odex' | awk '{sum+=$6} END {printf "%.0f MB\n", sum/1024}'

Accumulation is proportional to the number of tasks × the number of MCP servers configured.
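Steps 4 and 5 can also be scripted so that the count and the RSS total come from a single snapshot. This is a minimal sketch, assuming `ps` reports RSS in KiB (as it does on macOS and Linux); `summarize` and `snapshot` are hypothetical names, and the sample figures are illustrative, not measurements from this report.

```python
import subprocess

# Illustrative "rss_kib command" lines; the numbers are made up for the example.
SAMPLE = (
    "125952 codex exec --full-auto\n"
    "78848 node oh-my-codex/state-server.js\n"
    "1024 /bin/zsh\n"
)


def summarize(ps_output, needle="codex"):
    """Return (process_count, total_rss_mib) for commands containing needle."""
    count = 0
    rss_kib = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # header or malformed line
        if needle in parts[1]:
            count += 1
            rss_kib += int(parts[0])
    return count, rss_kib / 1024


def snapshot():
    """One ps call, so the count and the memory total describe the same instant."""
    result = subprocess.run(
        ["ps", "-A", "-o", "rss=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    count, mib = summarize(snapshot())
    print(f"{count} processes, {mib:.0f} MB RSS")
```

Running it before and after a batch of tasks gives a simple measure of growth. Note that the script counts itself if its own command line happens to contain the search string.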

What is the expected behavior?

When a Codex.app task completes:

  1. The codex exec worker process should be terminated
  2. All MCP child processes (oh-my-codex servers, Serena, uv) should be killed via process group signal
  3. The git worktree should be cleaned up if the task is done (or at minimum, child processes should not persist)
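Item 2 can be sketched as follows. This is an illustration of the process-group technique under stated assumptions, not the way Codex.app or codex exec is implemented; `spawn_in_new_group`, `kill_group`, and `demo` are hypothetical names, and a small `sh` command stands in for a worker with child processes.

```python
import os
import signal
import subprocess
import time


def spawn_in_new_group(argv):
    """Start argv as the leader of a new session and process group."""
    # start_new_session=True makes the child call setsid(), so its PID is
    # also its PGID and its descendants inherit the group unless they opt out.
    return subprocess.Popen(argv, stdout=subprocess.PIPE, start_new_session=True)


def kill_group(proc, grace=2.0):
    """SIGTERM the whole group, follow up with SIGKILL, then reap the leader."""
    pgid = proc.pid  # leader PID equals PGID because of setsid()
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            os.killpg(pgid, sig)
        except (ProcessLookupError, PermissionError):
            # Group already gone. macOS can report EPERM instead of ESRCH
            # when only unreaped members remain.
            break
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            pass  # leader ignored the signal; escalate on the next pass
    return proc.wait()  # reap the leader so it does not linger as a zombie


def demo():
    """Kill a shell with two background children and time the cleanup."""
    proc = spawn_in_new_group(["sh", "-c", "sleep 30 & sleep 30 & wait"])
    started = time.monotonic()
    code = kill_group(proc)
    # EOF arrives only once every process holding the pipe has exited.
    proc.stdout.read()
    proc.stdout.close()
    return code, time.monotonic() - started


if __name__ == "__main__":
    print(demo())
```

A child that calls setsid() itself leaves the group and is not reached by killpg(), so this approach works best combined with a parent-death check in the child.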

What is the actual behavior?

  • Worker processes and their entire MCP subtree remain alive indefinitely
  • No cleanup occurs on task completion, app quit, or even app crash (orphaned workers are reparented to launchd)
  • Memory grows unboundedly with each task

Relationship to existing issues

  • CLI-494 (fixed, Oct 2025): start_kill() without wait() in exec.rs. Same root-cause pattern (missing process reaping), but the fix only covered timeout/Ctrl+C in the CLI, not the GUI lifecycle
  • CLI-3017 (Jan 2026): CLI memory leak causing crashes
  • #12414 (Feb 2026, open): Unbounded memory growth when idle on v0.104.0

This issue is distinct because it specifically affects the Codex.app GUI and its interaction with MCP server child processes. The GUI spawns long-lived codex exec workers that themselves spawn MCP servers, creating a deeper process tree that the existing cleanup logic does not handle.
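The kill-without-wait pattern cited for CLI-494 can be reproduced in a few lines. This is a Python analogue for illustration only, not the Rust code in exec.rs; `kill_without_reaping`, `reap`, and `entry_exists` are hypothetical names. It shows that a killed child keeps its process-table entry until the parent waits on it.

```python
import os
import subprocess


def entry_exists(pid):
    """True while the kernel still has a process-table entry for pid."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending anything
    except ProcessLookupError:
        return False
    return True


def kill_without_reaping(argv):
    """Start a child and kill it, but do not collect its exit status."""
    proc = subprocess.Popen(argv)
    proc.kill()  # sends SIGKILL and returns immediately
    return proc


def reap(proc):
    """Collect the exit status, which releases the process-table entry."""
    return proc.wait()


if __name__ == "__main__":
    child = kill_without_reaping(["sleep", "30"])
    print("entry before wait:", entry_exists(child.pid))
    print("exit status:", reap(child))
    print("entry after wait:", entry_exists(child.pid))
```

A defunct entry like this holds almost no memory. The processes in this report were still running, so reaping alone would not free their RSS; they also need to be signalled.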

Suggested fix

  1. Process group kill: codex exec workers should use setsid() / process groups so all children (MCP servers) can be killed with a single killpg() on task completion
  2. Worktree lifecycle: Codex.app should track active worktrees and clean up stale ones on startup (e.g., remove worktrees whose tasks are no longer active)
  3. Heartbeat/watchdog: MCP servers spawned by codex exec should monitor their parent and self-terminate if the parent dies (check PPID periodically, or use a pipe/fd that breaks on parent death)
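The pipe variant of item 3 can be sketched as below. It is an assumption-laden illustration, not code from oh-my-codex or Serena: `watch_parent` is a hypothetical name, and the parent is assumed to hold the write end of a pipe whose read end the server inherits. The kernel closes the write end when the parent exits for any reason, including a crash, and the reader then sees end-of-file.

```python
import os
import threading


def watch_parent(read_fd, on_parent_exit):
    """Call on_parent_exit once every write end of the pipe has been closed."""

    def _watch():
        # os.read returns b"" only at end-of-file, which happens when the
        # last process holding the write end closes it or dies.
        while os.read(read_fd, 1024):
            pass
        on_parent_exit()

    thread = threading.Thread(target=_watch, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    parent_gone = threading.Event()
    watch_parent(read_fd, parent_gone.set)
    os.close(write_fd)  # stands in for the parent process exiting
    parent_gone.wait(timeout=5)
    print("parent exit detected:", parent_gone.is_set())
```

In a real server the callback would shut down and exit, for example with `os._exit(0)`. A server that talks to its parent over stdin can apply the same idea by treating end-of-file on stdin as the shutdown signal. Polling `os.getppid()` for a change to 1 is a simpler alternative, at the cost of a detection delay equal to the polling interval.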

Metadata

Labels

  • app: Issues related to the Codex desktop app
  • bug: Something isn't working
  • mcp: Issues related to the use of model context protocol (MCP) servers