UnifiedExec appears to inflate new uncached prompt suffixes: verbose exec wrappers and repeated session-command polling burn usage quickly #14750

@haydenkwak

What version of Codex CLI is running?

codex-cli 0.114.0

What subscription do you have?

Pro

Which model were you using?

Model independent

What platform is your computer?

Darwin 25.2.0 arm64 arm

What terminal emulator and version are you using (if applicable)?

iTerm2, WezTerm, Terminal.app

What issue are you seeing?

I’m investigating a real usage-burn problem, not just a UI wording issue.

After rechecking the related issues and the local code paths, the strongest claim I can support is this: large newly injected prompt payloads are a real burn mechanism, and on the UnifiedExec path one confirmed contributor is an oversized new uncached suffix.

In my reproduced session, caching was mostly present, so this is not best described as a general cache-accounting failure:

  • mean cached/input ratio: 0.9455
  • median cached/input ratio: 0.98
  • max input event: 121,721 tokens
  • cached input on that event: 120,832 tokens

However, turn-level analysis still showed:

  • max non-cached turn: 69,957 tokens
  • max non-cached + output turn: 70,218 tokens

So there is a real burn path even when caching is largely present.
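To make the arithmetic behind these figures explicit: on the largest input event, 121,721 - 120,832 = 889 tokens were uncached (a cached ratio of about 0.993), so that event is not where the 69,957-token turn comes from. A minimal sketch of the per-event and per-turn calculation follows. The struct and field names are assumptions chosen for illustration, not the Codex CLI's actual usage schema.

```rust
/// Token counts for one model request, as reported in usage events.
/// Field names are illustrative, not the Codex CLI's actual schema.
#[derive(Debug, Clone, Copy)]
pub struct UsageEvent {
    pub input_tokens: u64,
    pub cached_input_tokens: u64,
    pub output_tokens: u64,
}

impl UsageEvent {
    /// Input tokens that missed the prefix cache (the new uncached suffix).
    pub fn non_cached_input(&self) -> u64 {
        self.input_tokens.saturating_sub(self.cached_input_tokens)
    }

    /// Fraction of input tokens served from cache; 0.0 for an empty request.
    pub fn cached_ratio(&self) -> f64 {
        if self.input_tokens == 0 {
            return 0.0;
        }
        self.cached_input_tokens as f64 / self.input_tokens as f64
    }
}

/// Sum non-cached input plus output over the events that make up one turn.
pub fn turn_burn(events: &[UsageEvent]) -> u64 {
    events
        .iter()
        .map(|e| e.non_cached_input() + e.output_tokens)
        .sum()
}
```

Grouping events by turn before summing is what separates "largest single request" from "most expensive turn"; the two need not coincide.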

Looking at the code, core/src/tools/context.rs currently sends ExecCommandToolOutput::response_text() directly into the model-visible tool transcript. That text includes wrapper fields like:

  • Command
  • Chunk ID
  • Wall time
  • Process exited/running
  • Original token count
  • Output:
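As a rough illustration of how much of each tool response is wrapper rather than output, the sketch below renders the same result two ways. Everything here is hypothetical: `ExecOutput`, `verbose_text`, and `compact_text` are stand-ins invented for this example, and the exact line layout is an assumption, not the real `response_text()` format.

```rust
/// Hypothetical stand-in for the fields listed above; this is NOT the real
/// `ExecCommandToolOutput` definition.
pub struct ExecOutput {
    pub command: String,
    pub chunk_id: String,
    pub wall_time_secs: f64,
    pub exit_code: Option<i32>, // None while the process is still running
    pub original_token_count: Option<u64>,
    pub output: String,
}

/// Human-readable rendering with every wrapper field (illustrative layout).
pub fn verbose_text(o: &ExecOutput) -> String {
    let mut s = String::new();
    s.push_str(&format!("Command: {}\n", o.command));
    s.push_str(&format!("Chunk ID: {}\n", o.chunk_id));
    s.push_str(&format!("Wall time: {:.4} seconds\n", o.wall_time_secs));
    match o.exit_code {
        Some(code) => s.push_str(&format!("Process exited with code {}\n", code)),
        None => s.push_str("Process running\n"),
    }
    if let Some(n) = o.original_token_count {
        s.push_str(&format!("Original token count: {}\n", n));
    }
    s.push_str("Output:\n");
    s.push_str(&o.output);
    s
}

/// Compact model-facing rendering: only the status plus the output itself.
pub fn compact_text(o: &ExecOutput) -> String {
    let status = match o.exit_code {
        Some(code) => format!("exit={}", code),
        None => "running".to_string(),
    };
    format!("[{}]\n{}", status, o.output)
}

/// Bytes of wrapper text the compact form avoids for one tool response.
pub fn wrapper_overhead_bytes(o: &ExecOutput) -> usize {
    verbose_text(o).len().saturating_sub(compact_text(o).len())
}
```

Which fields the model actually needs for its next action (for example, an identifier required to continue a session) is an open design question; the compact form here drops them only to show the size difference.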

Also, core/src/unified_exec/process_manager.rs appears to attach session_command: Some(session_command.clone()) on write_stdin() poll responses as well. That means long interactive sessions can keep re-echoing the same command string.
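One way to picture the alternative is to echo the command only on the first response of a session and omit it on later polls. The sketch below is a hypothetical illustration of that idea: `PollResponse` and `SessionEcho` are invented names, and only the `session_command` field name is taken from the code path described above.

```rust
/// Hypothetical poll response; `session_command` mirrors the field named in
/// the issue, but this struct is not the real Codex type.
pub struct PollResponse {
    pub session_command: Option<String>,
    pub output: String,
}

/// Tracks whether the command has already been shown to the model.
pub struct SessionEcho {
    command: String,
    echoed: bool,
}

impl SessionEcho {
    pub fn new(command: &str) -> Self {
        Self { command: command.to_string(), echoed: false }
    }

    /// Attach the command only on the first response; later polls omit it.
    pub fn respond(&mut self, output: &str) -> PollResponse {
        let session_command = if self.echoed {
            None
        } else {
            self.echoed = true;
            Some(self.command.clone())
        };
        PollResponse { session_command, output: output.to_string() }
    }
}

/// Bytes of command text avoided when `polls` responses echo the command
/// once instead of every time.
pub fn echo_bytes_saved(command: &str, polls: usize) -> usize {
    command.len() * polls.saturating_sub(1)
}
```

The saving scales with both command length and poll count, which is why long interactive sessions with multi-line commands would be the most affected.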

So the problem does not look like “prefix cache is always broken.” It looks more like the new uncached suffix is being made larger than necessary by model-visible wrapper text and repeated session-command echoes.

This also seems consistent with the broader issue pattern in the #13000 to #15000 range:

  • #13568 usage dropping too quickly
  • #14349 rate limits dropping extremely fast
  • #14593 tokens still burning very fast after an extension update
  • #14681 /review draining usage unexpectedly

And there is one related mechanism issue in the same range:

  • #14507 points out that large up-front tool injection can burn many tokens in tool-heavy setups

There is also an important measurement confounder:

  • #14489 shows that TokenCount can re-emit prior last_token_usage on rate-limit-only updates, so some JSONL/local counters can over-report usage
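To show why this confounder matters when totalling usage from local logs, here is a minimal sketch comparing a naive sum with one that skips re-emitted values. The event shape is an assumption made for illustration: the real JSONL schema may differ, and `rate_limit_only` is a hypothetical flag that real logs may not carry.

```rust
/// Hypothetical shape of a logged token-count event. The real JSONL schema
/// may differ; `rate_limit_only` is an assumed flag for illustration.
#[derive(Debug, Clone, Copy)]
pub struct TokenCountEvent {
    pub last_total_tokens: u64,
    pub rate_limit_only: bool,
}

/// Naive sum: counts every event, including re-emitted usage.
pub fn naive_total(events: &[TokenCountEvent]) -> u64 {
    events.iter().map(|e| e.last_total_tokens).sum()
}

/// Adjusted sum: skips events flagged as rate-limit-only updates.
pub fn adjusted_total(events: &[TokenCountEvent]) -> u64 {
    events
        .iter()
        .filter(|e| !e.rate_limit_only)
        .map(|e| e.last_total_tokens)
        .sum()
}
```

If no such flag is available in the logs, a different heuristic would be needed to tell a re-emission from a genuinely identical request.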

So my best current reading is:

  • some local counters may over-report
  • but there is still a real usage-burn path
  • one confirmed local mechanism is oversized new tool-output suffixes on the UnifiedExec path

(I am intentionally not claiming that every fast-usage report has exactly one root cause, and I am not blaming the older JSON-vs-freeform formatting change by itself.)

What steps can reproduce the bug?

  1. Use Codex CLI on macOS in a normal repo-inspection workflow.
  2. Ask a few questions that trigger broad read-only exploration or interactive shell follow-ups, for example:
    • rg -n ...
    • sed -n 'a,bp' ...
    • cat ...
    • interactive exec_command followed by repeated write_stdin polling
  3. Let the session accumulate several large tool outputs.
  4. Observe usage climbing unexpectedly fast.

In my reproduced session:

  • max input event: 121,721 tokens
  • cached input on that event: 120,832 tokens
  • max non-cached turn: 69,957 tokens
  • max non-cached + output turn: 70,218 tokens
  • the worst turn was dominated by large read-only tool outputs

Thread id: 019cf1b1-dee1-76f3-97e7-e872d1c0c826

What is the expected behavior?

Ordinary repo-inspection and interactive shell workflows should be much more resistant to large first-pass prompt growth.

Large read-only or interactive tool outputs should not so easily produce turns that consume 50k to 70k tokens of real usage in normal workflows.

In particular:

  • the model-visible UnifiedExec payload should be more compact than the current human-readable wrapper text
  • repeated polling should not keep re-sending low-value metadata or duplicate command strings if they are not needed for the model’s next action

Additional information

Related issues in the same range:

  • #13568
  • #14349
  • #14593
  • #14681
  • #14507
  • #14489

Possible code areas worth inspecting:

  • core/src/tools/context.rs
    • consider separating model-visible exec payloads from human/log/telemetry formatting
    • consider using a smaller model-facing payload than response_text()
  • core/src/unified_exec/process_manager.rs
    • consider avoiding repeated session_command echoes on write_stdin() poll responses
  • core/src/tools/spec.rs
    • consider aligning unified_exec_output_schema() with a smaller model-facing contract
    • consider keeping dynamic tool ordering stable to reduce avoidable request-shape drift
  • core/src/stream_events_utils.rs and protocol/src/models.rs
    • if MCP is used, consider avoiding paths where large structured outputs are flattened back into large text payloads
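On the tool-ordering suggestion above, the idea is that a prefix cache can only match when the serialized tool list is byte-identical between requests, so a deterministic order removes one source of drift. A minimal sketch, assuming a simplified `ToolSpec` type and a made-up text serialization (neither is the real type or format in `spec.rs`):

```rust
/// Hypothetical minimal tool description; not the real spec type in `spec.rs`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToolSpec {
    pub name: String,
    pub description: String,
}

/// Return tools in a deterministic order (by name) so the serialized tool
/// list is byte-identical across requests with the same tool set.
pub fn stable_order(mut tools: Vec<ToolSpec>) -> Vec<ToolSpec> {
    tools.sort_by(|a, b| a.name.cmp(&b.name));
    tools
}

/// Illustrative serialization of the tool list as it might appear in a
/// prompt prefix.
pub fn render_prefix(tools: &[ToolSpec]) -> String {
    tools
        .iter()
        .map(|t| format!("{}: {}\n", t.name, t.description))
        .collect()
}
```

Sorting by name is only one possible stable order; any ordering that does not depend on discovery or registration timing would serve the same purpose.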

Local code points I checked:

  • core/src/tools/context.rs
  • core/src/unified_exec/process_manager.rs
  • core/src/tools/spec.rs
  • core/src/client.rs
  • core/src/stream_events_utils.rs
  • protocol/src/models.rs

Why I’m filing this with this framing: The issue range above contains both symptom reports and a measurement confounder. My goal is to isolate a concrete mechanism that can generate real usage burn even when caching is mostly in place: oversized new tool-output suffixes along the UnifiedExec path.

    Labels

    bug (Something isn't working), rate-limits (Issues related to rate limits, quotas, and token usage reporting), tool-calls (Issues related to tool calling)
