What version of Codex CLI is running?
codex-cli 0.114.0
What subscription do you have?
Pro
Which model were you using?
Model independent
What platform is your computer?
Darwin 25.2.0 arm64 arm
What terminal emulator and version are you using (if applicable)?
iTerm2, WezTerm, Terminal.app
What issue are you seeing?
I’m investigating a real usage-burn problem, not just a UI wording issue.
After rechecking both the related issues and the local code paths, the strongest claim I can support is: large newly injected prompt payloads are a real burn mechanism, and on the UnifiedExec path one confirmed contributor is oversized growth of the new uncached suffix.
In my reproduced session, caching was mostly present, so this is not best described as a general cache-accounting failure:
- mean cached/input ratio: 0.9455
- median cached/input ratio: 0.98
- max input event: 121,721
- cached input on that event: 120,832
However, turn-level analysis still showed:
- max non-cached turn: 69,957
- max non-cached + output turn: 70,218
So there is a real burn path even when caching is largely present.
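For reference, the turn-level figures above treat input minus cached input as the size of the new uncached suffix. A minimal sketch of that arithmetic (the struct and field names below are illustrative assumptions, not the actual Codex telemetry schema):

```rust
// Hypothetical per-event token usage record; field names are assumptions,
// not the actual Codex JSONL/telemetry schema.
struct TokenUsage {
    input_tokens: u64,
    cached_input_tokens: u64,
}

// "Non-cached turn" above = input minus cached input, i.e. the size of the
// newly injected, uncached prompt suffix for that turn.
fn uncached_suffix(u: &TokenUsage) -> u64 {
    u.input_tokens - u.cached_input_tokens
}

fn main() {
    // Worst single input event from the reproduced session.
    let worst = TokenUsage {
        input_tokens: 121_721,
        cached_input_tokens: 120_832,
    };
    let ratio = worst.cached_input_tokens as f64 / worst.input_tokens as f64;
    // Caching is mostly present on this event, yet other turns in the same
    // session still accumulated uncached suffixes near 70k tokens.
    println!(
        "cached/input = {ratio:.4}, uncached suffix = {}",
        uncached_suffix(&worst)
    );
}
```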
Looking at the code, core/src/tools/context.rs currently sends ExecCommandToolOutput::response_text() directly into the model-visible tool transcript. That text includes wrapper fields like:
- Command
- Chunk ID
- Wall time
- Process exited/running
- Original token count
- Output:
Also, core/src/unified_exec/process_manager.rs appears to attach session_command: Some(session_command.clone()) on write_stdin() poll responses as well. That means long interactive sessions can keep re-echoing the same command string.
So the problem does not look like “prefix cache is always broken.” It looks more like the new uncached suffix is being made larger than necessary by model-visible wrapper text and repeated session-command echoes.
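To make the suspected mechanism concrete, here is a hedged sketch contrasting a human-readable wrapper with a compact model-facing payload. The struct and both functions are hypothetical scaffolding; only response_text() and the wrapper field labels come from the actual code:

```rust
// Hypothetical exec output; only the wrapper labels below mirror the
// fields the real response_text() emits into the transcript.
struct ExecOutput {
    command: String,
    chunk_id: u32,
    wall_time_ms: u64,
    exited: bool,
    output: String,
}

// Roughly the shape the model currently sees: metadata repeated on every
// chunk, inflating the new uncached suffix.
fn verbose_payload(o: &ExecOutput) -> String {
    format!(
        "Command: {}\nChunk ID: {}\nWall time: {}ms\nProcess exited: {}\nOutput:\n{}",
        o.command, o.chunk_id, o.wall_time_ms, o.exited, o.output
    )
}

// A smaller model-facing alternative: keep only what the model needs for
// its next action (here, just the raw output).
fn compact_payload(o: &ExecOutput) -> String {
    o.output.clone()
}

fn main() {
    let o = ExecOutput {
        command: "rg -n pattern".into(),
        chunk_id: 7,
        wall_time_ms: 120,
        exited: true,
        output: "src/lib.rs:42: match".into(),
    };
    // The wrapper overhead is paid on every tool response in the turn.
    assert!(verbose_payload(&o).len() > compact_payload(&o).len());
}
```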
This also seems consistent with the broader issue pattern in the #13000 ~ #15000 range:
- #13568 usage dropping too quickly
- #14349 rate limits dropping extremely fast
- #14593 tokens still burning very fast after an extension update
- #14681 /review draining usage unexpectedly
And there is one related mechanism issue in the same range:
- #14507 points out that large up-front tool injection can burn many tokens in tool-heavy setups
There is also an important measurement confounder:
- #14489 shows that TokenCount can re-emit prior last_token_usage on rate-limit-only updates, so some JSONL/local counters can over-report usage
So my best current reading is:
- some local counters may over-report
- but there is still a real usage-burn path
- one confirmed local mechanism is oversized new tool-output suffixes on the UnifiedExec path
(I am intentionally not claiming that every fast-usage report has exactly one root cause, and I am not blaming the older JSON-vs-freeform formatting change by itself.)
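When sanity-checking local JSONL counters against the #14489 confounder, one simple guard is to drop consecutive TokenCount events whose usage is byte-for-byte identical (a re-emitted last_token_usage). This is a sketch under assumed field names, not the actual log schema:

```rust
// Assumed shape of a usage record parsed from local JSONL; the real
// schema may differ.
#[derive(Clone, PartialEq)]
struct Usage {
    input: u64,
    cached: u64,
    output: u64,
}

// Collapse runs of identical consecutive usage events, which is what a
// rate-limit-only re-emit of last_token_usage would look like locally.
fn dedup_reemits(events: Vec<Usage>) -> Vec<Usage> {
    let mut out: Vec<Usage> = Vec::new();
    for e in events {
        if out.last() != Some(&e) {
            out.push(e);
        }
    }
    out
}

fn main() {
    let u = Usage { input: 100, cached: 90, output: 5 };
    let events = vec![
        u.clone(),
        u.clone(), // suspected re-emit: identical to the previous event
        Usage { input: 120, cached: 100, output: 6 },
    ];
    assert_eq!(dedup_reemits(events).len(), 2);
}
```

This only filters the measurement confounder; the 50k-70k uncached turns above survive such filtering, which is why I still consider them a real burn path.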
What steps can reproduce the bug?
- Use Codex CLI on macOS in a normal repo-inspection workflow.
- Ask a few questions that trigger broad read-only exploration or interactive shell follow-ups, for example:
  - rg -n ...
  - sed -n 'a,bp' ...
  - cat ...
  - interactive exec_command followed by repeated write_stdin polling
- Let the session accumulate several large tool outputs.
- Observe usage climbing unexpectedly fast.
In my reproduced session:
- max input event: 121,721
- cached input on that event: 120,832
- max non-cached turn: 69,957
- max non-cached + output turn: 70,218
- the worst turn was dominated by large read-only tool outputs
Thread id: 019cf1b1-dee1-76f3-97e7-e872d1c0c826
What is the expected behavior?
Ordinary repo-inspection and interactive shell workflows should be much more resistant to large first-pass prompt growth.
Large read-only or interactive tool outputs should not so easily turn into 50k-70k scale real-usage turns in normal workflows.
In particular:
- the model-visible UnifiedExec payload should be more compact than the current human-readable wrapper text
- repeated polling should not keep re-sending low-value metadata or duplicate command strings if they are not needed for the model’s next action
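One way the second point could look in practice: echo the session command only on the first response for a given session, and omit it on subsequent polls. Everything below is hypothetical scaffolding; only session_command and write_stdin() come from the code under discussion:

```rust
use std::collections::HashSet;

// Hypothetical poll response; in the real code, session_command is
// currently Some(...) on every write_stdin() poll response.
struct PollResponse {
    session_command: Option<String>,
    output: String,
}

// Hypothetical manager tracking which sessions have already had their
// command echoed to the model.
struct Manager {
    echoed: HashSet<u32>,
}

impl Manager {
    fn poll_response(&mut self, session_id: u32, command: &str, output: String) -> PollResponse {
        // HashSet::insert returns true only the first time, so the
        // command string is sent once per session instead of per poll.
        let session_command = if self.echoed.insert(session_id) {
            Some(command.to_string())
        } else {
            None
        };
        PollResponse { session_command, output }
    }
}

fn main() {
    let mut m = Manager { echoed: HashSet::new() };
    let first = m.poll_response(1, "python repl.py", "hi".into());
    let second = m.poll_response(1, "python repl.py", "more".into());
    assert!(first.session_command.is_some());
    assert!(second.session_command.is_none());
}
```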
Additional information
Related issues in the same range:
#13568
#14349
#14593
#14681
#14507
#14489
Possible code areas worth inspecting:
core/src/tools/context.rs
- consider separating model-visible exec payloads from human/log/telemetry formatting
- consider using a smaller model-facing payload than response_text()
core/src/unified_exec/process_manager.rs
- consider avoiding repeated session_command echoes on write_stdin() poll responses
core/src/tools/spec.rs
- consider aligning unified_exec_output_schema() with a smaller model-facing contract
- consider keeping dynamic tool ordering stable to reduce avoidable request-shape drift
core/src/stream_events_utils.rs and protocol/src/models.rs
- if MCP is used, consider avoiding paths where large structured outputs are flattened back into large text payloads
Local code points I checked:
core/src/tools/context.rs
core/src/unified_exec/process_manager.rs
core/src/tools/spec.rs
core/src/client.rs
core/src/stream_events_utils.rs
protocol/src/models.rs
Why I’m filing this with this framing: The issue range above contains both symptom reports and a measurement confounder. My goal is to isolate a concrete mechanism that can generate real usage burn even when caching is mostly in place: oversized new tool-output suffixes along the UnifiedExec path.