Skip to content

fix(codex): normalize LLM call logging to per-turn boundaries#841

Merged
aaight merged 1 commit intodevfrom
feature/codex-turn-scoped-llm-logging
Mar 14, 2026
Merged

fix(codex): normalize LLM call logging to per-turn boundaries#841
aaight merged 1 commit intodevfrom
feature/codex-turn-scoped-llm-logging

Conversation

@aaight
Copy link
Copy Markdown
Collaborator

@aaight aaight commented Mar 14, 2026

Summary

  • Root cause: Codex logged one storeLlmCall row per usage-bearing event (e.g. response.completed), instead of one row per completed turn — resulting in a single giant aggregate call or duplicate rows per run
  • Fix: Introduces a turn-scoped accumulator (CodexTurnAccumulator) that collects text, tool names, and usage across JSONL events within a turn; persistence fires exactly once on turn.completed
  • Schema unchanged: No database migrations needed; existing agent_run_llm_calls and runs API/UI contract is preserved

Key changes

src/backends/codex/index.ts

  • Added CodexTurnAccumulator type to CodexLineContext to track in-progress turn state
  • accumulateTurnUsage(): merges intermediate usage events (e.g. response.completed) into the accumulator without persisting
  • persistTurnLlmCall(): called exactly once on turn.completed; writes a compact, turn-scoped payload (text summary + tool names + usage) rather than a raw event dump
  • handleStructuralEvent(): turn.started/thread.started reset the accumulator; turn.completed accumulates any final usage then persists
  • handleParsedLine(): intermediate usage events now call accumulateTurnUsage() instead of trackUsage() (which is removed)
  • Usage ordering: turn.completed aggregate totals supersede intermediate response.completed per-response values

tests/unit/backends/codex.test.ts

  • Updated 2 existing tests to emit proper turn.started + turn.completed lifecycle events (reflecting correct behavior)
  • Added 4 new tests:
    • Multi-turn stream: verifies 2 turns produce 2 storeLlmCall rows with stable sequential call numbers
    • Duplicate-usage prevention: both response.completed and turn.completed carry usage — exactly 1 row stored
    • Compact payload shape: stored payload contains turn, tools, usage and is < 2 KB
    • No-turn-completed: bare usage events without turn.completed produce no rows

Test plan

  • All 63 Codex unit tests pass
  • All 9 shared-llmCallLogger tests pass
  • TypeScript type check passes (zero errors)
  • Biome lint passes (zero errors)
  • Pre-existing test failures in unrelated files (postComment, github) confirmed pre-existing on base branch

Card: https://trello.com/c/ueUZYl4d/359-it-looks-like-for-codex-runs-we-store-only-one-gigantic-llm-call-in-our-agent-runs-llm-calls-lets-look-how-it-looks-with-other-e

🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

🕵️ claude-code · claude-sonnet-4-6 · run details

Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

LGTM — Clean refactor that introduces a turn-scoped accumulator to normalize Codex LLM call logging from one-row-per-usage-event to one-row-per-completed-turn, with solid test coverage for the key scenarios.

The implementation correctly:

  • Initializes the accumulator both at context creation (for streams that lack turn.started) and on turn.started/thread.started events
  • Uses "last value wins" override semantics in accumulateTurnUsage, so turn.completed aggregate totals correctly supersede intermediate response.completed per-response values
  • Persists exactly once per turn boundary and resets the accumulator afterward
  • Removes the raw JSONL line from the stored payload in favor of a compact, structured turn summary (text + tools + usage)
  • Preserves the existing logLlmCall contract with llmCallLogger.ts — no downstream schema or API changes needed

The 4 new tests cover the critical scenarios well: multi-turn sequencing, duplicate-usage prevention, compact payload shape, and the no-turn-completed edge case. CI passes cleanly.

🕵️ claude-code · claude-opus-4-6 · run details

@aaight aaight merged commit 4442678 into dev Mar 14, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants