[rollout_trace] Trace tool and code-mode boundaries#18878
jif-oai
left a comment
Same comments as for the previous PR
#[derive(Debug, PartialEq)]
pub enum WaitResponse {
I don't have enough context to review but it looks sus to add this in code-mode
There is an explicit `wait` tool call that blocks until a code cell completes or writes output. It is important that we capture it so that we can track which code cell the main loop is blocked on and what it yielded back to the model.
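A minimal sketch of what capturing the wait could look like, assuming a started/ended event pair keyed by the wait call and the cell it blocks on. `WaitOutcome`, `TraceEvent`, `trace_wait`, and all field names are hypothetical stand-ins; the real `WaitResponse` variants are not shown in this thread.

```rust
// Hypothetical outcome of a wait; not the real `WaitResponse` definition.
#[derive(Debug, Clone, PartialEq)]
enum WaitOutcome {
    Yielded { output: String },
    Completed { output: String },
}

// Hypothetical trace events for the wait boundary.
#[derive(Debug, PartialEq)]
enum TraceEvent {
    WaitStarted { wait_call_id: String, cell_id: String },
    WaitEnded { wait_call_id: String, cell_id: String, outcome: WaitOutcome },
}

// Wraps a blocking wait so the trace shows which cell the loop was blocked on
// and what the wait handed back to the model.
fn trace_wait<F>(
    events: &mut Vec<TraceEvent>,
    wait_call_id: &str,
    cell_id: &str,
    wait: F,
) -> WaitOutcome
where
    F: FnOnce() -> WaitOutcome,
{
    events.push(TraceEvent::WaitStarted {
        wait_call_id: wait_call_id.to_string(),
        cell_id: cell_id.to_string(),
    });
    let outcome = wait();
    events.push(TraceEvent::WaitEnded {
        wait_call_id: wait_call_id.to_string(),
        cell_id: cell_id.to_string(),
        outcome: outcome.clone(),
    });
    outcome
}

fn main() {
    let mut events = Vec::new();
    trace_wait(&mut events, "call-7", "cell-1", || WaitOutcome::Yielded {
        output: "partial".to_string(),
    });
    trace_wait(&mut events, "call-8", "cell-1", || WaitOutcome::Completed {
        output: "done".to_string(),
    });
    for event in &events {
        println!("{:?}", event);
    }
}
```

Each wait produces two events that share the wait call id, so a reducer could pair them without knowing anything about the live runtime.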
/// affect the trace lifecycle. Keeping the trace eligibility and event writes
/// behind this helper makes those paths say what happened instead of repeating
/// the Direct/CodeMode/JsRepl/first-class-object policy at each branch.
struct DispatchTrace {
It becomes very awkward to move this into the tracing module, since it would either introduce heavy dependencies on ToolInvocation and AnyToolHandler or force us to define awkward generic types inside the module. All the heavy lifting is already handled by the RolloutTraceRecorder. This struct only does the minimal wiring.
I'm going to move as much as I can.
I mean this should move into a dedicated file. It has nothing to do with the registry.
And ideally most of them become some impl From<>...
This is addressed in spirit by moving the dispatch adapter out of registry.rs; the remaining conversions are now isolated in tool_dispatch_trace.rs. I don’t think impl From is clearly better for all of them: the main invocation conversion is intentionally optional because JsRepl should not create a dispatch trace, and the result conversion depends on both source and output formatting. ToolPayload -> ToolDispatchPayload is the one plausible From candidate, but I’d prefer not to add that unless we see it materially improves the clone/ownership cleanup.
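To illustrate the distinction drawn above, here is a rough sketch under simplified assumptions. `ToolPayload`, `ToolSource`, their variants, and `dispatch_payload_for` are hypothetical stand-ins for the real types; only `ToolDispatchPayload::Function { arguments }` appears in the diff. The total payload conversion fits `From`, while the source-dependent conversion returns `Option` because a JsRepl invocation should produce no dispatch trace.

```rust
// Hypothetical, simplified stand-ins for the types discussed in this thread.
#[derive(Debug, Clone, PartialEq)]
enum ToolPayload {
    Function { arguments: String },
    Custom { input: String },
}

#[derive(Debug, PartialEq)]
enum ToolDispatchPayload {
    Function { arguments: String },
    Custom { input: String },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ToolSource {
    Direct,
    CodeMode,
    JsRepl,
}

// Total and infallible, so `From` fits. Taking the payload by value moves the
// strings instead of cloning them.
impl From<ToolPayload> for ToolDispatchPayload {
    fn from(payload: ToolPayload) -> Self {
        match payload {
            ToolPayload::Function { arguments } => ToolDispatchPayload::Function { arguments },
            ToolPayload::Custom { input } => ToolDispatchPayload::Custom { input },
        }
    }
}

// Not total: some sources intentionally yield no dispatch trace, so this is a
// plain function returning `Option` rather than a `From` impl.
fn dispatch_payload_for(source: ToolSource, payload: ToolPayload) -> Option<ToolDispatchPayload> {
    match source {
        ToolSource::JsRepl => None,
        ToolSource::Direct | ToolSource::CodeMode => Some(payload.into()),
    }
}

fn main() {
    let function = ToolPayload::Function { arguments: "{}".to_string() };
    let custom = ToolPayload::Custom { input: "ls".to_string() };
    println!("{:?}", dispatch_payload_for(ToolSource::Direct, function.clone()));
    println!("{:?}", dispatch_payload_for(ToolSource::CodeMode, custom));
    println!("{:?}", dispatch_payload_for(ToolSource::JsRepl, function));
}
```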
## Summary

Adds the standalone `codex-rollout-trace` crate, which defines the raw trace event format, replay/reduction model, writer, and reducer logic for reconstructing model-visible conversation/runtime state from recorded rollout data.

The crate-level design is documented in [`codex-rs/rollout-trace/README.md`](https://github.com/openai/codex/blob/codex/rollout-trace-crate/codex-rs/rollout-trace/README.md).

## Stack

This is PR 1/5 in the rollout trace stack.

- [#18876](#18876): Add rollout trace crate
- [#18877](#18877): Record core session rollout traces
- [#18878](#18878): Trace tool and code-mode boundaries
- [#18879](#18879): Trace sessions and multi-agent edges
- [#18880](#18880): Add debug trace reduction command

## Review Notes

This PR intentionally does not wire tracing into live Codex execution. It establishes the data model and reducer contract first, with crate-local tests covering conversation reconstruction, compaction boundaries, tool/session edges, and code-cell lifecycle reduction. Later PRs emit into this model.

The README is the best entry point for reviewing the intended trace format and reduction semantics before diving into the reducer modules.
## Summary

Wires rollout trace recording into `codex-core` session and turn execution. This records the core model request/response, compaction, and session lifecycle boundaries needed for replay without yet tracing every nested runtime/tool boundary.

## Stack

This is PR 2/5 in the rollout trace stack.

- [#18876](#18876): Add rollout trace crate
- [#18877](#18877): Record core session rollout traces
- [#18878](#18878): Trace tool and code-mode boundaries
- [#18879](#18879): Trace sessions and multi-agent edges
- [#18880](#18880): Add debug trace reduction command

## Review Notes

This layer is the first live integration point. The important review question is whether trace recording is isolated from normal session behavior: trace failures should not become user-visible execution failures, and recording should preserve the existing turn/session lifecycle semantics.

The PR depends on the reducer/data model from the first stack entry and only introduces the core recorder surface that later PRs use for richer runtime and relationship events.
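One way to picture the isolation requirement is a best-effort recorder that swallows sink errors instead of propagating them into the turn. This is only a sketch: `BestEffortTrace`, `FailingWriter`, and `run_turn` are hypothetical names, and the actual recorder may handle failures differently.

```rust
use std::io::{self, Write};

// A sink that always fails, standing in for a full disk or a closed file.
struct FailingWriter;

impl Write for FailingWriter {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "disk full"))
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// Hypothetical recorder: a write failure disables the sink and is counted,
// but is never returned to the caller.
struct BestEffortTrace<W: Write> {
    sink: Option<W>,
    dropped: usize,
}

impl<W: Write> BestEffortTrace<W> {
    fn new(sink: W) -> Self {
        Self { sink: Some(sink), dropped: 0 }
    }

    fn record(&mut self, event: &str) {
        let failed = match self.sink.as_mut() {
            Some(sink) => writeln!(sink, "{}", event).is_err(),
            None => true,
        };
        if failed {
            self.sink = None;
            self.dropped += 1;
        }
    }
}

// The turn result does not depend on whether recording succeeded.
fn run_turn<W: Write>(trace: &mut BestEffortTrace<W>) -> Result<&'static str, String> {
    trace.record("turn_started");
    let output = "turn output";
    trace.record("turn_completed");
    Ok(output)
}

fn main() {
    let mut broken = BestEffortTrace::new(FailingWriter);
    println!("{:?}, dropped {}", run_turn(&mut broken), broken.dropped);

    let mut healthy = BestEffortTrace::new(Vec::<u8>::new());
    println!("{:?}, dropped {}", run_turn(&mut healthy), healthy.dropped);
}
```

Because `record` returns nothing, callers cannot accidentally use `?` on a trace error and turn it into a failed turn.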
/// Code mode owns the per-cell runtime id. Hosts should preserve it for
/// provenance/debugging, but should still assign their own runtime tool call id
/// if their tool-call graph requires globally unique ids.
pub struct CodeModeToolInvocation {
Just to be sure, this does not exist anywhere else? This looks a bit like a sub-structure that should already be available
It didn't really exist in a form that we could import, but I've reorganized the code a little so we don't have to repeat it like this.
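The id policy described in the doc comment can be sketched as follows. The fields of `CodeModeToolInvocation` are not shown in this thread, so the field names here, along with `HostToolCall` and `HostIdAllocator`, are hypothetical. The sketch also assumes two cells may reuse the same per-cell runtime id, which is why the host assigns its own.

```rust
// Hypothetical fields; the real struct body is not shown in the diff.
#[derive(Debug, PartialEq)]
struct CodeModeToolInvocation {
    cell_id: String,
    runtime_tool_call_id: String, // owned by code mode, only unique per cell
    tool_name: String,
}

// Hypothetical host-side record: a globally unique id plus the preserved origin.
#[derive(Debug, PartialEq)]
struct HostToolCall {
    host_call_id: String,
    origin: CodeModeToolInvocation,
}

struct HostIdAllocator {
    next: u64,
}

impl HostIdAllocator {
    fn new() -> Self {
        Self { next: 0 }
    }

    // Assigns a fresh host id and keeps the code-mode id for provenance.
    fn adopt(&mut self, origin: CodeModeToolInvocation) -> HostToolCall {
        self.next += 1;
        HostToolCall {
            host_call_id: format!("host-call-{}", self.next),
            origin,
        }
    }
}

fn invocation(cell_id: &str, runtime_id: &str) -> CodeModeToolInvocation {
    CodeModeToolInvocation {
        cell_id: cell_id.to_string(),
        runtime_tool_call_id: runtime_id.to_string(),
        tool_name: "shell".to_string(),
    }
}

fn main() {
    let mut ids = HostIdAllocator::new();
    // Two cells that happen to reuse the same per-cell runtime id.
    let first = ids.adopt(invocation("cell-1", "tool-1"));
    let second = ids.adopt(invocation("cell-2", "tool-1"));
    println!("{:?}", first);
    println!("{:?}", second);
}
```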
impl ToolDispatchPayload {
    fn log_payload(&self) -> String {
        match self {
            ToolDispatchPayload::Function { arguments } => arguments.clone(),
Side comment, but the code contains a lot of clones. For a high-throughput feature like this, that can have an impact on latency.
Yeah, you are right. I've taken a pass over the code and removed or avoided all the clones I could. I also changed the API and implementation slightly so that larger objects (mainly raw requests) are never constructed when the tracer is disabled.
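A common way to get that behavior is to pass a closure that builds the payload, so nothing is constructed unless tracing is on. The sketch below assumes this approach; `TraceRecorder`, `record_with`, and `render_raw_request` are hypothetical names, not the recorder's actual API.

```rust
use std::cell::Cell;

// Hypothetical recorder with an on/off switch.
struct TraceRecorder {
    enabled: bool,
    events: Vec<String>,
}

impl TraceRecorder {
    fn new(enabled: bool) -> Self {
        Self { enabled, events: Vec::new() }
    }

    // Takes a builder instead of a value: when tracing is disabled the closure
    // is never called, so the payload is never allocated.
    fn record_with<F: FnOnce() -> String>(&mut self, build: F) {
        if self.enabled {
            self.events.push(build());
        }
    }
}

// Stand-in for an expensive serialization; counts how often it runs.
fn render_raw_request(builds: &Cell<usize>, body: &str) -> String {
    builds.set(builds.get() + 1);
    format!("raw_request:{}", body)
}

// Returns (number of payloads built, number of events recorded).
fn count_builds(enabled: bool) -> (usize, usize) {
    let builds = Cell::new(0);
    let mut recorder = TraceRecorder::new(enabled);
    recorder.record_with(|| render_raw_request(&builds, "{\"model\":\"example\"}"));
    (builds.get(), recorder.events.len())
}

fn main() {
    println!("disabled: {:?}", count_builds(false));
    println!("enabled: {:?}", count_builds(true));
}
```

The closure borrows its inputs rather than cloning them, so the disabled path costs one branch and nothing else.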
Summary
Extends rollout tracing across tool dispatch and code-mode runtime boundaries. This records canonical tool-call lifecycle events and links code-mode execution/wait operations back to the model-visible calls that caused them.
Stack
This is PR 3/5 in the rollout trace stack.
Review Notes
This PR is about attribution. Reviewers should focus on whether direct tool calls, code-mode-originated tool calls, waits, outputs, and cancellation boundaries are recorded with enough source information for deterministic reduction without coupling the reducer to live runtime internals.
The stack remains valid after this layer: tool and code-mode traces reduce through the existing crate model, while the broader session and multi-agent relationships are added in the next PR.