[rollout_trace] Trace tool and code-mode boundaries#18878

Merged
cassirer-openai merged 1 commit into main from codex/rollout-trace-tool-code-mode
Apr 23, 2026

Conversation

Contributor

@cassirer-openai cassirer-openai commented Apr 21, 2026

Summary

Extends rollout tracing across tool dispatch and code-mode runtime boundaries. This records canonical tool-call lifecycle events and links code-mode execution/wait operations back to the model-visible calls that caused them.

Stack

This is PR 3/5 in the rollout trace stack.

  • #18876: Add rollout trace crate
  • #18877: Record core session rollout traces
  • #18878: Trace tool and code-mode boundaries
  • #18879: Trace sessions and multi-agent edges
  • #18880: Add debug trace reduction command

Review Notes

This PR is about attribution. Reviewers should focus on whether direct tool calls, code-mode-originated tool calls, waits, outputs, and cancellation boundaries are recorded with enough source information for deterministic reduction without coupling the reducer to live runtime internals.

The stack remains valid after this layer: tool and code-mode traces reduce through the existing crate model, while the broader session and multi-agent relationships are added in the next PR.
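To make the attribution question concrete, here is a minimal sketch of what "enough source information for deterministic reduction" could look like. Every type and field name below (`ToolCallSource`, `TraceEvent`, `reduce`, and so on) is an assumption made for illustration and is not taken from the `codex-rollout-trace` crate. The point is only that each start event carries its own origin, so a reducer can rebuild the call graph from the recorded events alone.

```rust
use std::collections::BTreeMap;

/// Hypothetical: where a tool call came from.
#[derive(Debug, Clone, PartialEq)]
enum ToolCallSource {
    /// Requested directly by the model.
    Direct { model_call_id: String },
    /// Issued by code running in a cell that a model-visible call started.
    CodeMode { model_call_id: String, cell_id: String },
}

/// Hypothetical raw lifecycle events, as they might appear in a trace.
#[derive(Debug, Clone, PartialEq)]
enum TraceEvent {
    ToolCallStarted { call_id: String, source: ToolCallSource },
    ToolCallFinished { call_id: String, output: String },
    ToolCallCancelled { call_id: String },
}

#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Pending,
    Finished(String),
    Cancelled,
}

#[derive(Debug, Clone, PartialEq)]
struct ReducedCall {
    source: ToolCallSource,
    outcome: Outcome,
}

/// Folds events into per-call state. It reads nothing but the events, so the
/// same trace always reduces to the same result.
fn reduce(events: &[TraceEvent]) -> BTreeMap<String, ReducedCall> {
    let mut calls = BTreeMap::new();
    for event in events {
        match event {
            TraceEvent::ToolCallStarted { call_id, source } => {
                calls.insert(
                    call_id.clone(),
                    ReducedCall { source: source.clone(), outcome: Outcome::Pending },
                );
            }
            TraceEvent::ToolCallFinished { call_id, output } => {
                if let Some(call) = calls.get_mut(call_id) {
                    call.outcome = Outcome::Finished(output.clone());
                }
            }
            TraceEvent::ToolCallCancelled { call_id } => {
                if let Some(call) = calls.get_mut(call_id) {
                    call.outcome = Outcome::Cancelled;
                }
            }
        }
    }
    calls
}
```

In this sketch a cancelled call still keeps its source, which is the property the review note asks reviewers to check for cancellation boundaries.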

Collaborator

@jif-oai jif-oai left a comment

Same comments as for the previous PR

Comment thread codex-rs/code-mode/src/runtime/mod.rs Outdated
}

#[derive(Debug, PartialEq)]
pub enum WaitResponse {
Collaborator

I don't have enough context to review, but it looks sus to add this in code-mode.

Contributor Author

There is an explicit wait tool call that blocks until a code cell completes or writes output. It is important that we capture it so that we can track which code cell the main loop is blocked on and what it yielded back to the model.
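A rough sketch of the bookkeeping described here, under loud assumptions: `WaitOutcome`, `WaitTrace`, and `WaitLog` are hypothetical names, and the variants are not those of the real `WaitResponse` enum, which this thread only shows in part. The sketch records which cell a wait call targets when it starts and what came back when it ends.

```rust
/// Hypothetical result of a wait; not the variants of the real `WaitResponse`.
#[derive(Debug, Clone, PartialEq)]
enum WaitOutcome {
    /// The cell is still running but produced output for the model.
    Yielded { output: String },
    /// The cell finished.
    Completed { output: String },
}

/// One traced wait: which model-visible call waited on which code cell.
#[derive(Debug, Clone, PartialEq)]
struct WaitTrace {
    wait_call_id: String,
    cell_id: String,
    /// `None` while the main loop is still blocked.
    outcome: Option<WaitOutcome>,
}

#[derive(Default)]
struct WaitLog {
    entries: Vec<WaitTrace>,
}

impl WaitLog {
    /// Called when the wait tool call starts blocking on `cell_id`.
    fn wait_started(&mut self, wait_call_id: &str, cell_id: &str) {
        self.entries.push(WaitTrace {
            wait_call_id: wait_call_id.to_string(),
            cell_id: cell_id.to_string(),
            outcome: None,
        });
    }

    /// Called when the wait returns; stores what was handed back to the model.
    fn wait_ended(&mut self, wait_call_id: &str, outcome: WaitOutcome) {
        if let Some(entry) = self
            .entries
            .iter_mut()
            .find(|entry| entry.wait_call_id == wait_call_id)
        {
            entry.outcome = Some(outcome);
        }
    }

    /// The cell the main loop is currently blocked on, if any.
    fn blocked_on(&self) -> Option<&str> {
        self.entries
            .iter()
            .rev()
            .find(|entry| entry.outcome.is_none())
            .map(|entry| entry.cell_id.as_str())
    }
}
```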

Comment thread codex-rs/core/src/tools/registry.rs Outdated
/// affect the trace lifecycle. Keeping the trace eligibility and event writes
/// behind this helper makes those paths say what happened instead of repeating
/// the Direct/CodeMode/JsRepl/first-class-object policy at each branch.
struct DispatchTrace {
Collaborator

this should move

Contributor Author

It becomes very awkward to move this into the tracing module, since that would either introduce heavy dependencies on ToolInvocation and AnyToolHandler or force us to define awkward generic types inside the module. All the heavy lifting is already handled by the RolloutTraceRecorder. This struct only does the minimal wiring.

Contributor Author

I'm going to move as much as I can.

Collaborator

I mean this should move into a dedicated file. This has nothing to do with the registry.

Collaborator

And ideally most of them become some impl From<>...

Contributor Author

@cassirer-openai cassirer-openai Apr 22, 2026

This is addressed in spirit by moving the dispatch adapter out of registry.rs; the remaining conversions are now isolated in tool_dispatch_trace.rs. I don’t think impl From is clearly better for all of them: the main invocation conversion is intentionally optional because JsRepl should not create a dispatch trace, and the result conversion depends on both source and output formatting. ToolPayload -> ToolDispatchPayload is the one plausible From candidate, but I’d prefer not to add that unless we see it materially improves the clone/ownership cleanup.
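The trade-off in this reply can be sketched as follows. The type names echo the ones mentioned in the thread, but the variants, fields, and the `dispatch_trace_for` helper are simplified assumptions, not the contents of `tool_dispatch_trace.rs`. A total, lossless mapping fits `From`, while a conversion that must sometimes produce nothing is clearer as a function returning `Option`.

```rust
/// Simplified stand-ins; the `Custom` variant and all field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum ToolPayload {
    Function { arguments: String },
    Custom { input: String },
}

#[derive(Debug, Clone, PartialEq)]
enum ToolDispatchPayload {
    Function { arguments: String },
    Custom { input: String },
}

/// A total mapping that consumes its input: the one plausible `From` candidate.
impl From<ToolPayload> for ToolDispatchPayload {
    fn from(payload: ToolPayload) -> Self {
        match payload {
            ToolPayload::Function { arguments } => ToolDispatchPayload::Function { arguments },
            ToolPayload::Custom { input } => ToolDispatchPayload::Custom { input },
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ToolCallSource {
    Direct,
    CodeMode,
    JsRepl,
}

struct ToolInvocation {
    source: ToolCallSource,
    payload: ToolPayload,
}

#[derive(Debug, PartialEq)]
struct DispatchTraceStart {
    source: ToolCallSource,
    payload: ToolDispatchPayload,
}

/// Not a `From` impl: JsRepl invocations intentionally yield no dispatch trace,
/// so the conversion is partial and returns `Option`.
fn dispatch_trace_for(invocation: ToolInvocation) -> Option<DispatchTraceStart> {
    match invocation.source {
        ToolCallSource::JsRepl => None,
        source => Some(DispatchTraceStart {
            source,
            payload: invocation.payload.into(),
        }),
    }
}
```

Taking the invocation by value lets the payload move into the trace without a clone, which is the ownership cleanup the reply alludes to.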

Comment thread codex-rs/core/src/tools/registry.rs Outdated
Comment thread codex-rs/core/src/rollout_trace.rs Outdated
Comment thread codex-rs/core/src/rollout_trace.rs Outdated
cassirer-openai added a commit that referenced this pull request Apr 21, 2026
## Summary

Adds the standalone `codex-rollout-trace` crate, which defines the raw
trace event format, replay/reduction model, writer, and reducer logic
for reconstructing model-visible conversation/runtime state from
recorded rollout data.

The crate-level design is documented in
[`codex-rs/rollout-trace/README.md`](https://github.com/openai/codex/blob/codex/rollout-trace-crate/codex-rs/rollout-trace/README.md).

## Stack

This is PR 1/5 in the rollout trace stack.

- #18876: Add rollout trace crate
- #18877: Record core session rollout traces
- #18878: Trace tool and code-mode boundaries
- #18879: Trace sessions and multi-agent edges
- #18880: Add debug trace reduction command

## Review Notes

This PR intentionally does not wire tracing into live Codex execution.
It establishes the data model and reducer contract first, with
crate-local tests covering conversation reconstruction, compaction
boundaries, tool/session edges, and code-cell lifecycle reduction. Later
PRs emit into this model.

The README is the best entry point for reviewing the intended trace
format and reduction semantics before diving into the reducer modules.
@cassirer-openai cassirer-openai force-pushed the codex/rollout-trace-core-recorder branch from 582bf74 to 899bb99 Compare April 21, 2026 22:40
cassirer-openai added a commit that referenced this pull request Apr 22, 2026
## Summary

Wires rollout trace recording into `codex-core` session and turn
execution. This records the core model request/response, compaction, and
session lifecycle boundaries needed for replay without yet tracing every
nested runtime/tool boundary.

## Stack

This is PR 2/5 in the rollout trace stack.

- #18876: Add rollout trace crate
- #18877: Record core session rollout traces
- #18878: Trace tool and code-mode boundaries
- #18879: Trace sessions and multi-agent edges
- #18880: Add debug trace reduction command

## Review Notes

This layer is the first live integration point. The important review
question is whether trace recording is isolated from normal session
behavior: trace failures should not become user-visible execution
failures, and recording should preserve the existing turn/session
lifecycle semantics.

The PR depends on the reducer/data model from the first stack entry and
only introduces the core recorder surface that later PRs use for richer
runtime and relationship events.
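
The isolation property named in these review notes can be illustrated with a small sketch. It assumes a hypothetical `BestEffortTrace` wrapper and a `run_turn` stand-in, neither of which comes from `codex-core`. Write errors are counted and swallowed so that they never reach the caller of the turn.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper: trace writes are best effort and never return errors.
struct BestEffortTrace<W: Write> {
    writer: W,
    /// Number of events lost to write failures, kept for diagnostics.
    dropped: u64,
}

impl<W: Write> BestEffortTrace<W> {
    fn record(&mut self, line: &str) {
        // A failed write is counted, not propagated.
        if writeln!(self.writer, "{}", line).is_err() {
            self.dropped += 1;
        }
    }
}

/// A writer that always fails, standing in for a full disk.
struct FailingWriter;

impl Write for FailingWriter {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "disk full"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Stand-in for a turn: its result does not depend on whether tracing worked.
fn run_turn<W: Write>(trace: &mut BestEffortTrace<W>) -> Result<&'static str, String> {
    trace.record("turn_started");
    let result = "turn output";
    trace.record("turn_finished");
    Ok(result)
}
```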
Base automatically changed from codex/rollout-trace-core-recorder to main April 22, 2026 17:00
@cassirer-openai cassirer-openai force-pushed the codex/rollout-trace-tool-code-mode branch from 47d822c to 6a5ab49 Compare April 22, 2026 18:27
@cassirer-openai cassirer-openai enabled auto-merge (squash) April 22, 2026 19:27
Comment thread codex-rs/code-mode/src/runtime/mod.rs Outdated
Comment thread codex-rs/code-mode/src/service.rs Outdated
/// Code mode owns the per-cell runtime id. Hosts should preserve it for
/// provenance/debugging, but should still assign their own runtime tool call id
/// if their tool-call graph requires globally unique ids.
pub struct CodeModeToolInvocation {
Collaborator

Just to be sure, this does not exist anywhere else? This looks a bit like a sub-structure that should already be available

Contributor Author

It didn't really exist in a form that we could import, but I've reorganized the code a bit so we don't have to repeat it like this.
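The doc comment on `CodeModeToolInvocation` describes a two-id scheme, sketched below. The fields shown and the `HostToolCall` and `HostCallIds` types are assumptions for illustration, since the real struct body is not visible in this thread. Code mode's runtime id is kept for provenance, and the host mints its own id for its tool-call graph.

```rust
/// Stand-in for the invocation code mode hands to its host.
/// All field names here are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct CodeModeToolInvocation {
    cell_id: String,
    /// Owned by code mode; only unique within one cell in this sketch.
    runtime_tool_call_id: String,
    tool_name: String,
}

/// Hypothetical host-side record: a globally unique id plus the preserved origin.
#[derive(Debug, Clone, PartialEq)]
struct HostToolCall {
    host_call_id: String,
    origin: CodeModeToolInvocation,
}

/// Hypothetical allocator for host-level ids.
#[derive(Default)]
struct HostCallIds {
    next: u64,
}

impl HostCallIds {
    /// Assigns a fresh host id and keeps the code-mode ids untouched for debugging.
    fn adopt(&mut self, origin: CodeModeToolInvocation) -> HostToolCall {
        self.next += 1;
        HostToolCall {
            host_call_id: format!("host-call-{}", self.next),
            origin,
        }
    }
}
```

Two cells can then reuse the same runtime id without colliding in the host's graph.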

Comment thread codex-rs/code-mode/src/service.rs Outdated
Comment thread codex-rs/core/src/tools/code_mode/execute_handler.rs
Comment thread codex-rs/code-mode/src/service.rs Outdated
impl ToolDispatchPayload {
    fn log_payload(&self) -> String {
        match self {
            ToolDispatchPayload::Function { arguments } => arguments.clone(),
Collaborator

Side comment, but the code contains tons of clones. For this kind of high-throughput feature, that can have an impact on latency.

Contributor Author

Yeah, you are right. I've taken a pass over the code and removed or avoided all the clones I could, and also changed the API/implementation slightly so that the construction of larger objects (mainly raw requests) never happens when the tracer is disabled.
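One common way to get the property described here is to pass a closure that builds the event, so that nothing is constructed unless tracing is on. This is only a sketch of that general technique; `Tracer`, `record_with`, and `RawRequest` are hypothetical and do not reflect the `RolloutTraceRecorder` API.

```rust
/// Hypothetical stand-in for a large value that is costly to copy or serialize.
struct RawRequest {
    body: String,
}

/// Hypothetical tracer; not the `RolloutTraceRecorder` API.
struct Tracer {
    enabled: bool,
    events: Vec<String>,
}

impl Tracer {
    /// Runs `build` only when tracing is enabled, so a disabled tracer never
    /// pays for constructing the event.
    fn record_with<F>(&mut self, build: F)
    where
        F: FnOnce() -> String,
    {
        if self.enabled {
            self.events.push(build());
        }
    }
}

/// Borrows the request; the formatted copy is made inside the closure,
/// and only if the closure actually runs.
fn trace_request(tracer: &mut Tracer, request: &RawRequest) {
    tracer.record_with(|| format!("request: {}", request.body));
}
```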

Comment thread codex-rs/core/src/tools/router.rs Outdated
Comment thread codex-rs/core/src/tools/registry.rs Outdated
Comment thread codex-rs/rollout-trace/src/tool_dispatch.rs
Comment thread codex-rs/core/src/tools/code_mode/execute_handler.rs Outdated
Comment thread codex-rs/core/src/tools/registry_tests.rs Outdated
morozow pushed a commit to morozow/codex that referenced this pull request Apr 23, 2026
morozow pushed a commit to morozow/codex that referenced this pull request Apr 23, 2026
@cassirer-openai cassirer-openai force-pushed the codex/rollout-trace-tool-code-mode branch from b131784 to f1de340 Compare April 23, 2026 16:29
@cassirer-openai cassirer-openai force-pushed the codex/rollout-trace-tool-code-mode branch from f1de340 to 3aaacfc Compare April 23, 2026 16:53
@cassirer-openai cassirer-openai merged commit 6d09b67 into main Apr 23, 2026
25 checks passed
@cassirer-openai cassirer-openai deleted the codex/rollout-trace-tool-code-mode branch April 23, 2026 19:22
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 23, 2026

2 participants