Skip to content

core: cut codex-core compile time 63% with native async ToolHandler#16630

Merged
bolinfest merged 1 commit intomainfrom
pr16630
Apr 2, 2026
Merged

core: cut codex-core compile time 63% with native async ToolHandler#16630
bolinfest merged 1 commit intomainfrom
pr16630

Conversation

@bolinfest
Copy link
Copy Markdown
Collaborator

@bolinfest bolinfest commented Apr 2, 2026

Why

ToolHandler was still paying a large compile-time tax from #[async_trait] on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal AnyToolHandler adapter.

This PR removes that macro-generated async wrapper layer from concrete ToolHandler impls while keeping the existing object-safe shim in AnyToolHandler. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in #16627, but with a much smaller diff and without changing the public shape of ToolHandler<Output = T>.

That tradeoff matters here because this is a broad codex-core hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes.

Headline result

On a clean codex-core package rebuild (cargo clean -p codex-core before each command), rustc total dropped from 187.15s to 68.98s versus the shared 0bd31dc382bd baseline: -63.1%.

The biggest hot passes dropped by roughly 71-72%:

Metric Baseline 0bd31dc382bd This PR 41f7ac0adeac Delta
total 187.15s 68.98s -63.1%
generate_crate_metadata 84.53s 24.49s -71.0%
MIR_borrow_checking 84.13s 24.58s -70.8%
monomorphization_collector_graph_walk 79.74s 22.19s -72.2%
evaluate_obligation self-time 180.62s 46.91s -74.0%

Important caveat: -Z time-passes timings are nested, so generate_crate_metadata and monomorphization_collector_graph_walk are mostly overlapping, not additive.

Why this PR over #16627

#16627 already proved that the ToolHandler stack was the right hotspot, but it got there by making ToolHandler object-safe and changing every handler to return BoxFuture<Result<AnyToolResult, _>> directly.

This PR keeps the lower-churn shape:

  • ToolHandler remains generic over type Output.
  • Concrete handlers use native RPITIT futures with explicit Send bounds.
  • AnyToolHandler remains the only object-safe adapter and still does the boxing at the registry boundary, as before.
  • The implementation diff is only 33 files, +28/-77.

The measurements are at least comparable, and in this run this PR is slightly faster than #16627 on the pass-level total:

Metric #16627 This PR Delta
total 79.90s 68.98s -13.7%
generate_crate_metadata 25.88s 24.49s -5.4%
monomorphization_collector_graph_walk 23.54s 22.19s -5.7%
evaluate_obligation self-time 43.29s 46.91s +8.4%

Profile data

Crate-level timings

cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json after cargo clean -p codex-core.

Baseline data below is reused from the shared parent 0bd31dc382bd profile because this PR and #16627 are both one commit on top of that same parent.

Crate Baseline duration This PR duration Delta Baseline rmeta_time This PR rmeta_time Delta
codex_core 187.380776583s 69.171113833s -63.1% 174.474507208s 55.873015583s -68.0%
starlark 17.90s 16.773824125s -6.3% n/a 8.8999965s n/a

Pass-level timings

cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json after cargo clean -p codex-core.

Pass Baseline This PR Delta
total 187.150662083s 68.978770375s -63.1%
generate_crate_metadata 84.531864625s 24.487462958s -71.0%
MIR_borrow_checking 84.131389375s 24.575553875s -70.8%
monomorphization_collector_graph_walk 79.737515042s 22.190207417s -72.2%
codegen_crate 12.362532292s 12.695237625s +2.7%
type_check_crate 4.4765405s 5.442019542s +21.6%
coherence_checking 3.311121208s 4.239935292s +28.0%
process real / user / sys 187.70s / 201.87s / 4.99s 69.52s / 85.90s / 2.92s n/a

Self-profile query summary

cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes after cargo clean -p codex-core, summarized with measureme summarize -p 0.5.

Query / phase Baseline self time This PR self time Delta Baseline total time This PR total time Baseline item count This PR item count Baseline cache hits This PR cache hits
evaluate_obligation 180.62s 46.91s -74.0% 182.08s 48.37s 572,234 388,659 1,130,998 1,058,553
mir_borrowck 1.42s 1.49s +4.9% 93.77s 29.59s n/a 6,184 n/a 15,298
typeck 1.84s 1.87s +1.6% 2.38s 2.44s n/a 9,367 n/a 79,247
LLVM_module_codegen_emit_obj n/a 17.12s n/a 17.01s 17.12s n/a 256 n/a 0
LLVM_passes n/a 13.07s n/a 12.95s 13.07s n/a 1 n/a 0
codegen_module n/a 12.33s n/a 12.22s 13.64s n/a 256 n/a 0
items_of_instance n/a 676.00ms n/a n/a 24.96s n/a 99,990 n/a 0
type_op_prove_predicate n/a 660.79ms n/a n/a 24.78s n/a 78,762 n/a 235,877
Summary Baseline This PR
evaluate_obligation % of total CPU 70.821% 38.880%
self-profile total CPU time 255.042999997s 120.661175956s
process real / user / sys 220.96s / 235.02s / 7.09s 86.35s / 103.66s / 3.54s

Artifact sizes

From the same measureme summarize output:

Artifact Baseline This PR Delta
crate_metadata 26,534,471 bytes 26,545,248 bytes +10,777
dep_graph 253,181,425 bytes 239,240,806 bytes -13,940,619
linked_artifact 565,366,624 bytes 562,673,176 bytes -2,693,448
object_file 513,127,264 bytes 510,464,096 bytes -2,663,168
query_cache 137,440,945 bytes 136,982,566 bytes -458,379
cgu_instructions 3,586,307 bytes 3,575,121 bytes -11,186
codegen_unit_size_estimate 2,084,846 bytes 2,078,773 bytes -6,073
work_product_index 19,565 bytes 19,565 bytes 0

Baseline hotspots before this change

These are the top normalized obligation buckets from the shared baseline profile:

Obligation bucket Samples Duration
outlives:tasks::review::ReviewTask 1,067 6.33s
outlives:tools::handlers::unified_exec::UnifiedExecHandler 896 5.63s
trait:T as tools::registry::ToolHandler 876 5.45s
outlives:tools::handlers::shell::ShellHandler 888 5.37s
outlives:tools::handlers::shell::ShellCommandHandler 870 5.29s
outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider 637 3.73s
outlives:tools::handlers::mcp::McpHandler 695 3.61s
outlives:tasks::regular::RegularTask 726 3.57s

Top items_of_instance entries before this change were mostly concrete async handler/task impls:

Instance Duration
tasks::regular::{impl#2}::run 3.79s
tools::handlers::mcp::{impl#0}::handle 3.27s
tools::runtimes::shell::unix_escalation::{impl#2}::determine_action 3.09s
tools::handlers::agent_jobs::{impl#11}::handle 3.07s
tools::handlers::multi_agents::spawn::{impl#1}::handle 2.84s
tasks::review::{impl#4}::run 2.82s
tools::handlers::multi_agents_v2::spawn::{impl#2}::handle 2.80s
tools::handlers::multi_agents::resume_agent::{impl#1}::handle 2.73s
tools::handlers::unified_exec::{impl#2}::handle 2.54s
tasks::compact::{impl#4}::run 2.45s

What changed

Relevant pre-change registry shape: codex-rs/core/src/tools/registry.rs

Current registry shape in this PR: codex-rs/core/src/tools/registry.rs

Tradeoff

This is intentionally less invasive than #16627: it does not move result boxing into every concrete handler and does not change ToolHandler into an object-safe trait.

Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large.

Verification

Existing verification for this branch still applies:

  • Ran cargo test -p codex-core; this change compiled and the suite reached the known unrelated config::tests::*guardian* failures, with no local diff under codex-rs/core/src/config/.

Profiling commands used for the tables above:

  • cargo clean -p codex-core
  • cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json
  • cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json
  • cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes
  • measureme summarize -p 0.5

@bolinfest bolinfest changed the title core: use native async ToolHandler trait core: cut codex-core compile time 63% with native async ToolHandler Apr 2, 2026
@bolinfest bolinfest requested a review from aibrahim-oai April 2, 2026 22:49
@bolinfest bolinfest merged commit 3c7f013 into main Apr 2, 2026
28 of 30 checks passed
@bolinfest bolinfest deleted the pr16630 branch April 2, 2026 23:03
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 2, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants