Skip to content

Dispatcher: Reduce operations by moving things into a single Arc#3318

Merged
badboy merged 1 commit intomainfrom
push-nzwzvykykuny
Nov 24, 2025
Merged

Dispatcher: Reduce operations by moving things into a single Arc#3318
badboy merged 1 commit intomainfrom
push-nzwzvykykuny

Conversation

@badboy
Copy link
Copy Markdown
Member

@badboy badboy commented Nov 11, 2025

Before dropping a DispatchGuard required invoking drop on 2 Arcs (reducing their reference count) and 3 Sender.

That happens for every single launch on the global dispatcher (because that clones a new guard).
That's a bit overkill.
Sender is already Sync (when its T is Send), that means it's enough if we keep one of them around (behind the outer Arc, our T is the launched function with an explicit Send bound). Before this the drop of DispatchGuard shows up in profiles for a synthetic benchmark (7.5% for that drop, with 3.4% attributed to dropping Sender).
With this patch that overhead seems to be gone.


I'm working on reliable numbers to back this up, but I wanted to already push it up and get CI going.

@badboy
Copy link
Copy Markdown
Member Author

badboy commented Nov 12, 2025

Latest numbers on this using the benchmark from #3320:

empty fn                time:   [57.513 ns 57.560 ns 57.601 ns]
                        change: [−15.350% −15.272% −15.200%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

fn 1024                 time:   [416.38 ns 416.73 ns 417.07 ns]
                        change: [−15.297% −15.191% −15.066%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe

counter.add             time:   [69.467 ns 69.625 ns 69.810 ns]
                        change: [−25.869% −25.701% −25.540%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe

@badboy
Copy link
Copy Markdown
Member Author

badboy commented Nov 19, 2025

A final run of the benchmark:

empty fn                time:   [59.656 ns 59.682 ns 59.705 ns]
                        change: [−17.729% −17.650% −17.592%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low severe
  4 (4.00%) low mild

fn 1024                 time:   [448.45 ns 448.84 ns 449.27 ns]
                        change: [−10.590% −10.458% −10.317%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  14 (14.00%) high mild
  2 (2.00%) high severe

counter.add             time:   [70.480 ns 70.583 ns 70.756 ns]
                        change: [−26.847% −26.692% −26.492%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

Note that the improvements vary.
I have even seen fn 1024 to show a regression. But the others always showed some improvement.
So for now I'd like to merge this change still.

@badboy badboy marked this pull request as ready for review November 21, 2025 11:08
@badboy badboy requested a review from a team as a code owner November 21, 2025 11:09
@badboy badboy requested review from chutten and removed request for a team November 21, 2025 11:09
@badboy
Copy link
Copy Markdown
Member Author

badboy commented Nov 21, 2025

oops, forgot to put it up for review

@badboy
Copy link
Copy Markdown
Member Author

badboy commented Nov 21, 2025

ok, now with some reconfigurations of my machines to make behavior more stable, I get reliable numbers, improvements across the board.
Used https://gist.github.com/VorpalBlade/3ab71efbe8de8394217865cd4256c6f8/, so no CPU boost, no power-saving, etc.

Copy link
Copy Markdown
Contributor

@chutten chutten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably doesn't need a CHANGELOG, but maybe we put one in just so we can see it in release notes?

@badboy badboy enabled auto-merge (rebase) November 24, 2025 10:25
Before dropping a `DispatchGuard` required invoking drop on 2 `Arc`s
(reducing their reference count) and 3 `Sender`.

That happens for every single `launch` on the global dispatcher (because
that clones a new guard).
That's a bit overkill.
`Sender` is already `Sync` (when its `T` is `Send`), that means it's
enough if we keep one of them around (behind the outer `Arc`, our `T` is
the launched function with an explicit `Send` bound).
Before this the drop of `DispatchGuard` shows up in profiles for a
synthetic benchmark (7.5% for that drop, with 3.4% attributed to
dropping `Sender`).
With this patch that overhead seems to be gone.
@badboy badboy merged commit 88e30a2 into main Nov 24, 2025
27 of 28 checks passed
@badboy badboy deleted the push-nzwzvykykuny branch November 24, 2025 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants