chore(crashtracker): use weaker mem ordering for OP_COUNTERS #1744
Merged: gh-worker-dd-mergequeue-cf854d merged 1 commit into main from yannham/mem-ordering-op-counters on Mar 18, 2026
I wonder if this is a bug (even before these changes.)
For `begin_op`
For `end_op`
If `old == 0`, we return an error, but the counter has already been decremented to -1.
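
The `end_op` concern can be illustrated with a minimal sketch. The function body and the error message below are assumptions made for illustration, not the actual crashtracker code; only the decrement-then-check shape is taken from this discussion.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical `end_op`: decrements first, checks afterwards.
/// The name mirrors the discussion; the body is an assumed shape.
fn end_op(counter: &AtomicI64) -> Result<(), &'static str> {
    // `fetch_sub` returns the previous value, but the subtraction has
    // already been applied by the time we inspect it.
    let old = counter.fetch_sub(1, Ordering::Relaxed);
    if old <= 0 {
        // We report the error, yet the counter stays decremented.
        return Err("end_op called without a matching begin_op");
    }
    Ok(())
}
```

Calling this on a counter that holds 0 returns an error and leaves the counter at -1, which is the behavior being questioned.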
But since we use these as diagnostics, and they're not critical for synchronization, I think it's not life-threatening?
Hmm, you're onto something. The first one is a classical problem with counters; IIRC there's specific handling in e.g. the implementation of `Arc` (even if you do it right, there could theoretically be concurrent `fetch_add`s between your `fetch_add` and the test, which would overflow). One possibility is to keep a "buffer zone": instead of checking for overflow tightly, you could for example test for `old > i64::MAX / 2`, and maybe reset to `old` upon overflow (it's almost impossible in practice that there are `i64::MAX / 2` concurrent increments before the test). A clean fix requires an initial `load` and a compare-exchange, I fear, which is more costly for 99.99% of the code paths where you don't actually overflow.

I'm not sure the second is an issue here though. Since the atomic is `i64` and the check is `old <= 0`, it's probably ok for the counter to go far into the negative values: it's considered the same as being 0? Except maybe upon reading/reporting? Though, similarly, you could fix it with a more complex read-then-compare-exchange loop.

By the way, what do you think is a reasonable range for those counters in practice?
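
To make the "buffer zone" idea concrete, here is a rough sketch. `SOFT_LIMIT`, the function body, and the error message are assumptions for illustration. It also undoes the increment with a `fetch_sub` instead of storing `old` back, which is a variation on the reset mentioned above, chosen so that concurrent updates are not overwritten.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Assumed threshold: the upper half of the range is kept as a margin.
const SOFT_LIMIT: i64 = i64::MAX / 2;

/// Hypothetical `begin_op` using a loose overflow check.
fn begin_op(counter: &AtomicI64) -> Result<(), &'static str> {
    // Unconditional increment: the happy path is a single atomic op.
    let old = counter.fetch_add(1, Ordering::Relaxed);
    if old > SOFT_LIMIT {
        // Undo only our own increment. Wrapping would require roughly
        // i64::MAX / 2 further concurrent increments before this line.
        counter.fetch_sub(1, Ordering::Relaxed);
        return Err("op counter exceeded the soft limit");
    }
    Ok(())
}
```

The check is deliberately imprecise: it trades an exact bound for keeping the common case to one `fetch_add`.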
Yeah, `Arc` is basically pulling the `MAX / 2` trick as well: https://doc.rust-lang.org/src/alloc/sync.rs.html#2390-2407
Oh, it's a non-issue for crashtracking. Each op will only ever be 0 or 1, technically. I was just thinking about the underlying logic.
Well, I guess the upper bound is the # of threads doing the same op at the same time. Mostly profiling operations.
I see. Then indeed overflow is very theoretical, in fact straight up impossible, given that each thread takes up some space for its stack, making the max number of live threads at any point in time much smaller than `i64::MAX`. But yeah, in general I would say that keeping a safe range (`i64::MAX` negative values for underflow, and `i64::MAX / 2` of upper values for overflow) is a practical way to do it without hurting the happy path. The right way ™️ would be to load first, and only update (with a compare-exchange) if it's indeed not overflowing/underflowing the counter, but this is considerably more expensive.
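
For comparison, the load-then-compare-exchange variant could look roughly like the sketch below. It relies on the standard library's `AtomicI64::fetch_update`, which wraps the load and compare-exchange retry loop. The function names and the choice of `Relaxed` ordering are assumptions for illustration, not the crashtracker API.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical strict increment: the counter is only written when the
/// addition does not overflow. Returns the previous value on success
/// and the unchanged current value on failure.
fn begin_op_checked(counter: &AtomicI64) -> Result<i64, i64> {
    counter.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
        cur.checked_add(1)
    })
}

/// Hypothetical strict decrement: the counter never drops below zero.
fn end_op_checked(counter: &AtomicI64) -> Result<i64, i64> {
    counter.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
        if cur > 0 { Some(cur - 1) } else { None }
    })
}
```

Each call performs at least a load plus a compare-exchange and may retry under contention, which is the extra cost on the happy path mentioned above.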