Don't store hashes in GroupOrdering #7029
Merged
tustvold merged 3 commits into apache:main on Jul 19, 2023
alamb (Contributor) approved these changes on Jul 19, 2023 and left a comment:
Looks great to me
FYI @mustafasrepo and @ozankabak -- this should improve the speed of streamed / bounded group by
for (idx, &hash) in hashes.iter().enumerate() {
    self.map.insert(hash, (hash, idx), |(hash, _)| *hash);
self.group_ordering.remove_groups(n);
// SAFETY: self.map outlives iterator and is not modified concurrently
Contributor
I double checked: https://docs.rs/hashbrown/latest/hashbrown/raw/struct.RawTable.html#method.iter 👍
Comment on lines 628 to 634
unsafe {
    for bucket in self.map.iter() {
        match bucket.as_ref().1.checked_sub(n) {
            None => self.map.erase(bucket),
            Some(sub) => bucket.as_mut().1 = sub,
        }
    }
Contributor
I think this is both wonderfully elegant as well as cryptic. How about some comments (this is so I don't have to refigure this out the next time I see this code):
Suggested change
- unsafe {
-     for bucket in self.map.iter() {
-         match bucket.as_ref().1.checked_sub(n) {
-             None => self.map.erase(bucket),
-             Some(sub) => bucket.as_mut().1 = sub,
-         }
-     }
+ unsafe {
+     for bucket in self.map.iter() {
+         // decrement group index by n
+         match bucket.as_ref().1.checked_sub(n) {
+             // group index was < n, so remove from table
+             None => self.map.erase(bucket),
+             // group index was >= n, shift value down
+             Some(sub) => bucket.as_mut().1 = sub,
+         }
+     }
I double checked https://docs.rs/hashbrown/latest/hashbrown/raw/struct.RawIter.html
You must not free the hash table while iterating (including via growing/shrinking).
It is fine to erase a bucket that has been yielded by the iterator.
Erasing a bucket that has not yet been yielded by the iterator may still result in the iterator yielding that bucket (unless reflect_remove is called).
It is unspecified whether an element inserted after the iterator was created will be yielded by that iterator (unless reflect_insert is called).
The order in which the iterator yields bucket is unspecified and may change in the future.
Which seems to be followed 👍
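For readers who want to see the effect of this loop without the hashbrown raw API, here is a minimal sketch using the standard library's `HashMap::retain`. It is an illustration only, not the DataFusion code: the function name `remove_first_groups` and the `String` keys are hypothetical, and the safe `retain` call stands in for the unsafe erase-while-iterating pattern reviewed here.

```rust
use std::collections::HashMap;

/// Drop the first `n` groups and renumber the remaining ones.
///
/// `map` associates a (hypothetical) group key with its group index.
/// Entries whose index is below `n` are removed; every other index is
/// shifted down by `n`, mirroring the `checked_sub` logic in the PR.
fn remove_first_groups(map: &mut HashMap<String, usize>, n: usize) {
    map.retain(|_key, group_idx| match group_idx.checked_sub(n) {
        // group index was < n: the subtraction underflows, so drop the entry
        None => false,
        // group index was >= n: store the shifted index and keep the entry
        Some(shifted) => {
            *group_idx = shifted;
            true
        }
    });
}
```

With groups `a`, `b`, `c`, `d` at indices 0 through 3, calling `remove_first_groups(&mut map, 2)` leaves only `c` and `d`, now at indices 0 and 1. Using `checked_sub` folds the "is this group being removed?" test and the renumbering into a single operation, which is what makes the original loop so compact.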
Which issue does this PR close?
Closes #.
Rationale for this change
Storing hashes in GroupOrdering was causing merge conflicts for #7016 and is not actually necessary.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?