-
-
Notifications
You must be signed in to change notification settings - Fork 14.2k
Optimized HashMap. RawTable exposes a safe interface. #15720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function looks pretty similar to calculate_offsets. Can you try to share as much code as possible? It was pretty subtle for me to get right in the first place, and bugs in this type of code can be pretty catastrophic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to reuse the code or write a comprehensive test, at least.
calculate_offsets returns more than we need here. I found it difficult to control its inlining. This function is called millions of times from rustc. It's an important part of every search.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has mut in the name, but takes &self: would taking &mut self be more appropriate? (Is that even possible?)
|
Had to write a branchless |
|
Are you sure that that is from this patch? |
|
That's right, let's compare two stage1s. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"... and it's zero at all other times."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
This is nearly ready for a merge. I'm going check a few details and document them.
|
|
|
|
The build passes once again. Everything is resolved so far and ready for a merge. I hope I didn't accidentally remove anything during rebase. |
|
I was running some JSON tests today and got these results: So it's an improvement but it still doesn't match the JSON decoder using TreeMap. I don't have the expertise to check the reason, but someone more experienced may find it worth a look. Code JSON used: |
|
@arthurprs Benchmarking insertion is not reliable since it depends on the allocator. Does serialization create an empty map for every structure? The difference would be best measured with a profiler. |
|
I tried to, but since I have little experience with rust internals and C-ish profilers I couldn't identify the reason. The object building code is bellow, it's pretty simple. The only changes I made was replacing TreeMap with HashMap. |
* branchless `bucket.next()` * updated documentation after interface changes
|
Closing due to inactivity, but feel free to reopen with a rebase! |
|
Argh! @pczarn I completely forgot about this pull request -- if you do rebase, please ping me and I will try to review promptly. |
This is #15720, rebased and reopened. cc @nikomatsakis
… Default` (rust-lang#15720) Fixes rust-lang/rust-clippy#14552 changelog: [`new_without_default`]: if `new` has `#[cfg]`, copy that onto `impl Default`
First, a benchmark of the original hashmap implementation on Intel i3.
bucket_distanceandpop_internalA branchless implementation of probe count calculation. A branch just avoided unsigned underflow. Since the
capacityis a power of 2, we can ignore underflow and simply return the difference modulo capacity. We can use the fact that 'an index argument >= capacity' is acceptable later on.pop_internalreturnsVinstead ofOption<V>.No more low hanging fruits...
iteration over buckets
Let's use external iterators for their greatest advantage in Rust: to avoid bounds checks and indexing. Thus
{Empty,Full}Indexbecomes{Empty,Full}Bucket.The growing is done an optimized reinsertion algorithm based on
insert_hashed_ordered.HashMap::new()returns a table with the capacity set to 0. Hashmaps that never had an element inserted won't allocate, as measured bybench::new_drop.Removed two pointers from
RawTableas they can be recalculated once per iteration rather than every indexing operation.RawTablestruct is as small asVecwith 24 bytes.Reduced code duplication between
swapandinsert_hashed_nocheck. Created a new methodinsert_or_replace_withthat accepts a simple closure for this. Moreover,robin_hoodwill most likely get inlined.safe interface
This is possible thanks to a lot of prior work by @nikomatsakis! Relevant commit: nikomatsakis@2fcb95b
To eliminate the double-take issue, we must tie a bucket pointer to a reference to the table within a private structure. Only a bucket that holds a unique reference can be updated.
GapThenFulluses a similar strategy to encapsulate two consecutive buckets and a single reference. Some of HashMap's methods became functions parameterized over mutability.We could use an uninit reference or drop trickery to replace one zeroing of a bucket's hash per
shiftofGapThenFullwith a single memory access. The former language feature is not planned and the latter seems inefficient.split hashmap.rs into separate files
Total number of lines approaches 3000.
final refactoring
Standalone
robin_hood.I realized that
insert_or_update_withcan callinsert_or_replace_withdirectly.Inlining now started happening in microbenchmarks for some reason. Looks like
search_hashed_genericis quite small:conclusion
Further optimizations are increasingly difficult, usually they would make the code unreadable or would require new language features.
It's possible to build cache aware HashMap on top of a vector of SoAs, such as
Vec<([u64, ..N], [K, ..N], [V, ..N])>. Possibility of in-place growing is the only performance advantage of this approach. Unfortunately, an increase in iteration complexity surpasses benefits of storage reuse.Documentation is not finished. Most methods need more comments and tests.
Robin hood hashing scheme was introduced in #12081.
cc @cgaebel