perf: various optimizations to eliminate branch misprediction in hash_utils#20168
perf: various optimizations to eliminate branch misprediction in hash_utils#20168adriangb merged 10 commits intoapache:mainfrom
Conversation
|
it would be great to first merge a benchmark into |
|
hey @adriangb, the benchmarks to test changes already exist in
here are some numbers: But I'm curious do we also want to benchmark cases where the values contain nulls? let me know what you think |
|
run benchmark with_hashes |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Hey @adriangb / @Dandandan , I have reaised a PR #20182 that adds a couple more benchmark tests that check performance for and they look like: |
|
Hey @Jefffrey, this is ready for review! Would appreciate your feedback whenever you get time! Thanks benchmark results (locally tested) Details |
|
@adriangb done! thank you! |
|
run benchmark with_hashes |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
I’ll leave the review up to @Jefffrey but from my perspective benchmarks look good and cursory glance at change looks well structured. This is worth reviewing and moving forward. |
Jefffrey
left a comment
There was a problem hiding this comment.
These changes look good with some nice benchmark improvements 👍
I've changed the PR body to not close the original issue however, as there are still some points from it to address in follow ups
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
…_utils (apache#20168) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Part of apache#20152 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Compile time monomorphization helps bring `rehash` outside the hot loop where it's not required. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Currently the PR adds a specialized `hash_dictionary_inner()` function with const generic parameters that check for nulls in keys, values. It also handles specific edge cases of just nulls in keys or values. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> There are no additional tests yet. But I will add 'em as I continue. The benchmark results seem promising. here's `cargo bench --bench with_hashes -- dictionary` for <details> <summary>origin/main</summary> ``` Gnuplot not found, using plotters backend Benchmarking dictionary_utf8_int32: single, no nulls Benchmarking dictionary_utf8_int32: single, no nulls: Warming up for 3.0000 s Benchmarking dictionary_utf8_int32: single, no nulls: Collecting 100 samples in estimated 5.0461 s (470k iterations) Benchmarking dictionary_utf8_int32: single, no nulls: Analyzing dictionary_utf8_int32: single, no nulls time: [10.668 µs 10.700 µs 10.734 µs] Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild Benchmarking dictionary_utf8_int32: single, nulls Benchmarking dictionary_utf8_int32: single, nulls: Warming up for 3.0000 s Benchmarking dictionary_utf8_int32: single, nulls: Collecting 100 samples in estimated 5.0428 s (409k iterations) Benchmarking dictionary_utf8_int32: single, nulls: Analyzing dictionary_utf8_int32: single, nulls time: [12.269 µs 12.293 µs 12.322 µs] Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild Benchmarking dictionary_utf8_int32: multiple, no nulls Benchmarking dictionary_utf8_int32: multiple, no nulls: Warming up for 3.0000 s Benchmarking dictionary_utf8_int32: multiple, no nulls: Collecting 100 samples in estimated 5.0864 s (162k iterations) Benchmarking dictionary_utf8_int32: multiple, no nulls: Analyzing dictionary_utf8_int32: multiple, no nulls time: [31.357 µs 31.426 µs 31.506 µs] Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low mild 5 (5.00%) high mild 1 (1.00%) high severe Benchmarking dictionary_utf8_int32: multiple, nulls Benchmarking dictionary_utf8_int32: multiple, nulls: Warming up for 3.0000 s Benchmarking dictionary_utf8_int32: multiple, nulls: Collecting 100 samples in estimated 5.0842 s (141k iterations) Benchmarking dictionary_utf8_int32: multiple, nulls: Analyzing dictionary_utf8_int32: multiple, nulls time: [36.060 µs 36.135 µs 36.220 µs] Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild 5 (5.00%) high severe ``` </details> <details> <summary>feat/brunch-prediction</summary> ``` Gnuplot not found, using plotters backend Benchmarking dictionary_utf8_int32: single, no nulls Benchmarking dictionary_utf8_int32: single, no nulls: Warming up for 3.0000 s Benchmarking dictionary_utf8_int32: single, no nulls: Collecting 100 samples in estimated 5.0176 s (1.1M iterations) Benchmarking dictionary_utf8_int32: single, no nulls: Analyzing dictionary_utf8_int32: single, no nulls time: [4.7186 µs 4.7496 µs 4.7821 µs] change: [−55.829% −55.537% −55.240%] (p = 0.00 < 0.05) Performance has improved. Benchmarking dictionary_utf8_int32: single, nulls Benchmarking dictionary_utf8_int32: single, nulls: Warming up for 3.0000 s Benchmarking dictionary_utf8_int32: single, nulls: Collecting 100 samples in estimated 5.0295 s (712k iterations) Benchmarking dictionary_utf8_int32: single, nulls: Analyzing dictionary_utf8_int32: single, nulls time: [6.9647 µs 7.0426 µs 7.1281 µs] change: [−43.806% −43.445% −42.993%] (p = 0.00 < 0.05) Performance has improved. Found 16 outliers among 100 measurements (16.00%) 1 (1.00%) low severe 4 (4.00%) low mild 1 (1.00%) high mild 10 (10.00%) high severe Benchmarking dictionary_utf8_int32: multiple, no nulls Benchmarking dictionary_utf8_int32: multiple, no nulls: Warming up for 3.0000 s Benchmarking dictionary_utf8_int32: multiple, no nulls: Collecting 100 samples in estimated 5.0600 s (348k iterations) Benchmarking dictionary_utf8_int32: multiple, no nulls: Analyzing dictionary_utf8_int32: multiple, no nulls time: [13.365 µs 13.384 µs 13.404 µs] change: [−57.610% −57.464% −57.313%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) low severe 4 (4.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe Benchmarking dictionary_utf8_int32: multiple, nulls Benchmarking dictionary_utf8_int32: multiple, nulls: Warming up for 3.0000 s Benchmarking dictionary_utf8_int32: multiple, nulls: Collecting 100 samples in estimated 5.0569 s (242k iterations) Benchmarking dictionary_utf8_int32: multiple, nulls: Analyzing dictionary_utf8_int32: multiple, nulls time: [20.785 µs 20.962 µs 21.173 µs] change: [−42.370% −42.001% −41.579%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 1 (1.00%) low severe 3 (3.00%) high mild 14 (14.00%) high severe ``` </details> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Daniël Heres <danielheres@gmail.com> Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
Which issue does this PR close?
Rationale for this change
Compile time monomorphization helps bring
rehashoutside the hot loop where it's not required.What changes are included in this PR?
Currently the PR adds a specialized
hash_dictionary_inner()function with const generic parameters that check for nulls in keys, values. It also handles specific edge cases of just nulls in keys or values.Are these changes tested?
There are no additional tests yet. But I will add 'em as I continue. The benchmark results seem promising.
here's
cargo bench --bench with_hashes -- dictionaryfororigin/main
feat/brunch-prediction
Are there any user-facing changes?