std.sort.equalRange: improve performance by using center-left upperBound by Olvilock · Pull Request #21278 · ziglang/zig

Olvilock · 2024-09-01T20:44:02Z

Ports solution presented at CppCon by Andrei Alexandrescu.

The upper bound uses result of the lower bound, and also skews the first few choppings to the left, leading up to 1.5x speedup (as reported by Andrei)

LucasSantos91 · 2024-09-02T15:38:10Z

Andrei's algorithm assumes that the result will most frequently be found in the lower indices. I watched the talk a long time ago, and, from what I remember, Andrei claimed that this is the most common case, but he never backed it up with evidence. In my project, I use binary search in two places and, in both cases, that assumption does not hold, and this change would therefore decrease performance. I think it would be better to leave binary search as is, and create a new function, called, for instance, leftBiasedBinarySearch, that implements this algorithm.

rohlem · 2024-09-02T16:33:08Z

@LucasSantos91 From how I understood the talk, the assumption for the portion optimizing general binary search is that the input value can lie outside of the range of contained elements (also referred to as an "open range search" iiuc, otherwise it would be referred to as a "closed range search").
The resulting algorithm was to compare the center element first, which gives information on whether the element to search for is in the lower or higher half, then skew the binary search towards the end/extreme until a more extreme element is found to close the search range, after which it falls back to classic unskewed binary search, since this is more optimal for the remaining closed range search.
I currently can't connect your interpretation / statement about lower indices with anything I remember from the talk, feel free to clarify (maybe with a timestamp).

Regardless, this PR only seems to port the equalRange improvement, which is compared under the same closed-range assumption that is already the case with the previous implementation. The normal binarySearch function is unmodified.
The algorithm starts with normal, non-skewed binarySearch, and only skews the relative upper bound to the (center) left, because it's generally (statistically) less likely for the same element to repeat for a majority of the rest of the sequence. Once an element inside the range is found, it reverts to non-skewed binarySearch for the remaining upper bound within the closed range.

Some sort of performance benchmarking code and results as motivation for whether to merge this would be nice.

LucasSantos91 · 2024-09-02T18:36:42Z

Sorry, my bad. I was thinking of binary search instead of equal range.
For the equal range, I think the biggest performance gain comes from doing the lower bound simultaneously with the upper bound. This way, you only need to iterate over the range once, as each step gives you information that can be used to tighten one end of the resulting range.

Olvilock · 2024-09-03T18:52:31Z

I've written a little benchmark to assess which approach would be best in the case of std.sort.equalRange. I compare the current implementation with this proposal and 21290. To reproduce the results, curl this words.txt file.
My results (using the latest zig master):
$ zig build-exe bench.zig -O ReleaseSafe && ./bench
Vanilla function: 562 ms
Center-left function: 374 ms
Simul function: 271 ms
Totals: { 4665490000, 0, 4665490000 }

$ zig build-exe bench.zig -O ReleaseFast && ./bench
Vanilla function: 380 ms
Center-left function: 240 ms
Simul function: 252 ms
Totals: { 4665490000, 0, 4665490000 }

It seems that there's a bug somewhere in my version, and it is also slower than 21290, so this pull definitely needs holding off. Not closing yet because 21290 needs two comparisons to work during the common iterations, and therefore can be slower than lowerBound + adjusted upperBound for trivial types.

Olvilock · 2024-09-05T16:40:33Z

It seems to me that under the current API, this implementation is bound to underperform in ReleaseSafe, because the unreachable is not able to optimize out the redundant checks. Therefore I do not see this as a better implementation unless the API gets reverted to 0.13. I will reimplement this pull under 0.13 API and reopen later

LucasSantos91 mentioned this pull request Sep 3, 2024

std.equalRange: Compute lower and upper bounds simultaneously. #21290

Closed

Olvilock closed this Sep 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

std.sort.equalRange: improve performance by using center-left upperBound#21278

std.sort.equalRange: improve performance by using center-left upperBound#21278
Olvilock wants to merge 0 commit intoziglang:masterfrom
Olvilock:master

Olvilock commented Sep 1, 2024

Uh oh!

LucasSantos91 commented Sep 2, 2024

Uh oh!

rohlem commented Sep 2, 2024 •

edited

Loading

Uh oh!

LucasSantos91 commented Sep 2, 2024

Uh oh!

Olvilock commented Sep 3, 2024 •

edited

Loading

Uh oh!

Olvilock commented Sep 5, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

Olvilock commented Sep 1, 2024

Uh oh!

LucasSantos91 commented Sep 2, 2024

Uh oh!

rohlem commented Sep 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LucasSantos91 commented Sep 2, 2024

Uh oh!

Olvilock commented Sep 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Olvilock commented Sep 5, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rohlem commented Sep 2, 2024 •

edited

Loading

Olvilock commented Sep 3, 2024 •

edited

Loading