Skip to content

std.sort.equalRange: improve performance by using center-left upperBound#21278

Closed
Olvilock wants to merge 0 commit intoziglang:masterfrom
Olvilock:master
Closed

std.sort.equalRange: improve performance by using center-left upperBound#21278
Olvilock wants to merge 0 commit intoziglang:masterfrom
Olvilock:master

Conversation

@Olvilock
Copy link

@Olvilock Olvilock commented Sep 1, 2024

Ports solution presented at CppCon by Andrei Alexandrescu.

The upper bound uses result of the lower bound, and also skews the first few choppings to the left, leading up to 1.5x speedup (as reported by Andrei)

@LucasSantos91
Copy link
Contributor

Andrei's algorithm assumes that the result will most frequently be found in the lower indices. I watched the talk a long time ago, and, from what I remember, Andrei claimed that this is the most common case, but he never backed it up with evidence. In my project, I use binary search in two places and, in both cases, that assumption does not hold, and this change would therefore decrease performance. I think it would be better to leave binary search as is, and create a new function, called, for instance, leftBiasedBinarySearch, that implements this algorithm.

@rohlem
Copy link
Contributor

rohlem commented Sep 2, 2024

@LucasSantos91 From how I understood the talk, the assumption for the portion optimizing general binary search is that the input value can lie outside of the range of contained elements (also referred to as an "open range search" iiuc, otherwise it would be referred to as a "closed range search").
The resulting algorithm was to compare the center element first, which gives information on whether the element to search for is in the lower or higher half, then skew the binary search towards the end/extreme until a more extreme element is found to close the search range, after which it falls back to classic unskewed binary search, since this is more optimal for the remaining closed range search.
I currently can't connect your interpretation / statement about lower indices with anything I remember from the talk, feel free to clarify (maybe with a timestamp).

Regardless, this PR only seems to port the equalRange improvement, which is compared under the same closed-range assumption that is already the case with the previous implementation. The normal binarySearch function is unmodified.
The algorithm starts with normal, non-skewed binarySearch, and only skews the relative upper bound to the (center) left, because it's generally (statistically) less likely for the same element to repeat for a majority of the rest of the sequence. Once an element inside the range is found, it reverts to non-skewed binarySearch for the remaining upper bound within the closed range.

Some sort of performance benchmarking code and results as motivation for whether to merge this would be nice.

@LucasSantos91
Copy link
Contributor

Sorry, my bad. I was thinking of binary search instead of equal range.
For the equal range, I think the biggest performance gain comes from doing the lower bound simultaneously with the upper bound. This way, you only need to iterate over the range once, as each step gives you information that can be used to tighten one end of the resulting range.

@Olvilock
Copy link
Author

Olvilock commented Sep 3, 2024

I've written a little benchmark to assess which approach would be best in the case of std.sort.equalRange. I compare the current implementation with this proposal and 21290. To reproduce the results, curl this words.txt file.
My results (using the latest zig master):
$ zig build-exe bench.zig -O ReleaseSafe && ./bench
Vanilla function: 562 ms
Center-left function: 374 ms
Simul function: 271 ms
Totals: { 4665490000, 0, 4665490000 }

$ zig build-exe bench.zig -O ReleaseFast && ./bench
Vanilla function: 380 ms
Center-left function: 240 ms
Simul function: 252 ms
Totals: { 4665490000, 0, 4665490000 }

It seems that there's a bug somewhere in my version, and it is also slower than 21290, so this pull definitely needs holding off. Not closing yet because 21290 needs two comparisons to work during the common iterations, and therefore can be slower than lowerBound + adjusted upperBound for trivial types.

@Olvilock Olvilock closed this Sep 5, 2024
@Olvilock
Copy link
Author

Olvilock commented Sep 5, 2024

It seems to me that under the current API, this implementation is bound to underperform in ReleaseSafe, because the unreachable is not able to optimize out the redundant checks. Therefore I do not see this as a better implementation unless the API gets reverted to 0.13. I will reimplement this pull under 0.13 API and reopen later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants