-
Notifications
You must be signed in to change notification settings - Fork 34
Improve alphabetical ordering for non-version entries #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I understand why (classic alphabetical comparison of numeric portions without consideration of place values) but I'll have to think through supporting this use case more. 🤔 |
|
OK, took a slightly different approach. The tests are all passing again! 🎉 👍🏻 |
chesterbr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My C is rusty, but it makes sense to me.
mislav
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! Since this code is being used for important functionality all throughout GitHub, as you've pointed out, I'm going to comment from a very conservative standpoint and request that the change be as narrowly scoped as possible.
First of all, while your PR description is great, I feel like it doesn't define in very precise terms what the primary desired outcome of this PR is. Was it to ensure that emmaviolet-patch gets sorted before emmaviolet-patch-1? If that was so, how come this case isn't included in the new tests?
| "", " ", '!@#$%^&*()', "-", ".", "<--------->", | ||
| "The Quick Brown Fox", "a12a8a4a22122d01541b62193e9bdad7f5eda552", | ||
| "ćevapčići", "1." * 65 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that sorting is wildly different here after this change worries me a lot. Was the goal of the PR to refine how these are sorted? If not, would it be possible to only scope the changes in this PR to achieve the desired sorting of specific cases but leave all other behaviors intact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the trouble of the current sorting mechanism for non-versions can be summed up as:
- It will [basically] sort as expected for alphanumeric characters
- But all other characters (or lack thereof), they are effectively ignored, and the output order will be more based on input order
This is why I added the strcmp usage specifically for comparisons with 0 comparison chunks, which will make sorting for those special cases deterministic.
Here's a second PR with a new test (which will fail) demonstrating the issue: #20
P.S. To be clear: there is still opportunity for other special cases that mix alphanumerics and non-alphanumerics to end up sorted non-deterministically. 😬
mislav
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the updates. I've also run the benchmark and I also don't see a significant performance hit, so I would say this is good to go from the performance perspective.
Sorting a randomized list of 451 rails/rails versions:
.!.!..!.!.!...............................
VersionSorter .sort
Baseline: | X---- |
Current: | X-----------|
0 225.250 us
VersionSorter .rsort
Baseline: | X------ |
Current: | X-----------|
0 224.625 us
Sorting 16900 branch names:
................................
VersionSorter .sort
Baseline: | X--|
Current: | X---- |
0 11.302 ms
VersionSorter .rsort
Baseline: | X----|
Current: | X----- |
0 11.650 ms
The improvement to sorting in this PR should affect only values that don't look like tag names, so I think tag sorting should stay the same. 👍
This
version_sorterwas designed for speedy sorting by writing it in C code instead of Ruby. It was also designed primarily for sorting tags (i.e. SemVer release version numbers).Alas, for one reason or another, it is also used at GitHub as the primary sorting mechanism for our branch lists as well. 🤔
As branch names rarely follow SemVer release version numbering patterns, this has led to the occasional friction-inducing edge cases, such as filtering for branches yielding a longer name before the exact matching name:
While it is possible to workaround this scenario with additional sorting code on the Ruby side, that would also reduce the speedy sorting that we so wanted. 😊
As such, I have introduced a small, tactical change to
short-circuit the tag-focused sorting mechanism (in favor of good olebundle as many hyphenated string chunks together as possible into a single comparablestrcmpfor alphabetical sorting) if the input doesn't resemble a SemVer release version numberstrchunk-- plus a tiny special case for entries with zero recognizable chunks to sort alphabetically withstrcmp.Also: before doing so, I added a handful of new tests to both ensure that the previous critical path wouldn't end up broken, as well as to achieve verification of the newly implemented requirements.
P.S. I ran the benchmarking suite before and after my change. Although it did appear to add some microseconds (but less than 1 millisecond) -- if I'm interpreting the output correctly --, I also saw similar increases when just re-running the suite more than once without my changes applied at all. As such, it feels like it's probably a negligible hit to performance...? 🤷🏻♂️