Skip to content

Conversation

@JamesMGreene
Copy link
Member

@JamesMGreene JamesMGreene commented Oct 16, 2021

This version_sorter was designed for speedy sorting by writing it in C code instead of Ruby. It was also designed primarily for sorting tags (i.e. SemVer release version numbers).

Alas, for one reason or another, it is also used at GitHub as the primary sorting mechanism for our branch lists as well. 🤔

As branch names rarely follow SemVer release version numbering patterns, this has led to the occasional friction-inducing edge cases, such as filtering for branches yielding a longer name before the exact matching name:

screenshot

While it is possible to workaround this scenario with additional sorting code on the Ruby side, that would also reduce the speedy sorting that we so wanted. 😊

As such, I have introduced a small, tactical change to short-circuit the tag-focused sorting mechanism (in favor of good ole strcmp for alphabetical sorting) if the input doesn't resemble a SemVer release version number bundle as many hyphenated string chunks together as possible into a single comparable strchunk -- plus a tiny special case for entries with zero recognizable chunks to sort alphabetically with strcmp.

Also: before doing so, I added a handful of new tests to both ensure that the previous critical path wouldn't end up broken, as well as to achieve verification of the newly implemented requirements.

P.S. I ran the benchmarking suite before and after my change. Although it did appear to add some microseconds (but less than 1 millisecond) -- if I'm interpreting the output correctly --, I also saw similar increases when just re-running the suite more than once without my changes applied at all. As such, it feels like it's probably a negligible hit to performance...? 🤷🏻‍♂️

$ bin/readygo --compare test/benchmark.rb
.!..!...............................

VersionSorter .sort
  Baseline: |                                                       X------    |
  Current:  |                                                   X--------------|
            0                                                         913.500 us

VersionSorter .rsort
  Baseline: |                                                            X-----|
  Current:  |                                                          X--     |
            0                                                         846.500 us

@JamesMGreene
Copy link
Member Author

JamesMGreene commented Oct 16, 2021

⚠️ Darn, it looks like one of my newly added tests (test_yui_style_tags) to verify consistent behavior with the old vs. new code paths is actually failing now. 😕

I understand why (classic alphabetical comparison of numeric portions without consideration of place values) but I'll have to think through supporting this use case more. 🤔

@JamesMGreene JamesMGreene marked this pull request as draft October 16, 2021 02:15
@JamesMGreene JamesMGreene requested a review from mislav October 16, 2021 02:49
@JamesMGreene JamesMGreene self-assigned this Oct 16, 2021
@JamesMGreene JamesMGreene marked this pull request as ready for review October 16, 2021 02:50
@JamesMGreene
Copy link
Member Author

OK, took a slightly different approach. The tests are all passing again! 🎉 👍🏻

Copy link

@chesterbr chesterbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My C is rusty, but it makes sense to me.

Copy link
Contributor

@mislav mislav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Since this code is being used for important functionality all throughout GitHub, as you've pointed out, I'm going to comment from a very conservative standpoint and request that the change be as narrowly scoped as possible.

First of all, while your PR description is great, I feel like it doesn't define in very precise terms what the primary desired outcome of this PR is. Was it to ensure that emmaviolet-patch gets sorted before emmaviolet-patch-1? If that was so, how come this case isn't included in the new tests?

Comment on lines +80 to +82
"", " ", '!@#$%^&*()', "-", ".", "<--------->",
"The Quick Brown Fox", "a12a8a4a22122d01541b62193e9bdad7f5eda552",
"ćevapčići", "1." * 65
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that sorting is wildly different here after this change worries me a lot. Was the goal of the PR to refine how these are sorted? If not, would it be possible to only scope the changes in this PR to achieve the desired sorting of specific cases but leave all other behaviors intact?

Copy link
Member Author

@JamesMGreene JamesMGreene Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the trouble of the current sorting mechanism for non-versions can be summed up as:

  • It will [basically] sort as expected for alphanumeric characters
  • But all other characters (or lack thereof), they are effectively ignored, and the output order will be more based on input order

This is why I added the strcmp usage specifically for comparisons with 0 comparison chunks, which will make sorting for those special cases deterministic.

Here's a second PR with a new test (which will fail) demonstrating the issue: #20

P.S. To be clear: there is still opportunity for other special cases that mix alphanumerics and non-alphanumerics to end up sorted non-deterministically. 😬

Copy link
Contributor

@mislav mislav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updates. I've also run the benchmark and I also don't see a significant performance hit, so I would say this is good to go from the performance perspective.

Sorting a randomized list of 451 rails/rails versions:

.!.!..!.!.!...............................

VersionSorter .sort
  Baseline: |                                                       X----      |
  Current:  |                                                      X-----------|
            0                                                         225.250 us

VersionSorter .rsort
  Baseline: |                                                        X------   |
  Current:  |                                                      X-----------|
            0                                                         224.625 us

Sorting 16900 branch names:

................................

VersionSorter .sort
  Baseline: |                                                               X--|
  Current:  |                                                      X----       |
            0                                                          11.302 ms

VersionSorter .rsort
  Baseline: |                                                             X----|
  Current:  |                                                     X-----       |
            0                                                          11.650 ms

The improvement to sorting in this PR should affect only values that don't look like tag names, so I think tag sorting should stay the same. 👍

@JamesMGreene JamesMGreene merged commit ec3d65a into github:master Oct 20, 2021
@JamesMGreene JamesMGreene deleted the fix-alpha-ordering branch October 20, 2021 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants