Improve alphabetical ordering for non-version entries #19

JamesMGreene · 2021-10-16T01:42:22Z

This version_sorter was built for speed by implementing it in C instead of Ruby. It was also designed primarily for sorting tags (i.e. SemVer release version numbers).

Alas, for one reason or another, it is also used at GitHub as the primary sorting mechanism for our branch lists. 🤔

As branch names rarely follow SemVer release version numbering patterns, this has led to occasional friction-inducing edge cases, such as a branch filter listing a longer name before the exact matching name.

While it is possible to work around this scenario with additional sorting code on the Ruby side, doing so would also eat into the sorting speed that we so wanted. 😊

As such, I have introduced a small, tactical change to ~~short-circuit the tag-focused sorting mechanism (in favor of good ole strcmp for alphabetical sorting) if the input doesn't resemble a SemVer release version number~~ bundle as many hyphenated string chunks together as possible into a single comparable strchunk -- plus a tiny special case for entries with zero recognizable chunks to sort alphabetically with strcmp.
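To make the chunking idea concrete, here is a minimal, hypothetical sketch in C. It is a toy model under stated assumptions, not version_sorter's actual parser: the names `parse_chunks`, `compare_entries`, `struct chunk`, and `MAX_CHUNKS` are invented for illustration, and the rule that a string chunk sorts before a numeric one is an assumption. Runs of letters joined by `-` are bundled into one string chunk, other punctuation is skipped, and two entries that both yield zero chunks fall back to `strcmp`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a chunk-based comparator; NOT version_sorter's real code. */
enum chunk_kind { CHUNK_NUM, CHUNK_STR };

struct chunk {
    enum chunk_kind kind;
    unsigned long long num; /* valid when kind == CHUNK_NUM */
    const char *str;        /* valid when kind == CHUNK_STR (not NUL-terminated) */
    size_t len;
};

#define MAX_CHUNKS 32 /* assumption: chunks beyond this are ignored */

/* Split s into numeric and string chunks. Letter runs joined by '-' are
   bundled into ONE string chunk (e.g. "emmaviolet-patch"); every other
   character (punctuation, whitespace, non-ASCII bytes) is skipped. */
static size_t parse_chunks(const char *s, struct chunk *out)
{
    assert(s != NULL && out != NULL);
    size_t n = 0, i = 0;
    while (s[i] != '\0' && n < MAX_CHUNKS) {
        unsigned char c = (unsigned char)s[i];
        if (isdigit(c)) {
            unsigned long long v = 0;
            while (isdigit((unsigned char)s[i]))
                v = v * 10 + (unsigned long long)(s[i++] - '0');
            out[n].kind = CHUNK_NUM;
            out[n].num = v;
            n++;
        } else if (isalpha(c)) {
            size_t start = i;
            while (isalpha((unsigned char)s[i]))
                i++;
            /* Keep absorbing "-letters" segments into the same chunk. */
            while (s[i] == '-' && isalpha((unsigned char)s[i + 1])) {
                i++;
                while (isalpha((unsigned char)s[i]))
                    i++;
            }
            out[n].kind = CHUNK_STR;
            out[n].str = s + start;
            out[n].len = i - start;
            n++;
        } else {
            i++; /* ignored character */
        }
    }
    return n;
}

static int compare_chunk(const struct chunk *a, const struct chunk *b)
{
    if (a->kind != b->kind)
        return a->kind == CHUNK_STR ? -1 : 1; /* assumption: strings before numbers */
    if (a->kind == CHUNK_NUM)
        return (a->num > b->num) - (a->num < b->num); /* compare by value */
    size_t m = a->len < b->len ? a->len : b->len;
    int r = memcmp(a->str, b->str, m);
    if (r != 0)
        return r < 0 ? -1 : 1;
    return (a->len > b->len) - (a->len < b->len);
}

/* Returns <0, 0 or >0, like strcmp. */
int compare_entries(const char *a, const char *b)
{
    struct chunk ca[MAX_CHUNKS], cb[MAX_CHUNKS];
    size_t na = parse_chunks(a, ca), nb = parse_chunks(b, cb);
    if (na == 0 && nb == 0)
        return strcmp(a, b); /* zero recognizable chunks: plain byte-wise order */
    for (size_t i = 0; i < na && i < nb; i++) {
        int r = compare_chunk(&ca[i], &cb[i]);
        if (r != 0)
            return r;
    }
    return (na > nb) - (na < nb); /* an entry that is a chunk-prefix sorts first */
}
```

In this toy model, `emmaviolet-patch` parses to a single string chunk, while `emmaviolet-patch-1` parses to that same chunk followed by the number 1, so the shorter name sorts first. Numeric chunks still compare by value, so `v1.9.0` sorts before `v1.10.0`.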

Also: before doing so, I added a handful of new tests, both to ensure that the previous critical path wouldn't break and to verify the newly implemented requirements.

P.S. I ran the benchmarking suite before and after my change. If I'm interpreting the output correctly, it did appear to add some microseconds (but less than 1 millisecond), yet I saw similar increases when simply re-running the suite several times without my changes applied at all. As such, it feels like it's probably a negligible hit to performance...? 🤷🏻‍♂️

$ bin/readygo --compare test/benchmark.rb
.!..!...............................

VersionSorter .sort
  Baseline: |                                                       X------    |
  Current:  |                                                   X--------------|
            0                                                         913.500 us

VersionSorter .rsort
  Baseline: |                                                            X-----|
  Current:  |                                                          X--     |
            0                                                         846.500 us


JamesMGreene · 2021-10-16T02:15:35Z

⚠️ Darn, it looks like one of my newly added tests (test_yui_style_tags), meant to verify consistent behavior between the old and new code paths, is actually failing now. 😕

I understand why (classic alphabetical comparison of numeric portions without consideration of place values), but I'll have to think through supporting this use case more. 🤔
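The failure mode described here can be illustrated with a small, hypothetical C sketch (not code from this PR). A byte-wise `strcmp` puts `2.9` after `2.10` because it compares `'9'` against `'1'`, whereas comparing each numeric field by value gets the order right. `compare_bytewise`, `compare_numeric_fields`, and `read_field` are illustrative names, and the sketch simply treats every non-digit character as a separator.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Classic alphabetical comparison: ignores place value in numeric runs. */
int compare_bytewise(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Read one run of digits as a number, then skip any separator characters. */
static unsigned long read_field(const char **p)
{
    unsigned long v = 0;
    while (isdigit((unsigned char)**p))
        v = v * 10 + (unsigned long)(*(*p)++ - '0');
    while (**p != '\0' && !isdigit((unsigned char)**p))
        (*p)++; /* e.g. '.' or '-' */
    return v;
}

/* Compare numeric fields left to right by value; a missing field counts as 0. */
int compare_numeric_fields(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' || *b != '\0') {
        unsigned long x = read_field(&a);
        unsigned long y = read_field(&b);
        if (x != y)
            return x < y ? -1 : 1;
    }
    return 0;
}
```

Because a missing field counts as 0 in this sketch, `1.0` and `1` compare as equal; a real version sorter needs an explicit rule for such cases.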


JamesMGreene · 2021-10-16T02:53:50Z

OK, took a slightly different approach. The tests are all passing again! 🎉 👍🏻

chesterbr

My C is rusty, but it makes sense to me.

mislav

Thanks for the contribution! Since this code is being used for important functionality all throughout GitHub, as you've pointed out, I'm going to comment from a very conservative standpoint and request that the change be as narrowly scoped as possible.

First of all, while your PR description is great, I feel like it doesn't define in very precise terms what the primary desired outcome of this PR is. Was it to ensure that emmaviolet-patch gets sorted before emmaviolet-patch-1? If that was so, how come this case isn't included in the new tests?

test/version_sorter_test.rb

mislav · 2021-10-18T14:18:48Z


+      "", " ", '!@#$%^&*()', "-", ".", "<--------->",
+      "The Quick Brown Fox", "a12a8a4a22122d01541b62193e9bdad7f5eda552",
+      "ćevapčići", "1." * 65


The fact that sorting is wildly different here after this change worries me a lot. Was the goal of the PR to refine how these are sorted? If not, would it be possible to only scope the changes in this PR to achieve the desired sorting of specific cases but leave all other behaviors intact?

JamesMGreene · 2021-10-18

I think the trouble with the current sorting mechanism for non-versions can be summed up as:

- It will [basically] sort as expected for alphanumeric characters.
- But all other characters (or the lack thereof) are effectively ignored, so the output order depends largely on the input order.

This is why I added the strcmp fallback specifically for comparisons involving zero comparable chunks, which makes sorting for those special cases deterministic.
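As a rough illustration of why the zero-chunk fallback matters, the hypothetical C sketch below uses "compare only the alphanumeric bytes" as a stand-in for the real chunk comparison. Under that comparison alone, entries such as `-` and `.` tie, so their final order is left to the input order and to `qsort`, which is not stable. Falling back to `strcmp` when neither entry contains an alphanumeric character fixes that order. All function names here are invented for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for chunk comparison: compare only alphanumeric bytes,
   skipping everything else (the "effectively ignored" behavior). */
static int compare_alnum_only(const char *a, const char *b)
{
    for (;;) {
        while (*a != '\0' && !isalnum((unsigned char)*a))
            a++;
        while (*b != '\0' && !isalnum((unsigned char)*b))
            b++;
        if (*a == '\0' || *b == '\0' || *a != *b)
            return (unsigned char)*a - (unsigned char)*b;
        a++;
        b++;
    }
}

/* True if s contains at least one alphanumeric byte (i.e. at least one chunk). */
static int has_alnum(const char *s)
{
    for (; *s != '\0'; s++)
        if (isalnum((unsigned char)*s))
            return 1;
    return 0;
}

/* qsort comparator: entries with zero chunks fall back to strcmp. */
static int cmp_with_fallback(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    if (!has_alnum(a) && !has_alnum(b))
        return strcmp(a, b);
    return compare_alnum_only(a, b);
}

void sort_entries(const char **items, size_t n)
{
    assert(items != NULL || n == 0);
    qsort(items, n, sizeof *items, cmp_with_fallback);
}
```

Sorting `{".", "-", "!@#$%^&*()", ""}` in any input order yields `"", "!@#$%^&*()", "-", "."` with this comparator. The toy still ties on mixed inputs such as `a-` and `a.`, the same kind of mixed alphanumeric/non-alphanumeric case that can remain non-deterministic.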

Here's a second PR with a new test (which will fail) demonstrating the issue: #20

P.S. To be clear: there is still opportunity for other special cases that mix alphanumerics and non-alphanumerics to end up sorted non-deterministically. 😬

mislav

Thank you for the updates. I've also run the benchmark and I also don't see a significant performance hit, so I would say this is good to go from the performance perspective.

Sorting a randomized list of 451 rails/rails versions:

.!.!..!.!.!...............................

VersionSorter .sort
  Baseline: |                                                       X----      |
  Current:  |                                                      X-----------|
            0                                                         225.250 us

VersionSorter .rsort
  Baseline: |                                                        X------   |
  Current:  |                                                      X-----------|
            0                                                         224.625 us

Sorting 16900 branch names:

................................

VersionSorter .sort
  Baseline: |                                                               X--|
  Current:  |                                                      X----       |
            0                                                          11.302 ms

VersionSorter .rsort
  Baseline: |                                                             X----|
  Current:  |                                                     X-----       |
            0                                                          11.650 ms

The improvement to sorting in this PR should affect only values that don't look like tag names, so I think tag sorting should stay the same. 👍

JamesMGreene added 7 commits October 15, 2021 16:37

Improve test code consistency

e28beff

Add tests for sorting version numbers with v/V prefixes

cab24dd

Add test for sorting non-version data with trailing numbers alphabetically

8d253ca

Update test for sorting non-version data to a more alphabetically-inclined order

adba5ad

Add test for yui style tags to match benchmark fixtures

6840114

Update parse_version_number to short-circuit on non-versions

27ec38a

Add a test to verify the README example

eaa0275

JamesMGreene marked this pull request as draft October 16, 2021 02:15

JamesMGreene added 4 commits October 15, 2021 21:46

Slight ordering expectation adjustment to tests

718e23b

Handle special case for zero recognized chunks with alphabetical sorting

b5babff

Undo short-circuiting code in parse_version_number

269b7b6

Adjust string chunk building logic to include as many hyphenated portions as possible

b214d73

JamesMGreene requested a review from mislav October 16, 2021 02:49

JamesMGreene self-assigned this Oct 16, 2021

JamesMGreene marked this pull request as ready for review October 16, 2021 02:50

Reduce size comparisons from 2 to 1

3c8b797

chesterbr approved these changes Oct 16, 2021


mislav suggested changes Oct 18, 2021


JamesMGreene added 2 commits October 18, 2021 13:01

Revert stylistic changes to test code

d63ec17

Add comments explaining test intent

489a1a1

mislav approved these changes Oct 20, 2021


JamesMGreene merged commit ec3d65a into github:master Oct 20, 2021

JamesMGreene deleted the fix-alpha-ordering branch October 20, 2021 17:40
