Skip to content

Comments

Fix Issue 18280 - std.algorithm.comparison.cmp for non-strings should call opCmp only once per item pair#6056

Merged
andralex merged 1 commit intodlang:masterfrom
n8sh:algorithm-cmp
Jan 24, 2018
Merged

Fix Issue 18280 - std.algorithm.comparison.cmp for non-strings should call opCmp only once per item pair#6056
andralex merged 1 commit intodlang:masterfrom
n8sh:algorithm-cmp

Conversation

@n8sh
Copy link
Member

@n8sh n8sh commented Jan 22, 2018

Right now when comparing non-string ranges std.algorithm.comparison.cmp calls the comparison predicate twice for every pair of elements to determine their order. When the elements have overloaded opCmp and the predicate is "a < b" we only need to call opCmp once for each pair.

@dlang-bot
Copy link
Contributor

dlang-bot commented Jan 22, 2018

Thanks for your pull request, @n8sh! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

  • smaller, focused PRs are easier to review than big ones

  • try not to mix up refactoring or style changes with bug fixes or feature enhancements

  • provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Auto-close Bugzilla Description
18280 std.algorithm.comparison.cmp for non-strings should call opCmp only once per item pair
18285 std.algorithm.comparison.cmp for strings with custom predicate compares lengths wrong
18286 std.algorithm.comparison.cmp for string with custom predicate fails if distinct chars can compare equal
18288 std.algorithm.comparison.cmp for wide strings should be @safe

static if (!(isSomeString!R1 && isSomeString!R2))
{
static if (is(typeof(pred) : string))
enum isLessThan = pred == "a < b";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: @andralex isn't a huge fan of string comparisons, see #4265 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I did a bit of digging the string specialization of cmp was added in 2011: a0ecf2a

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good news, dlang/dmd#7484 is on. With it we should be able to switch to lambdas soon. cc @RazvanN7

if (binaryFun!pred(b, a)) return 1;
static if (isLessThan && is(typeof(r1.front.opCmp(r2.front)) : int))
{
immutable int c = r1.front.opCmp(r2.front);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not r1.front < r2.front?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to get a "threeWay" value (as it is called elsewhere in this function): negative if r1.front < r2.front, positive if r2.front < r1.front, zero if neither.

Copy link
Member

@andralex andralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is getting a bit on the odd side. How about we split cmp into two overloads:

int cmp(R1, R2)(R1 r1, R2 r2)
if (isInputRange!R1 && isInputRange!R2);
int cmp(alias pred, R1, R2)(R1 r1, R2 r2)
if (isInputRange!R1 && isInputRange!R2);

Group documentation together. Documentation specifies that without a predicate specified, opCmp is used.

n8sh added a commit to n8sh/phobos that referenced this pull request Jan 23, 2018
Copy link
Member

@andralex andralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I committed a few changes. One thing I'm still worried about is this won't work for types that define opCmp to return a floating-point number.

Please remove the test for pred.

static if (is(typeof(pred) : string)
&& (__traits(compiles, { static assert("a < b" == pred); })))
{
// In case someone was explicitly using the default predicate.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove, no need to accommodate this case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

n8sh added a commit to n8sh/phobos that referenced this pull request Jan 23, 2018
n8sh added a commit to n8sh/phobos that referenced this pull request Jan 23, 2018
@n8sh
Copy link
Member Author

n8sh commented Jan 23, 2018

I committed a few changes.

Sorry, I clobbered your changes when I force pushed to add a unittest.

@andralex
Copy link
Member

OK, I submitted them again.

I think since we got here we should fix the float return. So cmp should return auto and let opCmp drive.

@n8sh
Copy link
Member Author

n8sh commented Jan 23, 2018

For float opCmp what do we do with a NaN result? "Not positive or negative, so we keep going" or "not zero, so we return it"?

@andralex
Copy link
Member

@n8sh Stop the comparison and forward the NaN result. So the core of the loop would be:

auto c = r1.front.opCmp(r2.front);
if (c != 0) return c;

n8sh added a commit to n8sh/phobos that referenced this pull request Jan 23, 2018
@@ -614,40 +623,41 @@ if (isInputRange!R1 && isInputRange!R2)
{
if (r1.empty) return -cast(int)!r2.empty;
if (r2.empty) return !r1.empty;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these two returns you need a bit of massaging. I'd say something like this:

static if (is(typeof(r1.front.opCmp(r2.front))) R)
    alias Result = R;
else
    alias Result = int;
if (r1.empty) return -R(!r2.empty);
if (r2.empty) return R(!r1.empty);
...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

result = cmp!"a > b"("", "");
assert(result == 0);
result = cmp!"a > b"("abc", "abcd");
assert(result > 0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is a bug. r1 is a prefix of r2, so the result should be negative even if we are comparing in reverse alphabetical order, but cmp is using pred to compared the lengths! I have no sense of whether anyone might have been relying on this for reverse order comparison. It is a violation of the documentation and it only works for strings. Should it be fixed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's terrible. I'd say open a bug with this repro then mark it as fixed by this PR. That way we'll have it in the changelog.

n8sh added a commit to n8sh/phobos that referenced this pull request Jan 23, 2018
@andralex
Copy link
Member

@n8sh you'll need to add a few more unittests because now you are generalizing cmp. Sorry this fell on you and thanks for the quality code!

@n8sh
Copy link
Member Author

n8sh commented Jan 23, 2018

Found another bad bug. This code:
if (c1 != c2) return threeWayInt(cast(int) c1, cast(int) c2);
meant that no custom predicate that treats different values the same (for example, ignoring case) will work correctly. They'll incorrectly return 0 prematurely.

Example https://run.dlang.io/is/feq4KK

void main()
{
    import std.algorithm.comparison : cmp;
    static bool lessThanCaseInsensitive(size_t a, size_t b)
    {
        import std.ascii : toUpper;
        return toUpper(cast(dchar) a) < toUpper(cast(dchar) b);
    }
    static assert(cmp!lessThanCaseInsensitive("apple2", "APPLE1") != 0,
                 "These are clearly not the same!");
}

@andralex
Copy link
Member

@n8sh it's a shame these haven't been exposed already. Thanks for your work! Please file that bug, too (or massage it with the other) and then mark them as fixed with this PR.

@andralex
Copy link
Member

@n8sh I made a push to your code that removes cast in promotions.

@andralex
Copy link
Member

Approved, please squash and then we'll merge. Thanks!!

@andralex
Copy link
Member

Aaand made one more minor optimization - empty was sometimes checked twice at the end of the ranges.

@n8sh n8sh force-pushed the algorithm-cmp branch 2 times, most recently from 4d9af90 to 78c2ceb Compare January 23, 2018 19:58
n8sh added a commit to n8sh/phobos that referenced this pull request Jan 23, 2018
… call opCmp only once per item pair

split cmp into two overloads per @andralex

dlang#6056 (review)

Minor adjustments, again

cmp should return auto and let opCmp drive

dlang#6056 (comment)

Fix Issue 18285 - std.algorithm.comparison.cmp for strings with custom predicate compares lengths wrong

Test std.algorithm.comparison.cmp when opCmp returns float

Promotions should not use cast

Optimize cmp's endgame

There are some redundant tests when the end of the ranges is reached. Eliminated that, and improved threeWayByPred.

Fix Issue 18286 - std.algorithm.comparison.cmp for string with custom predicate fails if distinct chars can compare equal

Fix Issue 18288 - std.algorithm.comparison.cmp for wide strings should be @safe

re-apply remove cast in promotions
n8sh added a commit to n8sh/phobos that referenced this pull request Jan 23, 2018
… call opCmp only once per item pair

split cmp into two overloads per @andralex

dlang#6056 (review)

Minor adjustments, again

cmp should return auto and let opCmp drive

dlang#6056 (comment)

Fix Issue 18285 - std.algorithm.comparison.cmp for strings with custom predicate compares lengths wrong

Test std.algorithm.comparison.cmp when opCmp returns float

Promotions should not use cast

Optimize cmp's endgame

There are some redundant tests when the end of the ranges is reached. Eliminated that, and improved threeWayByPred.

Fix Issue 18286 - std.algorithm.comparison.cmp for string with custom predicate fails if distinct chars can compare equal

Fix Issue 18288 - std.algorithm.comparison.cmp for wide strings should be @safe

re-apply remove cast in promotions
@n8sh n8sh force-pushed the algorithm-cmp branch 2 times, most recently from cb5e5ab to 0710578 Compare January 24, 2018 00:49
… call opCmp only once per item pair

split cmp into two overloads per @andralex

dlang#6056 (review)

Minor adjustments, again

cmp should return auto and let opCmp drive

dlang#6056 (comment)

Fix Issue 18285 - std.algorithm.comparison.cmp for strings with custom predicate compares lengths wrong

Test std.algorithm.comparison.cmp when opCmp returns float

Promotions should not use cast

Optimize cmp's endgame

There are some redundant tests when the end of the ranges is reached. Eliminated that, and improved threeWayByPred.

Fix Issue 18286 - std.algorithm.comparison.cmp for string with custom predicate fails if distinct chars can compare equal

Fix Issue 18288 - std.algorithm.comparison.cmp for wide strings should be @safe

re-apply remove cast in promotions
@n8sh
Copy link
Member Author

n8sh commented Jan 24, 2018

Design-wise I would be happier if cmp and similar functions in std.algorithm instead of taking a predicate that returns true or false instead took a function that returns negative/zero/positive like opCmp and cmp itself, but a change like that would be disruptive.

@andralex andralex merged commit 2174695 into dlang:master Jan 24, 2018
@PetarKirov
Copy link
Member

@n8sh perhaps the way forward is to make it an option but also keep the existing behavior by querying the predicate type via static if.

@andralex
Copy link
Member

There's always another name such as cmp3... but I'd say distinguishing a predicate that returns bool from one that returns anything else is a fine option too.

@aG0aep6G
Copy link
Contributor

The bot picked up 4 issues here, but it only closed one. Bug?
dlang/dlang-bot#175

@n8sh
Copy link
Member Author

n8sh commented Jan 26, 2018

After @aG0aep6G's message I closed the issues manually, but I don't know if that might result in them not getting added to the next release changelog.

@wilzbach
Copy link
Contributor

After @aG0aep6G's message I closed the issues manually, but I don't know if that might result in them not getting added to the next release changelog.

A bit of explanation:

The bugs are closed by the Bugzilla integration - not the bot.
The changelog is generated by https://github.com/dlang/tools/blob/master/changed.d - it parses the git log between two revisions with the same regex as the bot and the bugzilla integration does. However, it then queries Bugzilla and filters for only referenced bugs which are closed.
So the bug overview by the bot is merely a help for reviewers as (a) it provides a handy link to the references issues and (b) allows them to verify that the issues have been referenced correctly.
See also: https://github.com/dlang-bots/dlang-bot#automated-references

Now to the bug - it looks like the Bugzilla integration is a bit dumber and only picks up one bug regex per commit. There's much I can do about (except for writing our own Bugzilla integration).
In any case, a "workaround" for the future if you have to use one commit is to include multiple issues in one regex, e.g. "Fixes issues 17494, 17505, 17506".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants