Skip to content

Comments

Improve merge() algorithm#4888

Merged
andralex merged 1 commit intodlang:masterfrom
nordlow:better-merge
Dec 19, 2016
Merged

Improve merge() algorithm#4888
andralex merged 1 commit intodlang:masterfrom
nordlow:better-merge

Conversation

@nordlow
Copy link
Contributor

@nordlow nordlow commented Nov 4, 2016

This is revival of my previous #3315 applied on top of existing implementation of merge that sneaked in without me noticing it ;)

Improvements:

  • Bidirectional support if all inputs are bidirectional
  • Propagates infiniteness if any of the inputs is infinite.
  • Add in-place checking of sortedness in popFront and popBack. Done by comparing current and next element if element type allows it.

Other notes:

  • Checking for isSortedRange in my previous implementation has been removed because we currently cannot compare all the different formats for the comparison alias (pred) fed to SortedRange.

Update:
I just noticed that Merge is used in setops. I guess this removes the opportunity for making this a Voldermort type, right?

@nordlow nordlow force-pushed the better-merge branch 2 times, most recently from 32b2d1a to 060b6fc Compare November 4, 2016 23:58
@nordlow nordlow changed the title Make merge bi-directional, have infiniteness and more docs Make merge() bi-directional, have infiniteness and more docs Nov 5, 2016
@nordlow nordlow changed the title Make merge() bi-directional, have infiniteness and more docs Improve merge() algorithm Nov 5, 2016
Merge!(less, Rs) merge(alias less = "a < b", Rs...)
(Rs rs)
/**
Merge multiple sorted ranges $(D rs) with less-than predicate function $(D
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for single words prefer backticks to the D macro

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/**
Merge multiple sorted ranges $(D rs) with less-than predicate function $(D
pred) into one single sorted output range containing the sorted union of the
elements of inputs. Elements in the output are not unique, that is the length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/that is/consequently/

Better yet, rephrase the sentence: "Duplicates are not eliminated, meaning that the total number of elements in the result of merge is the sum of all elements in the ranges passed to it."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Details:

All of its inputs are assumed to be sorted. This can mean that inputs are
instances of $(XREF range, SortedRange). Use the result of $(XREF algorithm,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no such restriction in Merge or merge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, hang on I'll remove this doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

This algorithm is lazy, doing work progressively as elements are pulled off
the result.

Average complexity is $(BIGOH n * k) for $(D k) inputs of maximum length $(D n).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Time complexity is proportional to the sum of element counts over all ranges."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

assert(equal(c.retro, [5, 4, 3, 2, 1, 0]));
-------
*/
Merge!(less, Rs) merge(alias less = "a < b", Rs...)(Rs rs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constraints should go here, in the user-visible and documented API, not in the Merge struct. The constraint should not use the package-level CommonElementType.

Copy link
Contributor Author

@nordlow nordlow Nov 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. But we should keep the restriction on Merge because its public and used by setops, right? Later when setUnion is removed we can perhaps turn it into a Voldemort type.

assert(m.empty);
}

package template CommonElementType(Rs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is superfluous

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@andralex
Copy link
Member

andralex commented Nov 6, 2016

You may assert that the ranges are sorted opportunistically; an eager call to isSorted seems a bit much.

@nordlow
Copy link
Contributor Author

nordlow commented Nov 6, 2016

So no debug assertions of sortedness of output either, then?

Such a checking can be implemented inside popFront/Back without any extra memory.

@nordlow nordlow force-pushed the better-merge branch 2 times, most recently from b877135 to 5db822d Compare November 6, 2016 16:02
@nordlow
Copy link
Contributor Author

nordlow commented Nov 6, 2016

Only thing left now is potential checking for sortedness in frontIndex and backIndex.

A range containing the union of the given ranges.

See_Also:
$(XREF algorithm, setops, SetUnion)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use REF instead of XREF throughout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

A range containing the union of the given ranges.

See_Also:
$(REF algorithm, setops, SetUnion)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REF has a different order of arguments than XREF. Symbol name goes first, followed by the module including std. Here: REF(SetUnion, std,algorithm,setops). However, there doesn't seem to be a std.algorithm.setops.SetUnion.

Copy link
Contributor Author

@nordlow nordlow Nov 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I presume it's because setUnion has been deprecated by merge.

A range containing the union of the given ranges.

See_Also:
$(REF setUnion, std,algorithm,setops)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, syntax is correct now. But the link is dead because setUnion has been undocumented. I suggest to just drop the See_Also section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


All of its inputs are assumed to be sorted. This can mean that inputs are
instances of $(REF SortedRange, std,range). Use the result of $(REF sort,
std,algorithm), or $(REF assumeSorted, std,range) to merge ranges known to be sorted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The REF to sort is broken. It's std.algorithm.sorting.sort, so the REF must be $(REF sort, std,algorithm,sorting).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@nordlow nordlow force-pushed the better-merge branch 2 times, most recently from c856406 to 25aa2e8 Compare November 6, 2016 19:32
@aG0aep6G
Copy link
Contributor

aG0aep6G commented Nov 6, 2016

Ok, REFs look good to me.

@nordlow
Copy link
Contributor Author

nordlow commented Nov 6, 2016

Added debug-only efficient checking for sortedness inside Merge.pop{Front,Back}.

I'm satisfied now.

@nordlow nordlow force-pushed the better-merge branch 2 times, most recently from 16553de to c263e97 Compare November 6, 2016 23:39
@codecov-io
Copy link

Current coverage is 89.39% (diff: 93.40%)

Merging #4888 into master will increase coverage by <.01%

@@             master      #4888   diff @@
==========================================
  Files           124        124          
  Lines         76659      76704    +45   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          68527      68570    +43   
- Misses         8132       8134     +2   
  Partials          0          0          

Powered by Codecov. Last update 027839f...e1dfb51

@nordlow
Copy link
Contributor Author

nordlow commented Nov 10, 2016

@andralex ping

@andralex
Copy link
Member

As I mentioned via email I'm on the road this week.

@nordlow
Copy link
Contributor Author

nordlow commented Nov 10, 2016

Ok. Thanks, @andralex

Can anyone else approve this?

@aG0aep6G aG0aep6G mentioned this pull request Nov 13, 2016
assert(equal(c, [0, 1, 2, 3, 4, 5]));
assert(equal(c.retro, [5, 4, 3, 2, 1, 0]));
-------
*/
Copy link
Member

@CyberShadow CyberShadow Nov 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, the documentation should have a See_Also section linking to setUnion, and an explanation for how this function is different from it.

Copy link
Contributor Author

@nordlow nordlow Nov 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll add a ref. Notice however that setUnion is not visible online anymore. I find that strange. Is this intentional?

https://dlang.org/phobos/std_algorithm_setops.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's because it's been deprecated in favor of merge. I guess I was missing something after all. Never mind :)

@aG0aep6G
Copy link
Contributor

On 11/21/2016 10:53 PM, Per Nordlöw wrote:

These |foreach|es are static ones. I'm pretty sure that the break
associates with |switch| here. If it we're I don't see how the
unittests could pass.

You're mistaken. The break applies to the foreach. The behavior is a bit
weird, though: The break doesn't end the static foreach loop. It jumps
behind the foreach body at run-time, still inside the switch. If there's
code there (e.g. a default case) that gets executed. There may be other
subtle differences.

To break the switch instead, use a label:

sw: switch (foo)
{
     foreach (x; alias_seq)
     {
         case x: /* ... */ break sw;
     }
}

@nordlow
Copy link
Contributor Author

nordlow commented Nov 21, 2016

Ok, @aG0aep6G I'll fix.

@aG0aep6G, what do you think about the updated limiting of the sortedness logic?

@nordlow
Copy link
Contributor Author

nordlow commented Nov 21, 2016

@aG0aep6G Changed the two switch breaks to be labeled.

@nordlow
Copy link
Contributor Author

nordlow commented Nov 21, 2016

And cleaned up the order of the imports and private defs at the top of Merge.

@aG0aep6G
Copy link
Contributor

@aG0aep6G, what do you think about the updated limiting of the sortedness logic?

Sorry, I'm not following the actual objective of this PR. Just skimming the messages.

By the way, I think you've edited in that question later on. It's not in the notification email from GItHub. I wouldn't have noticed it if I hadn't had this page open in the browser.

Also, you seem to update the PR by amending. In my experience, that makes it much harder to review the changes. I'd suggest to squash once at the end instead of amending. But Andrei does the actual review here, so that's his call.

@nordlow
Copy link
Contributor Author

nordlow commented Nov 21, 2016

Removed isSorted calls in unittest.

@nordlow
Copy link
Contributor Author

nordlow commented Nov 21, 2016

Ok, I'll change to not amending and then squashing at the end.

static if (isBidirectional)
{
_crt = _crt.max;
this._lastBackIndex = size_t.max; // uinitialized, lazy initialized in `front`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here and not in the initializer of the field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this assignments because it's initialized to size_t in declaration.

Unqual!ElementType bestElement;
foreach (i, _; Rs)
{
if (source[i].empty) { continue; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unusual, for single statements on one line we omit the braces (except when required e.g. for function definitions).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

comp(source[i].front, previousBack),
"Input " ~ i.stringof ~ " is unsorted"); // @nogc
}
break sw;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as an aside, the label is not necessary - you may just leave the break here and insert one right after the foreach loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. Does that revert back to the first state, where this break associates with static foreach instead of the final switch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion is to have two unlabeled breaks, one inside the foreach and another one right after it. But it's really up to you.

alias opDollar = length;
}

public Rs source;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually put the state at the beginning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@nordlow nordlow force-pushed the better-merge branch 2 times, most recently from 47c762b to 5df9c70 Compare December 16, 2016 17:16
@nordlow
Copy link
Contributor Author

nordlow commented Dec 16, 2016

Rebased on master. One thing has changed: improved sortedness checking by breaking out CT-logic into canCheckSortedness.

@nordlow
Copy link
Contributor Author

nordlow commented Dec 16, 2016

ci/circleci — Your tests failed on CircleCI fails. Don't know why. Should I repush?

Copy link
Contributor

@wilzbach wilzbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ci/circleci — Your tests failed on CircleCI fails. Don't know why. Should I repush?

Have a look at the output. In the last weeks the CI testing has been improved.
I tried to explain the issues, but in general you can also run the tests locally, e.g. with make -f posix.mak style

auto c = merge(a.assumeSorted, b.assumeSorted)
assert(equal(c, [0, 1, 2, 3, 4, 5]));
assert(equal(c.retro, [5, 4, 3, 2, 1, 0]));
-------
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this is a Ddoc unittest block?
If you put it as a normal unittest block and annotate it with the magic /// slashes it will be displayed in the same way, but it has the advantage that we can ensure that the examples does compile. Another advantage is that soon the users will be able to directly run unittests blocks on dlang.org and then of course we will enable this only for tests for which we can ensure they compile & work.

Copy link
Contributor Author

@nordlow nordlow Dec 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add this. It's ok to move it to a unittest if ok with the original author.

Copy link
Contributor Author

@nordlow nordlow Dec 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed it and added a test for bidirectional use (via .retro) of merge result in an existing unittest. Use of assumeSorted isn't revelant anymore.

/// test bi-directional access and common type
@safe pure nothrow unittest
{
import std.algorithm : equal;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output of make -f posix.mak style:

grep -nr 'import std.algorithm : ' $(find . -name '*.d') ; test $? -eq 1
./std/algorithm/sorting.d:1308:    import std.algorithm : equal;
make: *** [style] Error 1

-> std.algorithm.comparison

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


auto m = merge(a, b, c);

static assert(is(typeof(m.front) == CommonType!(S, I, D)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output of make -f posix.mak publictests:

executing out/std_algorithm_sorting.d
out/std_algorithm_sorting.d(165): Error: template instance CommonType!(S, I, D) template 'CommonType' is not defined
out/std_algorithm_sorting.d(165): Error: static assert  (is(double == _error_)) is false
make: *** [publictests] Error 1

As mentioned above we want to ensure that public tests do compile without missing imports and hence we started to automatically check for it. So a public unittest block can't use the global imports.
Solution 1: use the ddoced example as public unittest and make this one private
Solution 2: add the missing import -> import std.traits : CommonType

Copy link
Contributor Author

@nordlow nordlow Dec 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the import for CommonType.

@property bool empty()
{
return _crt == _crt.max;
size_t bestIndex = size_t.max; // indicate undefined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@nordlow nordlow Dec 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is ptrdiff_t preferred over size_t here? bestIndex is an absolute index here not a difference.

@nordlow nordlow force-pushed the better-merge branch 2 times, most recently from 0b4e8b9 to 3125beb Compare December 18, 2016 12:04
Copy link
Contributor

@wilzbach wilzbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import std.range: retro is still missing :/

int[] b = [2, 3, 4];

assert(a.merge(b).equal([1, 2, 3, 3, 4, 5]));
assert(a.merge(b).retro.equal([5, 4, 3, 3, 2, 1]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make -f publictests or CircleCi

out/std_algorithm_sorting.d(202): Error: no property 'retro' for type 'Merge!("a < b", int[], int[])'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

@nordlow nordlow Dec 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you mean: make -f posix.mak publictests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you mean: make -f posix.mak publictests?

Yes - sorry for the typo.

static assert(is(typeof(m.front) == CommonType!(S, I, D)));

assert(equal(m, [1, 2, 3, 10, 20, 30, 40, 50, 60]));
assert(equal(m.retro, [60, 50, 40, 30, 20, 10, 3, 2, 1]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make -f publictests or CircleCi

out/std_algorithm_sorting.d(169): Error: no property 'retro' for type 'Merge!("a < b", short[], int[], double[])'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


///
@safe pure nothrow unittest
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a personal preference, but I show the simpler unittest example as first example to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Moved the longer unittest last.

@nordlow
Copy link
Contributor Author

nordlow commented Dec 19, 2016

Ready to pull?

}
final switch (_lastBackIndex)
{
foreach (i, _; Rs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, line shown mistakenly as not covered

private alias ElementType = CommonType!(staticMap!(.ElementType, Rs));
private enum isBidirectional = allSatisfy!(isBidirectionalRange, staticMap!(Unqual, Rs));

debug private enum canCheckSortedness = isCopyable!ElementType && !hasAliasing!ElementType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess this would be a version:

debug static if (isCopyable!ElementType && !hasAliasing!ElementType)
    version = can_check_sortedness;

Unsure on which is the best way.

@andralex
Copy link
Member

Good engineering, thanks.

@andralex andralex merged commit cf07aed into dlang:master Dec 19, 2016
@nordlow nordlow deleted the better-merge branch October 19, 2021 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants