Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Recognize Skip/Take chains on lazy sequences.#13628

Merged
stephentoub merged 3 commits intodotnet:masterfrom
jamesqo:skip-take
Dec 2, 2016
Merged

Recognize Skip/Take chains on lazy sequences.#13628
stephentoub merged 3 commits intodotnet:masterfrom
jamesqo:skip-take

Conversation

@jamesqo
Copy link
Copy Markdown
Contributor

@jamesqo jamesqo commented Nov 13, 2016

I added a new EnumerablePartition<TSource> class that lets us now recognize patterns like Take.Skip, or vice versa, for lazy sequences. We also remove a layer of indirection from some common operations following these methods, e.g. Take.ToArray, Skip.ToArray, etc.

Perf results can be found here; test code here. There is a ~20% speedup that results from removing an indirection layer from the chain.

cc @JonHanna, @VSadov, @stephentoub

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This can overflow if _maxIndexInclusive is int.MaxValue, and _minIndexInclusive is 0 (which happens for e.Skip(0)). It doesn't matter though, since we decrement the value right away.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment about that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have avoided optimizing the TryGet* functions if I could have, since they take a lot of code and are not as valuable to optimize as the other functions. But, currently they come with implementing IPartition.

@JonHanna
Copy link
Copy Markdown
Contributor

The combination .Skip(…).Take(…) is a common one, so this seems a reasonable case to cover in this way.

Comment thread src/System.Linq/tests/TakeTests.cs Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't using _state like this cause a problem for really long enumerables with > int.MaxValue - 3 elements?

Copy link
Copy Markdown
Contributor Author

@jamesqo jamesqo Nov 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move to long.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure tests are added to cover all of these various branches, here and elsewhere.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this whole method just be:

private bool SkipBeforeFirst(IEnumerator<TSource> en) => TakeBefore(_minIndexInclusive, en);

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub Good idea. In fact, I renamed the method to be SkipBefore and had TryGetElementAt do SkipBefore(_minIndexInclusive + index, en). Should be slightly more efficient.

Comment thread src/System.Linq/src/System/Linq/Skip.cs Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this end up limiting enumerables to those with int.MaxValue elements? That limitation was not there before.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change this to use long instead. It's unlikely that an enumerable that comes close to long.MaxValue elements can be iterated in a reasonable amount of time.

@jamesqo jamesqo changed the title Recognize Skip/Take chains on lazy sequences. [no merge] Recognize Skip/Take chains on lazy sequences. Nov 15, 2016
@jamesqo
Copy link
Copy Markdown
Contributor Author

jamesqo commented Nov 15, 2016

Marking as no merge temporarily. Since long fields are now being used, which takes up an extra 16 bytes, I'll have to re-measure the memory impact this change has. (Also, there's some more refactoring I'd like to do.)

@jamesqo jamesqo changed the title [no merge] Recognize Skip/Take chains on lazy sequences. Recognize Skip/Take chains on lazy sequences. Nov 19, 2016
@jamesqo jamesqo changed the title Recognize Skip/Take chains on lazy sequences. [WIP] Recognize Skip/Take chains on lazy sequences. Nov 19, 2016
@jamesqo jamesqo changed the title [WIP] Recognize Skip/Take chains on lazy sequences. Recognize Skip/Take chains on lazy sequences. Nov 19, 2016
@jamesqo
Copy link
Copy Markdown
Contributor Author

jamesqo commented Nov 19, 2016

@stephentoub, I managed to refactor the iterator to support enumerables > int.MaxValue, while still using int for the indices. The trick is _maxIndexInclusive is set to -1 now if we're taking an indeterminate number of items (e.g. all Skips). I've also added tests and done some refactoring.

@JonHanna, would appreciate if you could review as well. I ran into a couple of subtle integer overflow bugs whilst working on this PR, so the more eyes the better.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're dishing out an extra unnecessary write/branch per iteration for the loop if HasLimit is false (e.g. if only Skip has been called, and not Take). It could be specialized like this:

if (!HasLimit)
{
    do { builder.Add(en.Current); } while (en.MoveNext());
}
else
{
    do { remaining--; builder.Add(en.Current); } while (remaining >= 0 && en.MoveNext());
}

I figured it wasn't worth the extra code though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to add a uint overload of this function to support scenarios where _maxIndexInclusive is exactly int.MaxValue & we call Count(), for example e.Skip(1).Take(int.MaxValue).Count(). We don't lose any validation at callsites other than GetCount though, since the signed wrapper asserts.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting choice to represent bounds.
Why not uint _start and uint _length ? I think the math might get simpler.
You can still use uint.MaxValue as an "unlimited" marker.

Copy link
Copy Markdown
Contributor Author

@jamesqo jamesqo Nov 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VSadov I originally had something like that, actually, using ints. But since all of the other partitions used _minIndexInclusive and _maxIndexInclusive I decided to follow their lead.

Why not uint _start and uint _length? I think the math might get simpler.

I used ints for the fields, since it's more idiomatic. If we know the length (Take has been called atl. once) then the count will fit in an int anyway. Also, it'd still be possible that _start could overflow during a chained Skip / Take & we'd have to wrap ourselves in another iterator.

@jamesqo
Copy link
Copy Markdown
Contributor Author

jamesqo commented Nov 29, 2016

@stephentoub I updated this PR to match the changes in #14020. Can you please review when you have time? Thanks.

@VSadov
Copy link
Copy Markdown
Member

VSadov commented Nov 29, 2016

LGTM
@stephentoub - could you take another look at the changes?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value is only meaningful if HasLimit is true, right? Should we add a Debug.Assert(HasLimit) to the getter?

Copy link
Copy Markdown
Member

@stephentoub stephentoub Nov 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How expensive would it be to have an outerloop test that overflowed to test this condition? Probably prohibitive?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it would be 2B * 3 virtual method calls (MoveNext, Current, and MoveNext on the iterator) plus all of the other logic, so yes, probably too expensive for a test.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. In a few places in our tests we do use reflection to put the object into a state that would make it hit such cases quickly, e.g.
https://github.com/dotnet/corefx/blob/master/src/System.Runtime.InteropServices/tests/System/Runtime/InteropServices/HandleCollectorTests.cs#L63
Would it be worth doing something like that here? Or we don't think there's enough risk to warrant that kind of fragility?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth doing something like that here? Or we don't think there's enough risk to warrant that kind of fragility?

It is low risk, but at the same time I think it is unlikely the field name of _state will change. I will add a test.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this is why you don't assert HasLimit in Limit's getter.

Copy link
Copy Markdown
Member

@stephentoub stephentoub Nov 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: looking at a code coverage report, this is never hit. Missing a test?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add one.

Copy link
Copy Markdown
Contributor Author

@jamesqo jamesqo Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub Made the change. Something to note: While adding coverage for this, I also realized I had forgotten to override Select in this type. (Select checks for an Iterator first, then an IPartition, and the default implementation of Iterator.Select is to return a SelectEnumerableIterator rather than a SelectIPartitionIterator. Thus it's necessary to override Select here to return the latter for better perf.) So thank you for pointing that out.

@stephentoub
Copy link
Copy Markdown
Member

Thanks, @jamesqo. LGTM.

@stephentoub stephentoub merged commit 3554ed2 into dotnet:master Dec 2, 2016
@jamesqo jamesqo deleted the skip-take branch December 2, 2016 16:51
@karelz karelz modified the milestone: 1.2.0 Dec 3, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants