Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Optimize Enumerable.Skip() for IList<> parameter#4551

Merged
VSadov merged 3 commits intodotnet:masterfrom
scalablecory:linq-skip-optimization
Jan 4, 2016
Merged

Optimize Enumerable.Skip() for IList<> parameter#4551
VSadov merged 3 commits intodotnet:masterfrom
scalablecory:linq-skip-optimization

Conversation

@scalablecory
Copy link
Copy Markdown

Changes Enumerable.Skip() from O(n) to O(1) when an IList<> is passed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to cache source.Count into a "local"; otherwise this will incur an interface call on each iteration. The extra field for that localon the display class is likely a good tradeoff.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there was contention about saving the Count value to a local (yes: save on interface invocation, no: what if the list was mutated) I think we should have a test that fires one of these up, iterates a bit, removes some data from the original IList value, and continues iterating. That way the most-recently agreed-upon behavior is codified in a test that goes in along with the change.

@stephentoub
Copy link
Copy Markdown
Member

One comment, but otherwise the src changes LGTM.

We likely need some more tests. The code coverage report shows the new code being covered, but from looking at the existing test cases I think that might be misleading. We should ensure that we're examining all of the same inputs / cases for both the enumerable and list inputs.

cc: @VSadov, @JonHanna

@JonHanna
Copy link
Copy Markdown
Contributor

It might be worth measuring the times on arrays, as iterating through them via IList is relatively expensive.

Potentially, this could work well in combination with the IPartition interface of #2401 which would allow for combinations of skips and takes to fare well, though there's no reason why this should wait on that.

@JonHanna
Copy link
Copy Markdown
Contributor

There would be changed behaviour in terms of if changes to the source happened within the enumeration of the items. I'm not sure it's an issue, as it would make code that currently throws work, rather than the other way around (and I've personally never agreed with enumerators explicitly banning such changes anyway), but it's worth noting that it's not an unobservable change in that regard at least.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this contradicts what @stephentoub said in recommending this a commit ago, but all of the cases where calls to Count are currently pushed to a local are true locals and used to produce a result immediately rather than captured and used in an iterator. This means the only way it can race against something on the same thread is relatively obscure happening in a Func. In this case there could be something affecting Count within a foreach on the results of the skip. It might be more conservative to keep calling Count.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree on this. Sideeffecting funcs are discouraged, but cannot be prevented.
When replacing iteration with indexing, we need to re-check Count to preserve existing behavior or we could potentially break some previously-working code.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we're leaning toward not caching the count. Someone give a definitive word on this and I'll revert the commit. @stephentoub, any objection?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub, any objection?

No objection, thanks. My suggestion was based on performance, but that loses out to correctness. The only way it would be incorrect is in a situation where there's already invalid usage, but I'm fine with the argument that we want to be as correct as possible even in that situation, given some definition of "correct" (there will still be oddities, and things that may be considered incorrect).

@JonHanna
Copy link
Copy Markdown
Contributor

Two concerns, but I like it in principle. I think as well as tests it would be good to have some performance information. I suspect there's a trade-off here with most source IList implementations (especially arrays, and most of all arrays being used with array variance, but the others as well) so that there's a threshold for count as a percentage of the number of items actually enumerated below which this is a nett loss. Proving that suspicion wrong would of course be fantastic, but otherwise having an idea of just what sort of numbers gain and which lose would be good.

@ikopylov
Copy link
Copy Markdown
Contributor

This PR changes the behavior of the following code:

var list = new List<int>() { 1, 2, 3 };
foreach (var item in list.Skip(1))
    list.Add(item);

Original implementation throws InvalidOperationException due to the modification of the collection. New implementation do not (or even throws ArgumentOutOfRangeException when Remove used).

@stephentoub
Copy link
Copy Markdown
Member

@ikopylov (and @JonHanna, who also mentioned it in a comment earlier), it does, but we've also been adding such special cases elsewhere in LINQ, in fact @JonHanna I believe you've added some of them 😉. If we're concerned about it in this case, we've got a lot of changes to go back and revert. I believe we agreed this was an acceptable difference. @VSadov? @weshaggard?

@ikopylov
Copy link
Copy Markdown
Contributor

I believe we agreed this was an acceptable difference.

That's great.

However source.Count should not be cached to avoid ArgumentOutOfRangeException.

@stephentoub
Copy link
Copy Markdown
Member

However source.Count should not be cached to avoid ArgumentOutOfRangeException.

I'm ok with that, but to be clear, changing the collection is going to potentially result in very strange behavior, even without that. For example, if an item is added to the beginning of the list between two iterations, this change will likely result in the same item being returned twice.

@JonHanna
Copy link
Copy Markdown
Contributor

I'm pretty sure I've avoided that particular type of change (I might be wrong), but in any case I think this observable change is worth:

  1. noting
  2. considering carefully
  3. and then ultimately deciding as okay.

Step 2 of course entails I'm open to persuasion on step 3 :) Still, I never did agree with that InvalidOperationException in the first place, so I'm not against having it bypassed.

I do agree that Count shouldn't be cached. As I said above, the only places we currently do this can only (in terms of a single thread, anyway) have it altered by strange things in Funcs hit by the query. Here it would still take some relatively strange code to hit it, but not as strange (especially if one follows the idea that Linq Funcs should be pure functions).

Most of all I wonder about the performance. The constant cost of the test I think is negligible (now there's something I certainly have added several times myself), but the impact of the call to IList's indexer could undo the benefits.

@stephentoub
Copy link
Copy Markdown
Member

I'm pretty sure I've avoided that particular type of change

Hmm, ok, we'll need to go back and look. I thought we'd already discussed such a change in the context of other ones being made previously. If that's not the case, then we need to put the brakes on this change and address that issue before moving forward with this.

@VSadov
Copy link
Copy Markdown
Member

VSadov commented Nov 18, 2015

About throwing IOE when collection changes while iterating. IMO this is a fairly useless behavior and I have not seen a single case of code depending on this. In particular, the behavior composes very unreliably. Example: trivial concatenation of two lists via Linq will result in something that will guard against modifications, but only when you are enumerating the matching half ...
Honestly, I think if this kind of checks were removed everywhere, nobody would notice.
I think this part of the change is ok.

@JonHanna
Copy link
Copy Markdown
Contributor

Honestly, I think if this kind of checks were removed everywhere, nobody would notice.

I would. I'd open a particularly nice bottle and toast whoever made that change that I've been wanting to see for over a decade ;)

@svick
Copy link
Copy Markdown
Member

svick commented Nov 18, 2015

What about ImmutableList<T>? It implements IList<T>, so this change would apply to it, but its indexer seems to be O(log n) (unfortunately this is not documented).

This means that iterating immutableList.Skip(k) is O(n) with the old implementation, but O((n-k) * log n) with the proposed change. So, it's likely that this change would actually decrease performance for ImmutableList<T>, especially for large n and small k. (Though I haven't actually performed any benchmarks.)

@svick
Copy link
Copy Markdown
Member

svick commented Nov 18, 2015

@VSadov

About throwing IOE when collection changes while iterating. IMO this is a fairly useless behavior and I have not seen a single case of code depending on this.

I think that's because it's a boneheaded exception: no production code will rely on it, but it is still useful, because it tells developers when they made an error.

Maybe it would make sense to find some other, more reliable, way to tell developers about that error, like a Roslyn analyzer?

@stephentoub
Copy link
Copy Markdown
Member

We could special case T[] rather than IList<T> and avoid both of these issues.

Separately, though, it seems unfortunate if ImmutableList<T>'s indexer is O(log n). If that is the case, it is unexpected, and certainly not a pit of success, as, separate from this change, common code for walking each item in the list via for will end up being O(n log n) rather than O(n). @AArnott, thoughts?

@scalablecory
Copy link
Copy Markdown
Author

@JonHanna Benchmark results show this is a clear winner for arrays and lists regardless of skip length.
Benchmark

Other optimizations that could be made: ToArray/ToList speedup (not sure how often Skip is piped to those, I can't say I've ever done it, so I left it out), Skip(<=0) simply returning the passed in collection unchanged (again can't say I've ever done this).

@AArnott
Copy link
Copy Markdown

AArnott commented Nov 19, 2015

@stephentoub said:

it seems unfortunate if ImmutableList 's indexer is O(log n)

That is in fact the case. ImmutableList is internally a binary tree, so indexing into it is O(log n). So indexing every item is n log n, whereas a straight up enumeration of it using its enumerator is simply n.

@JonHanna
Copy link
Copy Markdown
Contributor

@scalablecory

Benchmark results show this is a clear winner for arrays and lists regardless of skip length.

I also got reassuring results trying the following, which isn't extensive but does hit a case deliberately engineered to be a case that most hits an expected worse-case:

[Fact]
public void QuickSpeedTest()
{
    IEnumerable<object> source = new string[100000];
    var sw = Stopwatch.StartNew();
    for(int i = 0; i != 10000; ++i)
        foreach(var item in source.Skip(1))
        {
        }
    sw.Stop();
    Assert.Equal(0, sw.ElapsedMilliseconds); // Just want this output
}

It came in at a very slight gain even with the unsafe memoising of Count undone.

I'm satisfied about the case I was most worried about.

ImmutableList I think is less of a worry. It could be easily enough given its own optimisation:

ImmutableList<TSource> immute = source as ImmutableList<TSource>
if (immute != null)
{
    int newLen = immute.Count - count;
    if (newLen <= 0) return Enumerable.Empty<TSource>();
    return immute.GetRange(count, newLen);
}

But apart from this needing to create a dependency on System.Collections.Immutable, I think the case where ImmutableList is being used and the code has to be agnostic to the fact that it is being used (rather than just use GetRange instead of Linq because the coder knows they're using immutable structures) is likely relatively obscure. A lot of things have very different performance behaviour with immutable collections, which is one of the reasons to use them (since that difference is often for the better) and they're used with those differences in mind. There might be value in an extension within Collections.Immutable which did Skip() typed to ImmutableList<T> and/or one that provided a wrapper enumerable for passing to code (not just limited to Linq) where those performance differences would hurt more than help.

Other optimizations that could be made: ToArray/ToList speedup (not sure how often Skip is piped to those, I can't say I've ever done it, so I left it out).

I've certainly piped Skip(pageNum * PageSize).Take(PageSize) to those, but not skip. If you wanted to go that far, then I'd suggest optimising Take on the result too, and hence covering that case as well. (Again #2401 has an interface for a type that can be optimised for both Skip and Take, so if that's accepted it could be a good basis for it).

…Skip(<=0) simply returning the passed in collection unchanged (again can't say I've ever done this).

I'd say its relatively common, when the value passed to Skip() is a variable that could be 0. That said, while I would really like to do this (along with a few other cases where we could avoid allocation by returning an existing object), it would result in a strong change on the observable behaviour. Indeed, I remember @VSadov saying he's seen people call .Skip(0) purely as a means to quickly get an object which enumerates the same item but is not the same collection unchanged.

In all, this one commit ago (before the caching of Count) LGTM.

@JonHanna
Copy link
Copy Markdown
Contributor

We likely need some more tests. The code coverage report shows the new code being covered, but from looking at the existing test cases I think that might be misleading.

Yes, the current tests are a mixture of some I created, which mostly use Enumerable.Range() and those imported from the legacy tests that mostly use arrays. They are checking on slightly different things. They each need to be changed to have both a test with guaranteed non-list (Enumerable.Range() doesn't absolutely guarantee that, and treating it as a list is a possible future optimisation, but NumberRangeGuaranteedNotCollectionType() does guarantee that) and guaranteed IList.

@scalablecory
Copy link
Copy Markdown
Author

Count memoization reverted, should be good to go.

@JonHanna
Copy link
Copy Markdown
Contributor

All the tests that hit the old iterator should ideally be doubled up so that there is a version for both guaranteed list source and guaranteed non-list source.

@VSadov
Copy link
Copy Markdown
Member

VSadov commented Dec 16, 2015

LGTM

VSadov added a commit that referenced this pull request Jan 4, 2016
Optimize Enumerable.Skip() for IList<> parameter
@VSadov VSadov merged commit e9f2377 into dotnet:master Jan 4, 2016
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 10, 2016
dotnet#4551 introduced optimised versions of Skip for IList<T> sources. Have all
tests for Skip test both this and the previous path.
ericeil pushed a commit to ericeil/corefx that referenced this pull request Jan 12, 2016
dotnet#4551 introduced optimised versions of Skip for IList<T> sources. Have all
tests for Skip test both this and the previous path.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 16, 2016
Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 16, 2016
Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 16, 2016
Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 29, 2016
Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 30, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 30, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 30, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 30, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Jan 31, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Feb 11, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Feb 11, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Feb 12, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Feb 12, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Feb 12, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
JonHanna added a commit to JonHanna/corefx that referenced this pull request Feb 12, 2016
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.
@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Anything that can serve as one can serve as the other, and also provide
a faster path for Count(). Merge the two interfaces and add a Count
property.

Have IList optimised result of Skip() partitionable.

Optimisation of Skip() for IList sources from dotnet/corefx#4551 fits with
optimisations of Skip() and Take() for other sources from dotnet/corefx#2401.

Combine the approaches, extending how the result of Skip() on a
list handles subsequent operations.


Commit migrated from dotnet/corefx@a087c2d
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.