Optimize Enumerable.Skip() for IList<> parameter #4551
VSadov merged 3 commits into dotnet:master from scalablecory:linq-skip-optimization
It'd be good to cache source.Count into a "local"; otherwise this will incur an interface call on each iteration. The extra field for that local on the display class is likely a good tradeoff.
Since there was contention about saving the Count value to a local (yes: save on interface invocation, no: what if the list was mutated) I think we should have a test that fires one of these up, iterates a bit, removes some data from the original IList value, and continues iterating. That way the most-recently agreed-upon behavior is codified in a test that goes in along with the change.
One comment, but otherwise the src changes LGTM. We likely need some more tests. The code coverage report shows the new code being covered, but from looking at the existing test cases I think that might be misleading. We should ensure that we're examining all of the same inputs / cases for both the enumerable and list inputs.
It might be worth measuring the times on arrays, as iterating through them via Potentially, this could work well in combination with the |
Behaviour would change when the source is modified while the items are being enumerated. I'm not sure it's an issue, as it would make code that currently throws work, rather than the other way around (and I've personally never agreed with enumerators explicitly banning such changes anyway), but it's worth noting that the change is observable in that regard at least.
I know this contradicts what @stephentoub said in recommending this a commit ago, but all of the cases where calls to Count are currently pushed to a local are true locals and used to produce a result immediately rather than captured and used in an iterator. This means the only way it can race against something on the same thread is the relatively obscure case of something happening in a Func. In this case there could be something affecting Count within a foreach on the results of the skip. It might be more conservative to keep calling Count.
Yes, agree on this. Side-effecting funcs are discouraged, but cannot be prevented.
When replacing iteration with indexing, we need to re-check Count to preserve existing behavior or we could potentially break some previously-working code.
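A minimal sketch of the point above, assuming an index-based skip over an IList<T>. `SkipSketch` and `SkipByIndex` are hypothetical names used only for illustration; this is not the code merged in the PR. The loop condition reads `source.Count` on every pass, so a list that shrinks during enumeration ends the sequence early instead of indexing past the end.

```csharp
using System;
using System.Collections.Generic;

public static class SkipSketch
{
    // Hypothetical helper for illustration; not the PR's implementation.
    public static IEnumerable<T> SkipByIndex<T>(IList<T> source, int count)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterator(source, Math.Max(count, 0));
    }

    private static IEnumerable<T> Iterator<T>(IList<T> source, int start)
    {
        // Count is deliberately not cached in a local: it is re-read on
        // every iteration so that changes to the list are observed.
        for (int i = start; i < source.Count; i++)
        {
            yield return source[i];
        }
    }
}
```

Caching `source.Count` before the loop would save one interface call per element, at the cost of an out-of-range index if items are removed part-way through.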
Sounds like we're leaning toward not caching the count. Someone give a definitive word on this and I'll revert the commit. @stephentoub, any objection?
@stephentoub, any objection?
No objection, thanks. My suggestion was based on performance, but that loses out to correctness. The only way it would be incorrect is in a situation where there's already invalid usage, but I'm fine with the argument that we want to be as correct as possible even in that situation, given some definition of "correct" (there will still be oddities, and things that may be considered incorrect).
Two concerns, but I like it in principle. I think as well as tests it would be good to have some performance information. I suspect there's a trade-off here with most source |
This PR changes the behavior of the following code:
var list = new List<int>() { 1, 2, 3 };
foreach (var item in list.Skip(1))
    list.Add(item);
Original implementation throws
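For reference, a hedged sketch of why an enumerator-based skip is sensitive to this pattern. `SkipByEnumerator` is a hypothetical stand-in for an iterator that walks the source through `GetEnumerator()`; it is not the original LINQ source. `List<T>`'s enumerator throws `InvalidOperationException` from `MoveNext()` once the list has been modified, and that exception surfaces through the wrapping iterator.

```csharp
using System;
using System.Collections.Generic;

public static class SkipMutationDemo
{
    // Hypothetical enumerator-based skip, for illustration only.
    public static IEnumerable<T> SkipByEnumerator<T>(IEnumerable<T> source, int count)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            // Advance past the skipped items.
            while (count > 0 && e.MoveNext()) count--;
            if (count <= 0)
            {
                while (e.MoveNext()) yield return e.Current;
            }
        }
    }

    // Returns true if modifying the list inside the loop causes a throw.
    public static bool ThrowsOnAdd()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (var item in SkipByEnumerator(list, 1))
                list.Add(item);
        }
        catch (InvalidOperationException)
        {
            // Raised by List<T>'s enumerator on the MoveNext() after Add().
            return true;
        }
        return false;
    }
}
```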
@ikopylov (and @JonHanna, who also mentioned it in a comment earlier), it does, but we've also been adding such special cases elsewhere in LINQ, in fact @JonHanna I believe you've added some of them 😉. If we're concerned about it in this case, we've got a lot of changes to go back and revert. I believe we agreed this was an acceptable difference. @VSadov? @weshaggard? |
That's great. However |
I'm ok with that, but to be clear, changing the collection is going to potentially result in very strange behavior, even without that. For example, if an item is added to the beginning of the list between two iterations, this change will likely result in the same item being returned twice. |
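The repeated-item scenario can be reproduced with a small sketch. `SkipByIndex` here is a hypothetical index-based skip written for this example, not the PR's code. Inserting at index 0 shifts every element up by one, so the index the iterator advances to next points at the element it has just returned.

```csharp
using System;
using System.Collections.Generic;

public static class InsertDuringSkipDemo
{
    // Hypothetical index-based skip that re-reads Count on each iteration.
    private static IEnumerable<T> SkipByIndex<T>(IList<T> source, int count)
    {
        for (int i = Math.Max(count, 0); i < source.Count; i++)
            yield return source[i];
    }

    public static List<string> Run()
    {
        var list = new List<string> { "a", "b", "c" };
        var seen = new List<string>();
        bool inserted = false;

        foreach (var item in SkipByIndex(list, 1))
        {
            seen.Add(item);
            if (!inserted)
            {
                // Shifts "b" from index 1 to index 2, where the loop looks next.
                list.Insert(0, "z");
                inserted = true;
            }
        }
        return seen; // "b", "b", "c"
    }
}
```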
I'm pretty sure I've avoided that particular type of change (I might be wrong), but in any case I think this observable change is worth:
Step 2 of course entails I'm open to persuasion on step 3 :) Still, I never did agree with that I do agree that Most of all I wonder about the performance. The constant cost of the test I think is negligible (now there's something I certainly have added several times myself), but the impact of the call to |
Hmm, ok, we'll need to go back and look. I thought we'd already discussed such a change in the context of other ones being made previously. If that's not the case, then we need to put the brakes on this change and address that issue before moving forward with this. |
About throwing IOE when collection changes while iterating. IMO this is a fairly useless behavior and I have not seen a single case of code depending on this. In particular, the behavior composes very unreliably. Example: trivial concatenation of two lists via Linq will result in something that will guard against modifications, but only when you are enumerating the matching half ... |
I would. I'd open a particularly nice bottle and toast whoever made that change that I've been wanting to see for over a decade ;) |
What about This means that iterating |
I think that's because it's a boneheaded exception: no production code will rely on it, but it is still useful, because it tells developers when they made an error. Maybe it would make sense to find some other, more reliable, way to tell developers about that error, like a Roslyn analyzer? |
We could special case Separately, though, it seems unfortunate if |
@JonHanna Benchmark results show this is a clear winner for arrays and lists regardless of skip length. Other optimizations that could be made: ToArray/ToList speedup (not sure how often Skip is piped to those, I can't say I've ever done it, so I left it out), Skip(<=0) simply returning the passed in collection unchanged (again can't say I've ever done this). |
@stephentoub said:
That is in fact the case. ImmutableList is internally a binary tree, so indexing into it is |
I also got reassuring results from the following, which isn't extensive but is deliberately engineered to hit the expected worst case:
[Fact]
public void QuickSpeedTest()
{
    IEnumerable<object> source = new string[100000];
    var sw = Stopwatch.StartNew();
    for (int i = 0; i != 10000; ++i)
        foreach (var item in source.Skip(1))
        {
        }
    sw.Stop();
    Assert.Equal(0, sw.ElapsedMilliseconds); // Just want this output
}
It came in at a very slight gain even with the unsafe memoising of I'm satisfied about the case I was most worried about.
ImmutableList<TSource> immute = source as ImmutableList<TSource>;
if (immute != null)
{
    int newLen = immute.Count - count;
    if (newLen <= 0) return Enumerable.Empty<TSource>();
    return immute.GetRange(count, newLen);
}
But apart from this needing to create a dependency on System.Collections.Immutable, I think the case where
I've certainly piped
I'd say it's relatively common, when the value passed to In all, this one commit ago (before the caching of
Yes, the current tests are a mixture of some I created, which mostly use |
This reverts commit c9953f4.
All the tests that hit the old iterator should ideally be doubled up so that there is a version for both guaranteed list source and guaranteed non-list source. |
LGTM |
dotnet#4551 introduced optimised versions of Skip for IList<T> sources. Have all tests for Skip test both this and the previous path.
Optimisation of Skip() for IList sources from dotnet#4551 fits with optimisations of Skip() and Take() for other sources from dotnet#2401. Combine the approaches, extending how the result of Skip() on a list handles subsequent operations.
Anything that can serve as one can serve as the other, and also provide a faster path for Count(). Merge the two interfaces and add a Count property. Have IList optimised result of Skip() partitionable. Optimisation of Skip() for IList sources from dotnet#4551 fits with optimisations of Skip() and Take() for other sources from dotnet#2401. Combine the approaches, extending how the result of Skip() on a list handles subsequent operations.
Anything that can serve as one can serve as the other, and also provide a faster path for Count(). Merge the two interfaces and add a Count property. Have IList optimised result of Skip() partitionable. Optimisation of Skip() for IList sources from dotnet/corefx#4551 fits with optimisations of Skip() and Take() for other sources from dotnet/corefx#2401. Combine the approaches, extending how the result of Skip() on a list handles subsequent operations. Commit migrated from dotnet/corefx@a087c2d

Changes Enumerable.Skip() from O(n) to O(1) when an IList<> is passed.