Optimize Where{.Select}.To{Array,List} & Count. by jamesqo · Pull Request #12703 · dotnet/corefx

jamesqo · 2016-10-16T20:24:05Z

Changes:

- Refactor Count so that the part that checks for internal Linq interfaces is separated from the part that checks for the rest of the interfaces.
  - The internal Count method is probably only going to be used by Linq, so I put this in a new file & made EnumerableHelpers partial, so other assemblies using that class don't have to drag in extra IL.
  - Put a chunk of Count in a non-generic method, so we avoid generating a substantial chunk of code for every different generic instantiation of the method.
- Implement IIListProvider on all of the Where / Where.Select iterators, substantially speeding up all ToArray / ToList operations by avoiding virtual calls / field stores.
- Remove an int field from some Where iterators by reusing _state - 1 as the index.
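
The last item can be pictured with a minimal sketch. `WhereArraySketch<T>` is a hypothetical stand-in written for illustration, not the corefx iterator; it assumes the convention that `_state` is 1 when enumeration starts, so `_state - 1` doubles as the next array index and no separate index field is needed.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical illustration only: a Where-style enumerator over an array
// that stores no index field. While enumerating, _state >= 1 and the next
// element to read is _source[_state - 1]. After Dispose, _state is -1.
public sealed class WhereArraySketch<T> : IEnumerator<T>
{
    private readonly T[] _source;
    private readonly Func<T, bool> _predicate;
    private int _state = 1;
    private T _current;

    public WhereArraySketch(T[] source, Func<T, bool> predicate)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
        _predicate = predicate ?? throw new ArgumentNullException(nameof(predicate));
    }

    public T Current => _current;
    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        int index = _state - 1;
        T[] source = _source;

        // The unsigned comparison also rejects the negative index seen after Dispose.
        while ((uint)index < (uint)source.Length)
        {
            T item = source[index];
            index = _state++; // index becomes the old _state, i.e. the next position
            if (_predicate(item))
            {
                _current = item;
                return true;
            }
        }

        Dispose();
        return false;
    }

    public void Reset() => throw new NotSupportedException();

    public void Dispose()
    {
        _current = default;
        _state = -1;
    }
}
```

Each call to `MoveNext` resumes from `_state - 1`, so the position survives between calls without a second field.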

jamesqo · 2016-10-16T20:32:31Z

Note: Typecast order has changed slightly; we now check IIListProvider<T> first instead of ICollection<T>. Not sure if that matters.

karelz · 2016-11-01T15:50:59Z

@jamesqo what is left? Just code review? cc: @VSadov

jamesqo · 2016-11-01T15:58:51Z

@karelz It would be preferable if #13076 could be reviewed and merged first, otherwise I'd have to make another PR to change some of the logic here again.

karelz · 2016-11-09T16:07:51Z

The dependency is merged, can we move forward on this one @jamesqo?

jamesqo · 2016-11-09T16:55:20Z

@karelz, yes, we can. I just broke my main laptop yesterday, however, so it may be a day or two before I get to finishing this up.

jamesqo · 2016-11-12T06:31:40Z

Alright, finally finished working on this. @stephentoub, @VSadov please review.

jamesqo · 2016-11-12T06:44:36Z

Perf results: here. Count has (unsurprisingly) regressed a little since we're adding 3 new method calls, ToArray and ToList have both improved significantly.

jamesqo · 2016-11-12T14:49:49Z

cc @JonHanna

stephentoub · 2016-11-14T20:18:22Z

What is the value of this change?

stephentoub · 2016-11-14T20:24:19Z

You've been doing this kind of separation in a few places. Is it really worthwhile? This adds cost to the (common) case of using Count() on a regular 'ol enumerable. I don't have numbers, but I'd venture to guess this is the most common case, yet we've made it the most expensive now.

@stephentoub I did so because I figured 1 or 2 extra procedure calls wouldn't be much compared to the overhead of repeatedly calling MoveNext, a bunch of typecasts beforehand, GetEnumerator / Dispose, etc. I did a small benchmark here testing for regression; surprisingly, the difference does not seem to be measurable even for the length 1 case.

> but I'd venture to guess this is the most common case,

We only go down this path if the enumerable is lazy (we can't determine its size beforehand) & does not come from/is not optimized to implement IIListProvider by Linq, e.g. a method that uses yield return. Is this the most common case?

@stephentoub It should also be noted that where we detract we add: Separating this part out into a new method saves the JIT from generating a sizeable chunk of code for each generic instantiation of Count. I can post some numbers if you'd like.

> We only go down this path if the enumerable is lazy

This is extremely common. If you have a concrete T[], you're much better off calling Length. If you have a List<T> or an ICollection<T> or an IList<T>, etc., you're better off just using its Count property. It's only when you have one of those that's typed as IEnumerable<T> that you'd use Count(), whereas Count() is the only option for enumerables produced by LINQ, by iterators, etc.

> Separating this part out into a new method saves the JIT from generating a sizeable chunk of code for each generic instantiation of Count

I understand. But at the same time, we're not going to go through every generic method in the platform and pull out groups of lines here and there that aren't dependent on the generic parameters and separate them into their own methods. That would be overkill. What makes these few lines special?

> I can post some numbers if you'd like.

Do you have example libraries or apps in mind, where Count() is used on an IEnumerable<T> with a value type T in cases where this code would not be hit? This only ends up saving on generated assembly if this code is never hit for a given value type T (or any reference type) but the earlier part of the method is. If that's really common, then sure, I can see it making sense... I'm just skeptical.

@stephentoub

> This only ends up saving on generated assembly if this code is never hit for a given value type T (or any reference type) but the earlier part of the method is.

Actually, I believe this saves generated assembly in 2 scenarios:

Like you mentioned, if for a particular value type T the collection always implements one of the interfaces and does not hit this codepath, then this will save.

Another scenario is if this codepath is hit twice by two different value types, it will only be generated once, since it's non-generic. Example

lazyBytes.Count<byte>(); lazyInts.Count<int>();
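
The second scenario can be sketched roughly as follows. This is an assumed shape, not the code from the PR: `CountSketch` and `AsLazy` are hypothetical names, and `CountAndDispose` borrows a name mentioned later in this thread with a guessed signature. The generic front method is compiled per value type `T`, while the enumerating loop lives in a non-generic method that is compiled once.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CountSketch
{
    // Generic front: the JIT produces separate code for each value type T.
    public static int Count<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        if (source is ICollection<T> collectionOfT) return collectionOfT.Count;
        if (source is ICollection collection) return collection.Count;

        // IEnumerator<T> converts implicitly to the non-generic IEnumerator.
        return CountAndDispose(source.GetEnumerator());
    }

    // Non-generic tail: nothing here depends on T, so one compiled copy
    // serves Count<byte>, Count<int>, and every other instantiation.
    private static int CountAndDispose(IEnumerator e)
    {
        try
        {
            int count = 0;
            checked
            {
                while (e.MoveNext()) count++;
            }
            return count;
        }
        finally
        {
            (e as IDisposable)?.Dispose();
        }
    }

    // Hypothetical helper that produces a lazy sequence for the usage below.
    public static IEnumerable<T> AsLazy<T>(params T[] items)
    {
        foreach (T item in items) yield return item;
    }
}
```

With this shape, `CountSketch.Count(CountSketch.AsLazy<byte>(1, 2, 3))` and `CountSketch.Count(CountSketch.AsLazy(10, 20))` each get their own generic front but share the single `CountAndDispose` body.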

> Do you have example libraries or apps in mind,

@stephentoub, I have been trying to measure the impact this has on Roslyn for a couple of days. However, I'm running into some runtime issues using Roslyn with a custom build of coreclr. ~~If you're still not convinced by this argument, I can remove this part of the PR out for now, and make a new PR when I get numbers.~~ Will separate into new PR for now

stephentoub · 2016-11-14T20:26:52Z

Why is the AsEnumerable() necessary? If it's purely to satisfy the compiler's need for the if/else branches to have a common type, I'd much prefer to see a cast used.

stephentoub · 2016-11-14T21:04:46Z

Nit: could just be:

foreach (TSource item in _source)

stephentoub · 2016-11-14T21:05:09Z

stephentoub · 2016-11-14T21:05:30Z

For Lists, using foreach will be slower since the compiler doesn't optimize that.

stephentoub · 2016-11-14T21:05:48Z

stephentoub · 2016-11-14T21:07:02Z

Same nit... I'll stop commenting on these.

stephentoub · 2016-11-14T21:49:20Z

What's the reason for calling Count rather than open coding this, which would avoid the cost of invoking the selector for every element that passes the predicate? Are we concerned about not throwing exceptions where we previously threw them?

@stephentoub, I wasn't sure that GetCount was worth optimizing; e.Count(pred) is typically preferred over e.Where(pred).Count(), and optimizing this function would involve some imperative code (maintaining an int variable we would increment every time a predicate was hit).

Thinking about it more though, someone may pass e.Where(...) to a function that invokes Count() on the enumerable, so I guess writing this inline may be worthwhile.

stephentoub · 2016-11-14T21:50:56Z

Rather than calling EnumerableHelpers.Count, couldn't we just iterate ourselves here, which would avoid the enumerator allocation, the interface calls, etc.? Same for the WhereListIterator case.
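
A minimal sketch of what "iterate ourselves" could look like, assuming the iterator holds the source array (or list) and the predicate directly. `InlineCountSketch` and `CountMatches` are hypothetical names for illustration, not members of System.Linq. The predicate still runs for every element; only the enumerator object and the interface calls are avoided.

```csharp
using System;
using System.Collections.Generic;

public static class InlineCountSketch
{
    // Array case: a plain loop over the array, no enumerator object involved.
    public static int CountMatches<T>(T[] source, Func<T, bool> predicate)
    {
        int count = 0;
        foreach (T item in source)
        {
            if (predicate(item))
            {
                checked { count++; }
            }
        }
        return count;
    }

    // List case: index-based loop instead of going through an enumerator.
    public static int CountMatches<T>(List<T> source, Func<T, bool> predicate)
    {
        int count = 0;
        for (int i = 0; i < source.Count; i++)
        {
            if (predicate(source[i]))
            {
                checked { count++; }
            }
        }
        return count;
    }
}
```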

stephentoub · 2016-11-20T13:48:34Z

@VSadov, are you ok with the small change here where previously we'd execute the selector function and now we don't? This is good for perf, but it does mean that if the selector would have thrown an exception, now such an exception won't happen.

@stephentoub For reference, a similar change was made in #11841.

I think we must run the selectors. It is reasonable for the user to expect that selectors will run.
In fact, it does not seem very unlikely that a user may call ".Count" specifically to run side-effecting selectors.


We should revisit the change in #11841. I think we may have gone too far.
It seems ok to use GetCount(onlyIfCheap) to preallocate buffers, but skipping selectors in a context of aggregating operator (even if trivial - like Count) now seems potentially breaking.


stephentoub · 2016-11-20T13:49:08Z

Same question:

index = _state++;

?

stephentoub · 2016-11-20T13:50:07Z

@VSadov, same question here.

stephentoub · 2016-11-20T13:50:43Z

@VSadov, same question here.

VSadov · 2016-11-21T20:04:56Z

Re: can we skip selectors/predicates in this optimization

It is a tough one.

Generally, assuming that anything can have sideeffects is indeed very constraining to what kind of optimizations we can do.
In some cases it is hard to specify how much of the user code runs and in what order. We definitely reserve the rights to substitute iteration for indexing when we find that possible.

After thinking about this for quite a while, I think Count is a kind of aggregating operator. Even though the actual results are not collected, it seems reasonable for the user to expect that they would be computed, and as such he may expect to observe the sideeffects from selectors/predicates etc.

I.E. – I could see someone using a selector that writes to the console, and use Count as a way to run the query for sideeffects only.

I think we should not take the change and check for other changes like this, that we might have accepted in the past. (like: #11841 )

Here is what I think is the root cause -
We do have a method for obtaining counts internally -

GetCount(bool onlyIfCheap)

I think the method should not be used to bypass selectors when actually running the query. It is ok to use it to preallocate internal buffers. That still carries an assumption that the "cheap" way of obtaining the underlying ".Count" is idempotent, but it is the kind of assumption that we agreed in the past to be acceptable.

It does not seem to be acceptable to assume that user-supplied selectors/predicates are sideeffects-free in a context of aggregating query.

I.E. –

- it would be ok to preallocate a buffer based on GetCount(bool onlyIfCheap)
- it would be ok to compute actual Count via , but only as long as there are no funcs to run
- if we have selectors/predicates or other funcs, they must run in aggregating queries.
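
One way to read these rules, as a sketch only: `SelectArraySketch<TSource, TResult>` is a hypothetical class, not the corefx iterator. It assumes a Select over an array, where the count is always known cheaply. The cheap path returns the length without touching the selector, and the non-cheap path (the one an aggregating `Count()` would take) runs the selector on every element before returning the same number.

```csharp
using System;

public sealed class SelectArraySketch<TSource, TResult>
{
    private readonly TSource[] _source;
    private readonly Func<TSource, TResult> _selector;

    public SelectArraySketch(TSource[] source, Func<TSource, TResult> selector)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
        _selector = selector ?? throw new ArgumentNullException(nameof(selector));
    }

    public int GetCount(bool onlyIfCheap)
    {
        if (!onlyIfCheap)
        {
            // Aggregating query: run the user-supplied selector so that any
            // side effects or exceptions are still observed.
            foreach (TSource item in _source)
            {
                _selector(item);
            }
        }

        // Buffer preallocation (onlyIfCheap == true) only needs this number.
        return _source.Length;
    }
}
```

A selector that increments a counter would see zero calls after `GetCount(onlyIfCheap: true)` and one call per element after `GetCount(onlyIfCheap: false)`.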

jamesqo · 2016-11-23T04:07:28Z

@VSadov, what you are saying seems reasonable. It will regress performance for Count, but not scenarios where we want to preallocate a buffer. I've opened #13910 to track reversal of the optimizations, & I will update this PR later to revert the related changes in this file.

jamesqo · 2016-11-24T00:52:30Z

@stephentoub @VSadov I re-did the EnumerableHelpers.Count refactoring at 70ac0aa which segregates the IIListProvider check from the public interfaces, and have changed the GetCount implementations back to shims:

public int GetCount(bool onlyIfCheap) => onlyIfCheap ? -1 : EnumerableHelpers.GetCount(this);

so the selector is always evaluated. I think this is safe to merge once green.
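
To show how such a shim might interact with buffer preallocation, here is a hedged sketch. `ICountProviderSketch<T>`, `WhereSketch<T>`, and `BufferSketch` are hypothetical names, and the interface only loosely mirrors the internal `IIListProvider` discussed in this PR. A Where-style sequence cannot know its count cheaply, so it returns -1 on the cheap path and enumerates (running the predicate) otherwise.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public interface ICountProviderSketch<T> : IEnumerable<T>
{
    // Returns -1 when onlyIfCheap is true and the count is not known up front.
    int GetCount(bool onlyIfCheap);
}

public sealed class WhereSketch<T> : ICountProviderSketch<T>
{
    private readonly IEnumerable<T> _source;
    private readonly Func<T, bool> _predicate;

    public WhereSketch(IEnumerable<T> source, Func<T, bool> predicate)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
        _predicate = predicate ?? throw new ArgumentNullException(nameof(predicate));
    }

    public IEnumerator<T> GetEnumerator()
    {
        foreach (T item in _source)
        {
            if (_predicate(item)) yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public int GetCount(bool onlyIfCheap)
    {
        if (onlyIfCheap) return -1;

        int count = 0;
        foreach (T item in this) count++; // the predicate runs for every source element
        return count;
    }
}

public static class BufferSketch
{
    public static T[] ToArray<T>(ICountProviderSketch<T> source)
    {
        int count = source.GetCount(onlyIfCheap: true);
        if (count == -1)
        {
            // Size unknown: fall back to a growing buffer.
            return new List<T>(source).ToArray();
        }

        var result = new T[count];
        int i = 0;
        foreach (T item in source) result[i++] = item;
        return result;
    }
}
```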

VSadov · 2016-11-24T21:52:16Z

equivalent.Count() is evaluated on every iteration. Is that intentional?

@VSadov, I figured it probably didn't matter since this is test code. The length of the source that will be fed to the method is only 10.

VSadov · 2016-11-24T21:56:29Z

LGTM
There is one comment on ".Count" being called more than necessary in one of the tests. Not sure if that is intentional.

stephentoub · 2016-11-25T14:20:08Z

I know I previously questioned the value of separating this out. However, I see this helper is being used in places like https://github.com/dotnet/corefx/pull/12703/files#diff-0c9ac9d1102269d53d52c52b5cdaff8bR20. From such places, we know that the ICollection interface casts will fail, so if this were separated out, we could just call to this piece directly and avoid those casts.

@stephentoub I do not think Count() is worth optimizing for if we have to run the selector on everything anyway. I'm not sure it's worth the extra ugliness to have

public int GetCount(bool onlyIfCheap) => onlyIfCheap ? -1 : EnumerableHelpers.CountAndDispose(GetEnumerator());

Seems like more of an implementation detail to me.

> I'm not sure it's worth the extra ugliness to have

What extra ugliness? I'm talking about the difference between:

EnumerableHelpers.Count(this);

and:

EnumerableHelpers.CountIterate(this);

or some similar name.

@stephentoub Ok. I was trying to say that if we cared about optimizing this function at all, we could have just written the whole thing inline like I did here, and avoid any typecasting. But since we are running the selector anyway, I don't think it's worth trying to optimize these functions; we should not add any additional complexity trying to do so, even if that is just one additional method.

I think this can be revisited in another PR if performance here turns out to matter.

> But since we are running the selector anyway

There isn't a selector in cases like:
https://github.com/dotnet/corefx/pull/12703/files#diff-e6e91d17f21cf11b8d7b5ba1c23c933aR110

> I don't think it's worth trying to optimize these functions

Why?

> we could have just written the whole thing inline like I did here

Yes, like I suggested at https://github.com/dotnet/corefx/pull/12703/files#r87900623, which got a thumbs up from you but doesn't appear to have been addressed, so I'm unclear what the plan is.

> Why?

@stephentoub I usually capture the sequence into an array/list and get its length instead of using Count when I need it, so for me Count isn't that common. But, I've changed my mind since that may be different for other people. I've updated the PR, reverting the last commit.

This reverts commit 70ac0aa.

jamesqo · 2016-11-26T13:39:27Z

@stephentoub OK. Should be good to merge once green 🎉

stephentoub · 2016-11-26T13:47:51Z

Nit: var => ?

stephentoub · 2016-11-26T13:47:57Z

Nit: var => ?

* Optimize Where{.Select}.To{Array,List} & refactor Count.
* Add comment
* Use a cast vs. AsEnumerable
* Use foreach in the array iterators.
* Remove caching of readonly fields.
* Write GetCount inline.
* Revert all changes to Count.
* Respond to PR feedback.
* Run the selectors during GetCount.
* Add back Count changes, this time w/o optimizations.
* Revert "Add back Count changes, this time w/o optimizations." This reverts commit dotnet/corefx@70ac0aa.
* Respond to nits.

Commit migrated from dotnet/corefx@11a2eeb

dnfclas added the cla-already-signed label Oct 16, 2016

karelz added the area-System.Linq label Oct 17, 2016

karelz assigned VSadov and jamesqo Oct 17, 2016

jamesqo changed the title ~~Improvements for Where, refactor Count~~ [WIP] [no merge] Improvements for Where, refactor Count Oct 26, 2016

jamesqo mentioned this pull request Oct 27, 2016

Add LargeArrayBuilder type to enscapulate ToArray logic #13076

Merged

jamesqo force-pushed the count-where-tolist branch from 1475ee6 to ba39657 Compare November 11, 2016 20:51

karelz assigned OmarTawfik Nov 11, 2016

jamesqo changed the title ~~[WIP] [no merge] Improvements for Where, refactor Count~~ Optimize Where{.Select}.To{Array,List} & refactor Count. Nov 12, 2016

jamesqo force-pushed the count-where-tolist branch from a68a6f5 to 49fecd3 Compare November 12, 2016 06:25