Run the selector on items during Select.Count chains. #19455

@jamesqo

Per @VSadov's comment at dotnet/corefx#12703 (comment):

Re: can we skip selectors/predicates in this optimization

It is a tough one.

Generally, assuming that anything can have side effects is indeed very
constraining for the kinds of optimizations we can do. In some cases it
is hard to specify how much of the user code runs and in what order.
We definitely reserve the right to substitute indexing for iteration
when we find that possible.

After thinking about this for quite a while, I think Count is a kind
of aggregating operator. Even though the actual results are not
collected, it seems reasonable for the user to expect that they would
be computed, and so the user may expect to observe the side effects of
selectors, predicates, etc.

For example, I could see someone using a selector that writes to the
console, and using Count as a way to run the query for side effects only.
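
To make the scenario concrete, here is a minimal sketch of such a query. The class and method names (`SideEffectCountDemo`, `Run`) are made up for illustration. The sketch deliberately does not state how many times the selector runs, because that is exactly the behavior under discussion: the returned count is 3 either way, but the selector is invoked 3 times if Count enumerates the projection and 0 times if the implementation shortcuts to the list's Count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SideEffectCountDemo
{
    // Hypothetical helper: returns the count and how often the selector ran.
    public static (int Count, int SelectorCalls) Run()
    {
        var items = new List<string> { "a", "b", "c" };
        int selectorCalls = 0;

        // The selector has observable side effects: it prints and bumps a counter.
        int count = items
            .Select(s =>
            {
                selectorCalls++;
                Console.WriteLine($"visited {s}");
                return s.ToUpperInvariant();
            })
            .Count();

        // count is 3 regardless. selectorCalls is 3 if the selector ran for
        // every item, or 0 if the implementation skipped it.
        return (count, selectorCalls);
    }

    public static void Main()
    {
        var (count, calls) = Run();
        Console.WriteLine($"Count = {count}, selector calls = {calls}");
    }
}
```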

I think we should not take the change, and we should check for other
changes like this that we might have accepted in the past (like
dotnet/corefx#11841).

Here is what I think is the root cause. We have a method for
obtaining counts internally:

GetCount(bool onlyIfCheap)

I think the method should not be used to bypass selectors when
actually running the query. It is ok to use it to preallocate internal
buffers. That still carries an assumption that the “cheap” way of
obtaining the underlying “.Count” is idempotent, but it is the kind of
assumption that we agreed in the past to be acceptable.

It does not seem acceptable to assume that user-supplied
selectors/predicates are free of side effects in the context of an
aggregating query.

I.e.:

- it would be ok to preallocate a buffer based on GetCount(bool
  onlyIfCheap)
- it would be ok to compute actual Count via , but only as long as
  there are no funcs to run
- if we have selectors/predicates or other funcs, they must run in
  aggregating queries.

So in other words, we should undo all the GetCount optimizations that skip running the selector when onlyIfCheap is false. But if it's true, we're probably using the count to preallocate a buffer of some sort, and are probably going to use it anyway. So in those scenarios it should be ok to not run the selector.
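
The proposed rule can be sketched as follows. This is not the actual System.Linq source: `SelectListSketch` and `GetCountSketchDemo` are hypothetical names, and only the `GetCount(bool onlyIfCheap)` signature comes from the discussion above. The sketch assumes a Select over a `List<T>`, where the cheap count is simply the list's Count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an internal Select-over-list iterator.
public sealed class SelectListSketch<TSource, TResult>
{
    private readonly List<TSource> _source;
    private readonly Func<TSource, TResult> _selector;

    public SelectListSketch(List<TSource> source, Func<TSource, TResult> selector)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
        _selector = selector ?? throw new ArgumentNullException(nameof(selector));
    }

    public int GetCount(bool onlyIfCheap)
    {
        if (!onlyIfCheap)
        {
            // The caller is actually running the query (e.g. Count()), so
            // invoke the selector on every item to preserve its side effects.
            foreach (TSource item in _source)
            {
                _selector(item);
            }
        }

        // Select is one-to-one, so the result count equals the source count.
        // With onlyIfCheap == true (e.g. buffer preallocation) no user code runs.
        return _source.Count;
    }
}

public static class GetCountSketchDemo
{
    public static void Main()
    {
        int calls = 0;
        var sketch = new SelectListSketch<int, int>(
            new List<int> { 1, 2, 3 },
            x => { calls++; return x * 2; });

        int cheap = sketch.GetCount(onlyIfCheap: true);
        Console.WriteLine($"{cheap} {calls}"); // prints "3 0"

        int full = sketch.GetCount(onlyIfCheap: false);
        Console.WriteLine($"{full} {calls}"); // prints "3 3"
    }
}
```

The design choice here is that the count itself never depends on the selector; the selector runs only so that its side effects stay observable in an aggregating query.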

cc @JonHanna


Labels: api-needs-work, area-System.Linq, design-discussion, help wanted
