Optimizations on System.Linq.Enumerable.ToArray #550
Conversation
...and some memory usage...
src/libraries/Common/src/System/Collections/Generic/ToArray.SpeedOpt.cs
Co-Authored-By: Leandro A. F. Pereira <leandro@hardinfo.org>
Thanks, @manofstick. I've not looked at the changes yet, but I am concerned by:
This is effectively saying the code is doing
Yes, but worse (due to other overheads)! :-)
Given that we have a 1 MB stack: if we were at the root of the stack, that would allow structs of ~16 KB. Pretty big. OK, we can't expect to be at the root, so say half the stack is left, 512 KB (already a hugely deep call stack), then we're at ~8 KB? I think that's a stretch too... But obviously you have to be conservative in a library. I understand, I'm not trying to argue too much :-), and obviously your call is the law, but is there any size that we can consider reasonable?
OK, how's this for an idea... First of all, I can do (array|list).Where.ToArray() without storing T, as I can just store indexes, so those should all work fine. Second, I can do a quick value type/ref type check (i.e. ...). Third, for other types I just go through my recursive store only once, i.e. storing only the first 4 values. Now this is safe(ish), as the largest value type that can be used in an array is 128 KB (from my testing, the Type Loader throws an exception for bigger sizes), so at most I consume, in the exceedingly unlikely case, only half the stack. But really I can't see anyone realistically creating value types anywhere near this size, and if they did, they are likely blowing the stack elsewhere, or have increased its size anyway... Anyway, if you still think this is all just crazy talk, do feel free to shoot me down. I've got a pretty thick skin :-)
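The first idea above (storing indexes rather than values of T for array/list sources) can be sketched roughly as follows. This is an illustrative Python translation, not the PR's C# code, and the names are made up; in the actual proposal the indexes would live on the stack rather than in a temporary list, but the point is the same: the per-element temporary cost is a fixed-size int, independent of sizeof(T).

```python
def where_to_array(source, predicate):
    """Sketch: filter an indexable source, buffering only int indexes
    (cheap, fixed-size) instead of elements of T, then copy the matching
    elements into one exactly-sized result."""
    matches = [i for i, v in enumerate(source) if predicate(v)]  # indexes only
    return [source[i] for i in matches]  # single exact-size materialization
```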
Thanks. I want to take an in-depth look at it, I just haven't had the time yet. I will :)
@manofstick, apologies for the long delay in getting back to this. I spent some time looking at the change. If I can summarize what you're trying to do here, it's to cater to small ToArray inputs by unrolling the loop and using stack allocation for the temporary values before they're stored into the small result array, yes? It's an interesting approach. I just don't think it's going to move the needle sufficiently to warrant its weight (it also drops some existing optimizations for things like Enumerable.Range(...).ToArray), especially once the recursion / stack-overflow concerns are addressed and the limit drops to four entries as you suggested. Using an approach like @jkotas did in #32365 (comment), I took a look at the overhead this incurs just in generic instantiations, and for a simple case, it was increasing working set by ~2.5K per generic instantiation; even for the "speed optimized" build, that's hefty for such a commonly-used method as ToArray. FWIW, I actually think we should be looking to go in the other direction and simplify the implementation of ToArray we already have. We accepted the contributions that got it to where it is, but all of the builder machinery that's trying to improve memory usage and throughput adds a lot of complexity and also itself significantly increases working set. I just tried replacing the three lines at runtime/src/libraries/Common/src/System/Collections/Generic/EnumerableHelpers.Linq.cs Lines 110 to 112 in 0bf721a
I'll defer to @eiriktsarpalis and @adamsitnik, but I think we should close this PR while keeping some of your ideas in mind as we look at ways to simplify what's currently there; maybe we can have our cake and eat it, too. At the moment, though, I don't think this yields the right trade-offs. Thanks!
Thanks for taking the time to peruse. This one was always a bit of a long shot, but I'm glad I threw it out there, and hopefully it will seed some ideas for you. I'll close it off.
3rd of n optimizations lifted from my exploration of an alternative `System.Linq` implementation, Cistern.Linq. There are lots more goodies there, but only so many that will survive the transfer to `System.Linq`! (Previous corefx optimization)
This is a bit more controversial; although I think it is reasonable, it may get kicked to the curb depending on your POV. So why do I think this is controversial? Well, it uses a Clayton's `stackalloc` (i.e. recursion) to store the first 64 elements (that sounded like a reasonable number, but it's obviously up for discussion) to avoid temporary allocation buffers (we can't use a "real" `stackalloc` due to not meeting the generic type constraints). It then finishes off the process using recursion too. What this means is that you get some deeper stack usage than previously, but I feel it is not unreasonable. So how much stack are we using?
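The recursion-as-stackalloc idea can be illustrated with a minimal sketch, written here in Python for brevity (the PR itself is C#): each pending element lives in a local of a recursive stack frame rather than in a heap-allocated temporary buffer, and the exactly-sized result array is allocated only once the total count is known. Note the real change stores a chunk of elements per frame (64 here, later reduced to 4) to bound the depth; this sketch stores one element per frame purely for clarity.

```python
def to_array(iterator, count=0):
    """Sketch: pull one element per recursive frame, keeping it in a
    local; allocate the final, exactly-sized array when the source is
    exhausted, then fill it in on the way back up the stack."""
    try:
        value = next(iterator)
    except StopIteration:
        return [None] * count  # exact-size allocation, no resize/copy churn
    result = to_array(iterator, count + 1)
    result[count] = value      # each frame writes its held element
    return result

print(to_array(iter(x * x for x in range(5))))  # [0, 1, 4, 9, 16]
```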
- maximum function depth = 16 + 25 = 41
- max array size ≈ 2^31
- approx stack usage = 16*(4*sizeof(T) + sizeof(T[]) + sizeof(IEnumerable<>) + sizeof(int) + sizeof(Func<>) + sizeof(int) + sizeof(this) + function overhead) + log2(2^31/64)*(sizeof(object) + sizeof(ref int) + sizeof(Func<>) + sizeof(count) + function overhead)
- which, for sizeof(T) = 8, is ~2 KB
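Plugging assumed sizes into the formula above (8-byte references, 4-byte ints, and a guessed per-frame call overhead; these figures are illustrative assumptions, not measurements) reproduces an estimate in the claimed ballpark:

```python
import math

REF = 8             # assumed size of a reference / managed pointer (64-bit)
INT = 4             # assumed size of an int
OVERHEAD = 16       # guessed per-frame function-call overhead
SIZEOF_T = 8

# 16 unrolled frames, each holding 4 T's plus captured refs/ints
unrolled = 16 * (4 * SIZEOF_T + REF + REF + INT + REF + INT + REF + OVERHEAD)

# geometric finish: log2(2^31 / 64) = 25 further recursive frames
depth = int(math.log2(2**31 / 64))
recursion = depth * (REF + REF + REF + INT + OVERHEAD)

total = unrolled + recursion
print(depth, total)  # 25 frames; ~2.5 KB with these assumptions
```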
Anyway, so if that is unreasonable, then you can stop here and kill this PR. Otherwise, read on...
So I added some ToArray performance tests; I haven't pushed them to the Performance repository (as they are a bit excessive compared to what's done there), but you can view them here.
The tests vary 3 axes: Length, the type of predicate used in `Where`, and the source type (i.e. a generator function, an array or a list). What you are not seeing here is the smaller amount of allocation that occurs due to avoiding temporary buffers for small arrays. Anyway, I'll add some more details, some memory usage, etc. (and I realised that I haven't actually included the case of the generator without a `Where`), if this is actually going to be considered for pulling into `master` (oh, and the plan would also be to extend this to `WhereSelect` as well, but TBD...)
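For concreteness, the three-axis matrix described above might be enumerated like this (a hypothetical sketch; the axis values here are made up, not taken from the actual tests):

```python
from itertools import product

# Hypothetical benchmark axes: length, predicate kind, source kind.
lengths = [0, 1, 4, 64, 1000]
predicates = ["none", "lambda", "method_group"]
sources = ["generator", "array", "list"]

# Full cross-product of the three axes = the benchmark case list.
cases = list(product(lengths, predicates, sources))
print(len(cases))  # 45 combinations
```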