[Linq] Lower LargeArrayBuilder's ResizeLimit#14738
Conversation
|
I wonder if a number of GC is a meaningful metric of performance. It seems that overall GC number would depend on size of machine, GC implememtation/mode or its learning heuristics. Some collections could be cheaper than others. Bigger is not necessary worse. Cpu time or bytes allocated would be more meaningful. |
|
@VSadov, @karelz Sorry for the delayed reply. I will try to measure other stats for this when I have the time. It will be tricky to measure CPU time because there are so many virtual method calls here that they completely blot out the overhead of extra allocations, even for larger enumerables. As for bytes allocated the only way I am aware of how to do that is this new API, so I will have to adjust my benchmark app later to build against the corefx myget feed which has that API. |
If you don't expect CPU time to show measurable difference, then focus on a metric which shows it -- allocated bytes in this case.
Did you try to use PerfView? It has great video tutorials. cc: @brianrob @vancem @valenis for additional perf guidance on allocated bytes measurements |
|
@jamesqo, to do an ad-hoc performance analysis with PerfView is very easy. There are Videos https://channel9.msdn.com/Series/PerfView-Tutorial and very first one shows you that you are just a couple clicks away from having the tool. From there if you run PerfView collect While your benchmark runs it will collect both CPU stacks as well as samples of stacks of GC allocations every 100K of GC bytes allocated. Thus you can quickly get a read of how much allocation happened over the whole benchmark (or any sub-part of it). Running this with and without your change will show you the effect you have had on both allocation as well as CPU time used. |
|
@vancem Thank you for the advice, very useful. I actually have used PerfView before for measuring CPU time, but wasn't aware that it could be used to measure allocated bytes as well. @VSadov I have just made a comparison between setting the resize limit to 8 and setting it to 32 with the same gist I posted above using PerfView. 32 elements: 8 elements: There is ~6MB less of allocated bytes (out of 26MB, so roughly 20%) when changing the limit to 8. Note that is this somewhat offsetted because we are also allocating additional |
|
@VSadov Maybe you could merge this when you have time? I have posted perf results. |
|
LGTM. @VSadov? |
|
@VSadov can we merge? We got 2 reviews ... |
|
LGTM |
[Linq] Lower LargeArrayBuilder's ResizeLimit Commit migrated from dotnet/corefx@64bd0a4
Background:
LargeArrayBuilderhas a "resize limit" after which it stops resizing its buffer and switches to using chunked arrays. I initially chose this limit to be 32 elements, because I didn't want it to be too small in casesizeof(T)was small, but also didn't want it to be too large in casesizeof(T)was large.Description: Since writing #14020, I've come to realize that the 90% use case for Linq is with business objects / classes, and not with low-level stuff like bytes. Since reference types are 4/8 bytes wide, it makes sense to optimize for
Tbeing wider. This PR lowers the resize limit ofLargeArrayBuilderfrom 32 -> 8 elements.Performance improvements: gist For reference types, there is a decrease of ~5 GCs for sizes 9-15, and ~10 GCs from 17-31 and 33+. Sizes 16 and 32 regress, which is expected since the array is now chunked at those sizes and the builder can't directly return it.
cc @stephentoub, @JonHanna, @VSadov