Support limiting the buffer size in LargeArrayBuilder. #14020
stephentoub merged 13 commits into dotnet:master
Why the cast to uint for _maxCapacity?
I think I made an off-by-one error here. I had assumed that if _maxCapacity were int.MaxValue (which it is for the parameterless constructor) that this assert would always return true b/c _count would overflow, hence an unsigned cast would be needed. But that would only be true if it looked like _maxCapacity >= _count. I'll change this back.
I think we may need to keep the uint cast here. We could have come here from AddRange, where we set _count to int.MinValue (last buffer is 0x40000000). If that were the case and we removed these casts, the assert will pass even though we've exceeded our limit. It's different from Add, where we don't modify _count before making the assert.
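The wrap-around described here can be sketched in isolation. The helper names below are hypothetical and not taken from the PR. The sketch only shows that a 32-bit count that has wrapped to int.MinValue passes a signed check against the limit but fails once both sides are cast to uint.

```csharp
using System;

static class CountCheckSketch
{
    // Simulates a 32-bit counter after two 0x40000000-sized additions.
    // The sum does not fit in an int and wraps to int.MinValue.
    public static int WrappedCount()
    {
        int first = 0x40000000;
        int second = 0x40000000;
        return unchecked(first + second);
    }

    // Signed comparison: a wrapped (negative) count looks "within the limit".
    public static bool WithinLimitSigned(int count, int maxCapacity)
        => count <= maxCapacity;

    // Unsigned comparison: a wrapped count reinterprets as a value above
    // int.MaxValue, so the check rejects it.
    public static bool WithinLimitUnsigned(int count, int maxCapacity)
        => unchecked((uint)count) <= unchecked((uint)maxCapacity);
}
```

With count equal to int.MinValue and a limit of int.MaxValue, the signed check returns true and the unsigned check returns false, which is the difference this comment is pointing at.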
But why is the cast needed on _maxCapacity?
Also, are you saying that there is a legitimate case at run-time where this assert could fail? If so, that shouldn't be an assert.
> But why is the cast needed on _maxCapacity?
Force of habit; I always cast both sides to uint otherwise Roslyn converts both sides to long which can be unintended in some cases. It does not have anything to do with checking if _maxCapacity >= 0.
> Also, are you saying that there is a legitimate case at run-time where this assert could fail?
No; the assert will only fail if AddRange causes it to add past _maxCapacity, which I mentioned is prohibited in the XML docs.
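For readers unfamiliar with the promotion mentioned above: in C#, comparing an int with a uint converts both operands to long. The sketch below uses hypothetical helper names and is only an illustration of that rule, not code from the PR. It shows how the mixed comparison and the both-sides-uint comparison can disagree for a negative value.

```csharp
using System;

static class PromotionSketch
{
    // Mixed int/uint comparison: both operands are converted to long,
    // so a negative count compares as a negative number.
    public static bool MixedLessThan(int count, uint limit)
        => count < limit;

    // Both sides uint: a negative count is reinterpreted as a large
    // unsigned value before comparing.
    public static bool UnsignedLessThan(int count, uint limit)
        => unchecked((uint)count) < limit;
}
```

For count = -1 and limit = 5, MixedLessThan returns true while UnsignedLessThan returns false. For non-negative counts the two forms agree.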
Why not:
Math.Min(_count, _maxCapacity - _count)?
I don't understand why these casts are necessary. We know that ResizeLimit is an int... it's a const int defined earlier, and it's defined to be positive. We also know from the earlier assert that _count < _maxCapacity, and _maxCapacity has already been asserted to be greater than 0. So if _count is negative, an assert would have already failed.
Nit: var => IEnumerable<IEnumerable<T>>
Thanks. LGTM.
…#14020)

* Add support for limiting buffer size to LargeArrayBuilder.
* Add XML docs to the new public methods.
* Add tests for the new methods.
* Fix things for really big arrays.
* Remove unused SlowAdd overload.
* Cast some more things to uint
* Make limit a field, rename to _maxCapacity.
* Add some clarifying comments.
* Test renaming.
* Apply the optimizations to Where.
* Respond to PR feedback.
* Remove extraneous test cases.
* Respond to PR feedback.

Commit migrated from dotnet/corefx@1274ab8
Continuation of: #14006. Unfortunately, I have a habit of force pushing my commits, so I wasn't able to reopen the old PR. Anyway #12703 was merged this morning so this is no longer blocked; this needs #13628 to apply some optimizations to Take, but if this is merged beforehand, I can just update that PR with those changes instead (and vice versa).

Description from original PR

There are certain Linq methods, such as Where and Take, where we cannot pinpoint the exact number of elements the iterator will contain, but we can establish an upper bound on how many elements there are. For example, list.Where(p) can have at most list.Count elements, and e.Take(5) can have at most 5 elements.

Currently, when we call ToArray on any of these iterators, the resizing pattern will be to allocate 4, then 8, then 16, etc. size buffers. This can be wasteful because e.g. if the source enumerable in Where contains 100 elements, we allocate space for 128 elements even though those last 28 slots will never be used.

This PR adds support to LargeArrayBuilder<T> to limit how many elements we allocate, so there is no extra space wasted. It should help drive down allocations in Where, Where.Select, & Take when the maximum count is far from the next power of 2.

Performance test / analysis

There are major reductions in GCs (about 25%) when the length of _source is just above a power of 2, and the predicate returns true for every item.

cc @JonHanna @VSadov @stephentoub
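The 100-versus-128 example from the description can be reproduced with a small simulation. This is a simplified sketch under stated assumptions: it models a single capacity number that doubles from 4, whereas the real LargeArrayBuilder<T> manages its buffers differently, and FinalCapacity is a hypothetical name. The clamp uses the Math.Min form suggested in the review.

```csharp
using System;

static class CappedGrowthSketch
{
    // Returns the total capacity reached when growing by doubling from 4
    // until `count` elements fit, never exceeding `maxCapacity`.
    public static int FinalCapacity(int count, int maxCapacity)
    {
        int capacity = Math.Min(4, maxCapacity);
        while (capacity < count)
        {
            // Double the capacity, but clamp the growth so the total
            // never goes past the known upper bound.
            int growth = Math.Min(capacity, maxCapacity - capacity);
            if (growth == 0)
            {
                break; // Limit reached; cannot grow further.
            }
            capacity += growth;
        }
        return capacity;
    }
}
```

With no effective limit (maxCapacity = int.MaxValue), 100 elements end up with a capacity of 128. With maxCapacity = 100, the last growth step is clamped to 36 and the capacity stops at exactly 100.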