Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Support limiting the buffer size in LargeArrayBuilder.#14020

Merged
stephentoub merged 13 commits intodotnet:masterfrom
jamesqo:limited
Nov 29, 2016
Merged

Support limiting the buffer size in LargeArrayBuilder.#14020
stephentoub merged 13 commits intodotnet:masterfrom
jamesqo:limited

Conversation

@jamesqo
Copy link
Copy Markdown
Contributor

@jamesqo jamesqo commented Nov 26, 2016

Continuation of: #14006. Unfortunately, I have a habit of force pushing my commits, so I wasn't able to reopen the old PR. Anyway #12703 was merged this morning so this is no longer blocked; this needs #13628 to apply some optimizations to Take, but if this is merged beforehand, I can just update that PR with those changes instead (and vice versa).


Description from original PR

There are certain Linq methods, such as Where and Take, where we cannot pinpoint the exact number of elements the iterator will contain, but we can establish an upper bound on how many elements there are. For example, list.Where(p) can have at most list.Count elements, and e.Take(5) can have at most 5 elements.

Currently, when we call ToArray on any of these iterators, the resizing pattern will be to allocate 4, then 8, then 16, etc. size buffers. This can be wasteful because e.g. if the source enumerable in Where contains 100 elements, we allocate space for 128 elements even though those last 28 slots will never be used.

This PR adds support to LargeArrayBuilder<T> to limit how many elements we allocate, so there is no extra space wasted. It should help drive down allocations in Where, Where.Select, & Take when the maximum count is far from the next power of 2.


Performance test / analysis

There are major reductions in GCs (about 25%) when the length of _source is just above a power of 2, and the predicate returns true for every item.

cc @JonHanna @VSadov @stephentoub

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readonly?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the cast to uint for _maxCapacity?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I made an off-by-one error here. I had assumed that if _maxCapacity were int.MaxValue (which it is for the parameterless constructor) that this assert would always return true b/c _count would overflow, hence an unsigned cast would be needed. But that would only be true if it looked like _maxCapacity >= _count. I'll change this back.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same cast question

Copy link
Copy Markdown
Contributor Author

@jamesqo jamesqo Nov 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may need to keep the uint cast here. We could have come here from AddRange, where we set _count to int.MinValue (last buffer is 0x40000000). If that were the case and we removed these casts, the assert will pass even though we've exceeded our limit. It's different from Add, where we don't modify _count before making the assert.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why is the cast needed on _maxCapacity?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are you saying that there is a legitimate case at run-time where this assert could fail? If so, that shouldn't be an assert.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why is the cast needed on _maxCapacity?

Force of habit; I always cast both sides to uint otherwise Roslyn converts both sides to long which can be unintended in some cases. It does not have anything to do with checking if _maxCapacity >= 0.

Also, are you saying that there is a legitimate case at run-time where this assert could fail?

No; the assert will only fail if AddRange causes it to add past _maxCapacity, which I mentioned is prohibited in the XML docs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not:

Math.Min(_count, _maxCapacity - _count)

?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these necessary?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why these casts are necessary. We know that ResizeLimit is an int... it's a const int defined earlier, and it's defined to be positive. We also know from the earlier assert that _count < _maxCapacity, and _maxCapacity has already been asserted to be greater than 0. So if _count is negative, an assert would have already failed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: var => IEnumerable<IEnumerable<T>>

@stephentoub
Copy link
Copy Markdown
Member

Thanks. LGTM.

@stephentoub stephentoub merged commit 1274ab8 into dotnet:master Nov 29, 2016
@jamesqo jamesqo deleted the limited branch November 29, 2016 01:24
@karelz karelz modified the milestone: 1.2.0 Dec 3, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…#14020)

* Add support for limiting buffer size to LargeArrayBuilder.

* Add XML docs to the new public methods.

* Add tests for the new methods.

* Fix things for really big arrays.

* Remove unused SlowAdd overload.

* Cast some more things to uint

* Make limit a field, rename to _maxCapacity.

* Add some clarifying comments.

* Test renaming.

* Apply the optimizations to Where.

* Respond to PR feedback.

* Remove extraneous test cases.

* Respond to PR feedback.


Commit migrated from dotnet/corefx@1274ab8
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants