Add LargeArrayBuilder type to encapsulate ToArray logic #13076

stephentoub merged 8 commits into dotnet:master from
Conversation
I found out this wasn't getting inlined, so I tagged it with AggressiveInlining. However, it turns out that doing so generates ~50 more bytes of code for each call site to Add, and the effect is amplified since we're using the builder in a generic context.
So my solution was to add a SlowAdd method wrapper around this that was tagged with NoInlining. Add is used in places where it is part of the bottleneck (such as in loops), otherwise SlowAdd is used to minimize code bloat.
I found out this wasn't getting inlined, so I tagged it with AggressiveInlining
Why was it important that it get inlined? Do you have a benchmark that shows a measurable improvement in key scenarios from that happening? Otherwise I'd rather let the JIT do its thing and not complicate the API here with two Add methods that are identical other than in inlining.
cc: @AndyAyersMS
@stephentoub, will post when I get a chance.
Yes, removing the AggressiveInlining indeed does result in a perf regression. I wrote this app to measure the throughput with/without the AggressiveInlining; removing the attribute results in a significant (~30%) slowdown.
removing the attribute results in a significant (~30%) slowdown
You're comparing release builds, and the only difference between them is whether Add has AggressiveInlining on it? I just tried with your PR (with and without that one line with AggressiveInlining), and while I do see an improvement that comes from the PR in general, I don't see any such difference between whether Add has AggressiveInlining on it.
Yes. What test app are you using to benchmark?
Same as you. I've tried a few other things, though, and I am able to construct a simple console app that sees about a ~15% improvement from the AggressiveInlining, so while not 30%, it does seem worthwhile to keep it for now.
@stephentoub, good. My percentages may have been skewed a bit since I was comparing the smaller times in relation to the larger times instead of vice versa (1100 is a ~22% regression from 900, while 900 is an ~18% improvement from 1100), so I will have to be careful about that in the future.
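The baseline issue can be illustrated with hypothetical timings of 900 and 1100 units (a quick Python sketch; the numbers are illustrative, not the actual benchmark results):

```python
# Hypothetical timings for the fast and slow builds.
fast, slow = 900.0, 1100.0

# Regression: how much slower the slow build is, relative to the FAST baseline.
regression = (slow - fast) / fast * 100    # ~22.2%

# Improvement: how much faster the fast build is, relative to the SLOW baseline.
improvement = (slow - fast) / slow * 100   # ~18.2%
```

The same 200-unit delta reads as a larger percentage when divided by the smaller baseline, which is why regression and improvement figures computed from the same data don't match.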
Sorry, @jamesqo just saw this now.... did you ever drill deeper into the codegen with and without forced inlining?
@AndyAyersMS Sorry, I forgot about your comment until now. I do have a gist from around the time I made this PR: https://gist.github.com/jamesqo/e46cdf95bd416220c48e16583fd91f6c

The method being jitted is AppendPrepend1Iterator.LazyToArray, which you can find in the diffs.
- NoInline.asm is when AggressiveInlining is not applied and the code looks like:

  ```csharp
  if (!_appending) { builder.Add(_item); }
  builder.AddRange(_source);
  if (_appending) { builder.Add(_item); }
  ```

- WithInline.asm is that same code sample, except when AggressiveInlining is applied to Add.
- WithInline2.asm is when the code is instead written like:

  ```csharp
  if (_appending) { builder.AddRange(_source); }
  builder.Add(_item);
  if (!_appending) { builder.AddRange(_source); }
  ```

That's all the investigation I did.
Ok, thanks. Not obvious from just this where the win came from. I'd need to go back and look at what Add compiles to when it's not inlined.
self-note: Remove this variable
@karelz, FYI I'd say this PR is more related to System.Linq than System.Collections. Would be nice if you could relabel / change assignees.

I see the code changes in both -- Collections and Linq. I will leave area owners/experts to decide who is in best position to do the code review.
TODO: It may be worth eliminating this and just writing the whole thing inline in AddRange(IEnumerable). The one place where it's used can be eliminated by adding some extra methods to SingleLinkedNode.
355474b to 6c76751
@stephentoub Do you think it would be possible for you to review this when you have time? Thanks.

@ianhays @Priya91 @VSadov @OmarTawfik ping? Who's the best person to review the change?
Why did you rename this from GenerateEnumerable to GenerateSequence? Seems like it was fine named what it was, maybe even better as Enumerable.
Would this not be better / simpler as:

```csharp
if (_appending)
{
    builder.AddRange(_source);
    builder.SlowAdd(_item);
}
else
{
    builder.SlowAdd(_item);
    builder.AddRange(_source);
}
```
@stephentoub, this is the same pattern that is used in other places throughout the file, e.g. here.
Ok. For consistency it's worth keeping it like this for now, but I think it'd also be worth following up to see whether it'd be worth changing all of them to what, at least to me, looks like a simpler and more understandable pattern. I don't know what the JIT'd code looks like, but it's possible there's also a perf benefit to one or the other, either in number of actual branches or in IL size or something like that.
I don't know what the JIT'd code looks like, but it's possible there's also a perf benefit to one or the other,
Note that _appending is readonly, so the JIT should be able to cache the field access in a register there and save some code size. I haven't checked, though.
Why not keep the cast as it was?

```csharp
return ((IEnumerable<TSource>)ToArray()).GetEnumerator();
```
I personally like AsEnumerable better, but sure.
Why separate this out into its own (public) method? Where else is it being used?
Why are you creating this intermediate array? Don't we know how long _appended is from the count stored in each node? Can't we just iterate through the list, adding each node's item to the array from the last to the first, as is done for _appended elsewhere?
@stephentoub We cannot presize the array here since _source is lazy. Since we can't presize the array, we don't know if there will be enough room to fit all of the appended items -- we might have to do another allocation in the builder.
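To illustrate why a lazy source prevents presizing, here is a small Python analogy (a sketch, not the actual C# implementation): a generator has no cheap length, so a builder has to grow its buffer as it consumes items:

```python
def lazy_source():
    # A lazy sequence: its length isn't known until it is fully enumerated.
    for i in range(10):
        yield i

buffer = [None] * 4      # start small; len(lazy_source()) isn't available
count = 0
for item in lazy_source():
    if count == len(buffer):
        # Out of room: allocate a larger buffer and copy -- this is the
        # extra allocation the builder may have to make.
        new_buffer = [None] * (len(buffer) * 2)
        new_buffer[:count] = buffer[:count]
        buffer = new_buffer
    buffer[count] = item
    count += 1

result = buffer[:count]  # final trim, analogous to ToArray
```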
Was such an array getting allocated before? I'm struggling to see this as a pure win if we're introducing additional intermediate ToArrays like this. If it was already there before, then ok, though it would be nice subsequently to find a way to avoid it.
@stephentoub Yes. The logic in this method is basically a copy verbatim of what MoveNext currently does.
Yes. The logic in this method is basically a copy verbatim of what MoveNext currently does.
Ugh. Ok. Thanks.
Why separate this out? The other cases in ToArray aren't in their own methods, seems like this should just be inline with the others, which then also removes the need for the assert.
@stephentoub, ToArray is generic so it can be regenerated by the JIT. The logic in LazyToArray seemed big enough that it seemed worth separating out into a new method, to avoid penalizing code that doesn't go down this branch. I'll post the generated code size of this method when I get time.
Just checked. LazyToArray comes in at an extra 306 bytes of code: https://gist.github.com/jamesqo/6b5f2cc27ca36fd3f488c7be57a87cb4

It would probably be wise to avoid generating all that unnecessary code if it's never called.
It would probably be wise to avoid generating all that unnecessary code if it's never called.
Why is that the case for "case -1" but not the case for "default" in ToArray? My point is that we should be consistent here.
@stephentoub, I separated the default: case into a new PreallocatingToArray method.
What does the comment "Is _first <= ResizeLimit." mean?
I meant to say, "If _count <= ResizeLimit then _first == _current." I'll rephrase that.
Instead of the : this(), how about just explicitly initializing the remaining fields (_buffers, _index, _count)?
@stephentoub, is explicit initialization faster than using the default struct constructor?
I just checked. If we initialize members explicitly then the constructor doesn't get inlined:

```
G_M36021_IG02:
       488D4DC8       lea      rcx, bword ptr [rbp-38H]
       BA01000000     mov      edx, 1
       E88CFAFFFF     call     LargeArrayBuilder`1:.ctor(bool):this
```

While this isn't the bottleneck, I think using : this() looks more readable anyway.
I think using : this() looks more readable anyway.
I personally disagree. To me this looks like we're initializing some of the fields multiple times, as delegating to "this()" needs to ensure that all of the fields are initialized, and then we subsequently overwrite some of them (it looks to me the same as having a bunch of fields in a class that are explicitly initialized to default(T) at the field declaration site and then subsequently overwritten in an explicit ctor). We also don't use this pattern elsewhere in corefx, which makes this stand out to me like a sore thumb.
If there's actually an inlining difference and it matters, that's something that should be addressed in the JIT.
cc: @AndyAyersMS
it looks to me the same as having a bunch of fields in a class that are explicitly initialized to default(T) at the field declaration site and then subsequently overwritten in an explicit ctor
If you think about it, that can be faster since the zeroing can be done in bulk vs. initializing fields 1 at a time. For example, on my machine the JIT generates code to use ymm registers:

```
C4E17957C0     vxorpd   ymm0, ymm0
C4E17A7F02     vmovdqu  qword ptr [rdx], ymm0
C4E17A7F4210   vmovdqu  qword ptr [rdx+16], ymm0
```

If there's actually an inlining difference and it matters, that's something that should be addressed in the JIT.
Considering things from the viewpoint of the JIT, isn't the constructor just a bunch of arbitrary code? So then when it looks at the constructor and sees that it has a huge body, it might not decide to inline even though all of the code is just default-initializing the struct. It would seem that if the JIT were to try to inline this, it would waste time for other cases where that arbitrary code isn't all just default-initialization.
I'm not really familiar with how the JIT works, but just a guess... cc @mikedn, @benaadams
If you think about it, that can be faster since the zeroing can be done in bulk vs. initializing fields 1 at a time
I'm not talking about what actually happens in the implementation; you were talking about readability, and part of readability is being able to reason about what's happening, so I was sharing what immediately comes to my mind when I see that code, regardless of what the compiler does, and hence why for me it harms readability, along with the fact that readability is helped by seeing common patterns and harmed by seeing jarring patterns, and to me, this use of ": this()" in a struct is jarring.
A compiler can choose to optimize certain patterns, so the compiler could choose to generate the exact same code for:

```csharp
public SomeStruct()
{
    _field1 = 0;
    _field2 = null;
    _field3 = Calculate();
}
```

as it does for:

```csharp
public SomeStruct() : this()
{
    _field3 = Calculate();
}
```
along with the fact that readability is helped by seeing common patterns and harmed by seeing jarring patterns, and to me, this use of ": this()" in a struct is jarring.
OK, I can remove the explicit : this() call then. 👍
A compiler can choose to optimize certain patterns, so the compiler could choose to generate the exact same code for:
Yes. I was trying to say that with the first version the JIT would (probably) have to inspect more bytecode to see that 0 was being assigned to _field1, and null was being assigned to _field2, before making that optimization (which would presumably come at a cost since all of this is done at runtime).
Why have AllocateBuffer return _current? We don't always use the result... why not just have the call sites that need _current access _current?
Not before we get an OOME.
Not before we get an OOME.
Won't it overflow when destination.Length == 0x40000000?
@stephentoub, good point. In that case, however, AllocateBuffer will attempt to allocate an array of size 0x40000000 * 2, which will raise an OverflowException anyway. Also, if we've reached that point then that would mean we've added 0x80000000 items, meaning ToArray couldn't possibly work. I don't know if it's worth adding an extra checked block which has no effect, and protects against a case which is already broken.
however, AllocateBuffer will attempt to allocate an array of size 0x40000000 * 2, which will raise an OverflowException anyways
Will it? I've not tried it, but it looks like it would actually see _count as negative, which is less than ResizeLimit, so it would enter the "we're still less than ResizeLimit" block. Maybe those checks need to be made robust.
Ah, very nice catch. It looks like we will multiply _count by 2 again, and 0x80000000 * 2 is 0, so then we allocate a 0-length array and set it to _first / _current instead of raising an exception. Since it doesn't add any additional cost, I can change the check to be if ((uint)_count < (uint)ResizeLimit) which should get us down the path which throws for overflow.
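The wraparound being discussed can be simulated in Python by masking to 32 bits (an illustration of the arithmetic only; the real ResizeLimit value differs and the names here are stand-ins):

```python
MASK = 0xFFFFFFFF  # 32 bits

def to_int32(x):
    """Reinterpret a value as a signed 32-bit integer, like a C# int."""
    x &= MASK
    return x - 0x100000000 if x >= 0x80000000 else x

RESIZE_LIMIT = 8  # stand-in value for illustration

count = to_int32(0x40000000 * 2)                # doubling 0x40000000 wraps negative
signed_check = count < RESIZE_LIMIT             # passes: negative count slips through
unsigned_check = (count & MASK) < RESIZE_LIMIT  # fails: the (uint) cast rejects it
doubled_again = (count * 2) & MASK              # wraps to 0: a zero-length buffer
```

The signed comparison lets the overflowed count back into the "still growing" path, while the unsigned reinterpretation routes it to the path that throws.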
Thanks, @jamesqo. I left some comments to be addressed, but overall it looks good.

@stephentoub, thanks for reviewing. I've responded to all of your comments.
The EnumerableHelpers.ToArray logic I introduced in #11208, while better in terms of memory consumption, employs a very imperative code style. I refactored the core logic into a new struct called LargeArrayBuilder, so now the method looks like

As a result of this refactoring, we can use the buffering strategy employed by ToArray in other places where we were previously using EnumerableHelpers.ToArray(this) (such as Select.ToArray or Concat.ToArray) and simultaneously avoid some unnecessary indirection/field stores. This results in substantial speedup.

Perf testing and analysis: test code, data.

The following patterns all see substantial speedups (where e is a lazy enumerable):

- e.Concat.ToArray
- e.Append.ToArray
- e.Append.Append.ToArray
- e.Select.ToArray
- e.OrderBy.Select.ToArray (OrderBy returns an IPartition that sometimes can't get its count cheaply)

Once this is merged I will change #12703 to make Where use it too, so all Where.ToArray calls should be substantially faster.

cc @stephentoub @jkotas @VSadov @JonHanna
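As a rough sketch of the buffering strategy described above (Python with hypothetical names; the real LargeArrayBuilder is a C# struct), full buffers are parked rather than recopied on growth, so each element is written once into a buffer and then copied once more into the final array:

```python
class LargeArrayBuilderSketch:
    """Toy model of a grow-by-doubling builder with chunked buffers."""

    def __init__(self, initial_capacity=4):
        self._buffers = []                         # completed, full buffers
        self._current = [None] * initial_capacity  # buffer being filled
        self._index = 0                            # next free slot in _current

    def add(self, item):
        if self._index == len(self._current):
            self._allocate_buffer()
        self._current[self._index] = item
        self._index += 1

    def add_range(self, items):
        for item in items:
            self.add(item)

    def _allocate_buffer(self):
        # Park the full buffer and start a new one twice as large;
        # nothing already added is recopied here.
        self._buffers.append(self._current)
        self._current = [None] * (len(self._current) * 2)
        self._index = 0

    def to_array(self):
        # Single final pass: parked buffers plus the used part of _current.
        result = []
        for buf in self._buffers:
            result.extend(buf)
        result.extend(self._current[:self._index])
        return result
```

For example, adding ten items to a builder with initial capacity 4 parks one full buffer of 4 and leaves the remaining 6 items in an 8-slot current buffer; to_array then produces the ten items in order.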