Improve inling in ImmutableArray<T>.Builder by svick · Pull Request #28177 · dotnet/corefx

svick · 2018-03-18T00:30:35Z

The issue https://github.com/dotnet/corefx/issues/28064 is about a benchmark whose performance is so bad that a new dangerous method was considered to improve that situation. But almost the same effect can be achieved just by ensuring that the Add() method and the indexer setter on ImmutableArray<T>.Builder can be inlined (see https://github.com/dotnet/corefx/issues/28064#issuecomment-373950250 for more details). This PR does that.

I have only verified that extracting the throw is useful for the indexer setter. But the indexer getter and ItemRef are very similar, so I assumed it makes sense for them too.

Performance results using BenchmarkDotNet (source):

Before:

Method	Mean	Error	StdDev	Median
Add	11.053 us	0.4281 us	1.2623 us	11.004 us
Indexer	6.246 us	0.1827 us	0.5242 us	6.063 us

After:

Method	Mean	Error	StdDev	Median
Add	5.538 us	0.1310 us	0.3863 us	5.422 us
Indexer	3.159 us	0.0631 us	0.1138 us	3.188 us

Relevant portions of JIT dumps:

Before:

*************** In fgFindBasicBlocks() for Builder[Int64][System.Int64]:Add(long):this
weight= 10 : state   3 [ ldarg.0 ]
weight= 10 : state   3 [ ldarg.0 ]
weight= 79 : state  40 [ call ]
weight= 28 : state  24 [ ldc.i4.1 ]
weight=-12 : state  76 [ add ]
weight= 79 : state  40 [ call ]
weight= 31 : state 191 [ ldarg.0 -> ldfld ]
weight= 10 : state   3 [ ldarg.0 ]
weight= 31 : state 191 [ ldarg.0 -> ldfld ]
weight= 20 : state 199 [ stloc.0 -> ldloc.0 ]
weight= 28 : state  24 [ ldc.i4.1 ]
weight=-12 : state  76 [ add ]
weight= 31 : state 111 [ stfld ]
weight= 12 : state   7 [ ldloc.0 ]
weight= 16 : state   4 [ ldarg.1 ]
weight= 65 : state 141 [ stelem ]
weight= 19 : state  42 [ ret ]

Inline candidate callsite is in a loop.  Multiplier increased to 3.
calleeNativeSizeEstimate=445
callsiteNativeSizeEstimate=115
benefit multiplier=3
threshold=345
Native estimate for function size exceeds threshold for inlining 44.5 > 34.5 (multiplier = 3)


Inline expansion aborted, inline not profitable
INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline' for 'Bench:Add():this' calling 'Builder[Int64][System.Int64]:Add(long):this'
INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline'

…

INLINER impTokenLookupContextHandle for Builder[Int64][System.Int64]:set_Item(int,long):this is 0x00007FF8B1B089D1.
*************** In fgFindBasicBlocks() for Builder[Int64][System.Int64]:set_Item(int,long):this
weight= 16 : state   4 [ ldarg.1 ]
weight= 10 : state   3 [ ldarg.0 ]
weight= 79 : state  40 [ call ]
weight= 28 : state  50 [ blt.s ]
weight=227 : state 103 [ newobj ]
weight=210 : state 108 [ throw ]
weight= 31 : state 191 [ ldarg.0 -> ldfld ]
weight= 16 : state   4 [ ldarg.1 ]
weight= 35 : state   5 [ ldarg.2 ]
weight= 65 : state 141 [ stelem ]
weight= 19 : state  42 [ ret ]

Inline candidate callsite is in a loop.  Multiplier increased to 3.
calleeNativeSizeEstimate=736
callsiteNativeSizeEstimate=145
benefit multiplier=3
threshold=435
Native estimate for function size exceeds threshold for inlining 73.6 > 43.5 (multiplier = 3)


Inline expansion aborted, inline not profitable
INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline' for 'Bench:Indexer():this' calling 'Builder[Int64][System.Int64]:set_Item(int,long):this'
INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline'

After:

INLINER impTokenLookupContextHandle for Builder[Int64][System.Int64]:Add(long):this is 0x00007FF8B8698A39.
*************** In fgFindBasicBlocks() for Builder[Int64][System.Int64]:Add(long):this
weight= 31 : state 191 [ ldarg.0 -> ldfld ]
weight= 28 : state  24 [ ldc.i4.1 ]
weight=-12 : state  76 [ add ]
weight=  6 : state  11 [ stloc.0 ]
weight= 10 : state   3 [ ldarg.0 ]
weight= 12 : state   7 [ ldloc.0 ]
weight= 79 : state  40 [ call ]
weight= 31 : state 191 [ ldarg.0 -> ldfld ]
weight= 31 : state 191 [ ldarg.0 -> ldfld ]
weight= 16 : state   4 [ ldarg.1 ]
weight= 65 : state 141 [ stelem ]
weight= 10 : state   3 [ ldarg.0 ]
weight= 12 : state   7 [ ldloc.0 ]
weight= 31 : state 111 [ stfld ]
weight= 19 : state  42 [ ret ]

Inline candidate is mostly loads and stores.  Multiplier increased to 3.
Inline candidate callsite is in a loop.  Multiplier increased to 6.
calleeNativeSizeEstimate=369
callsiteNativeSizeEstimate=115
benefit multiplier=6
threshold=690
Native estimate for function size is within threshold for inlining 36.9 <= 69 (multiplier = 6)
…
Successfully inlined Builder[Int64][System.Int64]:Add(long):this (42 IL bytes) (depth 1) [profitable inline]
--------------------------------------------------------------------------------------------

INLINER: during 'fgInline' result 'success' reason 'profitable inline' for 'Bench:Add():this' calling 'Builder[Int64][System.Int64]:Add(long):this'
INLINER: during 'fgInline' result 'success' reason 'profitable inline'

…

INLINER impTokenLookupContextHandle for Builder[Int64][System.Int64]:set_Item(int,long):this is 0x00007FF8B8698A39.
*************** In fgFindBasicBlocks() for Builder[Int64][System.Int64]:set_Item(int,long):this
weight= 16 : state   4 [ ldarg.1 ]
weight= 10 : state   3 [ ldarg.0 ]
weight= 79 : state  40 [ call ]
weight= 28 : state  50 [ blt.s ]
weight= 79 : state  40 [ call ]
weight= 31 : state 191 [ ldarg.0 -> ldfld ]
weight= 16 : state   4 [ ldarg.1 ]
weight= 35 : state   5 [ ldarg.2 ]
weight= 65 : state 141 [ stelem ]
weight= 19 : state  42 [ ret ]

Inline candidate callsite is in a loop.  Multiplier increased to 3.
calleeNativeSizeEstimate=378
callsiteNativeSizeEstimate=145
benefit multiplier=3
threshold=435
Native estimate for function size is within threshold for inlining 37.8 <= 43.5 (multiplier = 3)
…
Successfully inlined Builder[Int64][System.Int64]:set_Item(int,long):this (28 IL bytes) (depth 1) [profitable inline]
--------------------------------------------------------------------------------------------

INLINER: during 'fgInline' result 'success' reason 'profitable inline' for 'Bench:Indexer():this' calling 'Builder[Int64][System.Int64]:set_Item(int,long):this'
INLINER: during 'fgInline' result 'success' reason 'profitable inline'

stephentoub

Nice. Thanks.

stephentoub · 2018-03-18T01:37:28Z

cc: @AArnott, @jaredpar

svick · 2018-03-18T12:11:04Z

@dotnet-bot test Linux x64 Release Build please
@dotnet-bot test OSX x64 Debug Build please

gfoidl · 2018-03-18T14:06:18Z

            {
-                this.EnsureCapacity(this.Count + 1);
-                _elements[_count++] = item;
+                int newCount = _count + 1;


You can optimize Add even further when you split it in a fast- and cold-path. Like

[MethodImpl(MethodImplOptions.AggressiveInlining)] public void Add(T item) { int count = _count; T[] elements = _elements; if ((uint)count < (uint)elements.Length) { elements[count] = item; _count = count + 1; } else { AddWithResize(item); } } // Improve code quality as uncommon path [MethodImpl(MethodImplOptions.NoInlining)] private void AddWithResize(T item) { int newCount = _count + 1; this.EnsureCapacity(newCount); _elements[_count] = item; _count = newCount; }

Note that Add has to be attributed with AggressiveInlining, because -- interestingly though less asm-size -- the JIT won't inline otherwise.
This also safes the bounds-check in the fast-path.

@gfoidl That's a significant increase in complexity. Do you have something that shows that it's actually sufficiently faster to warrant that?

I made a quick benchmark and it seems your code is actually slower (SimpleAdd is the original code, TweakedAdd is code with my change, SplitAdd is your change):

Method Mean Error StdDev

SimpleAdd 7.935 us 0.0070 us 0.0055 us

TweakedAdd 3.274 us 0.0547 us 0.0512 us

SplitAdd 3.419 us 0.0322 us 0.0302 us

Do you have something that shows that it's actually sufficiently faster to warrant that?

List.Add, Stack.Push and similar classes use this pattern and there it is faster. My suggestion is based on that experience.

When I run your benchmark the results look a bit different.

Linux:

BenchmarkDotNet=v0.10.11, OS=ubuntu 16.04 Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=4 .NET Core SDK=2.1.300-preview1-008174 [Host] : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT DefaultJob : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT

Method Mean Error StdDev Median

SimpleAdd 4.823 us 0.0891 us 0.0790 us 4.816 us

TweakedAdd 2.547 us 0.0521 us 0.1318 us 2.505 us

SplitAdd 2.232 us 0.0434 us 0.0426 us 2.228 us

Windows:

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.309) Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=8 Frequency=2742189 Hz, Resolution=364.6722 ns, Timer=TSC .NET Core SDK=2.1.300-preview1-008174 [Host] : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT DefaultJob : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT

Method Mean Error StdDev

SimpleAdd 4.192 us 0.0240 us 0.0213 us

TweakedAdd 2.178 us 0.0203 us 0.0190 us

SplitAdd 1.950 us 0.0140 us 0.0124 us

Can you please re-check with your benchmark?

@gfoidl I'm still getting SplitAdd as slower on my machine:

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.309) Processor=Intel Core i5-2300 CPU 2.80GHz (Sandy Bridge), ProcessorCount=4 Frequency=2727538 Hz, Resolution=366.6310 ns, Timer=TSC .NET Core SDK=2.1.300-preview1-008174 [Host] : .NET Core ? (Framework 4.6.26216.04), 64bit RyuJIT DefaultJob : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT

Method Mean Error StdDev Median

SimpleAdd 8.267 us 0.1789 us 0.3888 us 8.103 us

TweakedAdd 3.358 us 0.0667 us 0.0935 us 3.381 us

SplitAdd 3.807 us 0.0816 us 0.2354 us 3.793 us

It could be caused by the different CPU, or maybe something else. Though even with your numbers, it's only 10 % improvement, so I'm not sure it's worth that much additional complexity.

Because of that, I'm not going to change my PR to include your changes. Instead, you could open your own PR.

svick · 2018-03-18T17:48:02Z

@dotnet-bot test OSX x64 Debug Build please

karelz

Nice!

karelz · 2018-03-18T19:07:57Z

Do we still need the original issue? Is it still tracking a meaningful improvement over your latest change?

cc @valenis @brianrob

svick · 2018-03-18T20:34:44Z

@karelz Like I said in my comment on the issue (https://github.com/dotnet/corefx/issues/28064#issuecomment-373950250), my opinion is that with this change, the proposed UnsafeFreeze API is not necessary, because the performance improvement would be relatively small.

But I don't know if that means the issue should be closed.

jaredpar · 2018-03-19T15:08:44Z

CC @VSadov

Commit migrated from dotnet/corefx@b5c1cee

Improve inling in ImmutableArray<T>.Builder

8f1310f

stephentoub approved these changes Mar 18, 2018

View reviewed changes

Therzok approved these changes Mar 18, 2018

View reviewed changes

gfoidl reviewed Mar 18, 2018

View reviewed changes

AArnott approved these changes Mar 18, 2018

View reviewed changes

gfoidl approved these changes Mar 18, 2018

View reviewed changes

karelz added the area-System.Collections label Mar 18, 2018

karelz assigned svick, ianhays and safern Mar 18, 2018

karelz approved these changes Mar 18, 2018

View reviewed changes

karelz added the tenet-performance Performance related issue label Mar 18, 2018

stephentoub merged commit b5c1cee into dotnet:master Mar 18, 2018

svick deleted the immutablearray-builder-inlining branch March 18, 2018 20:26

gfoidl mentioned this pull request Mar 18, 2018

ImmutableArray<T>.Builder.Add splitted in fast- and cold-path #28184

Closed

karelz added this to the 2.1.0 milestone Mar 27, 2018

ericstj pushed a commit to ericstj/corefx that referenced this pull request Mar 28, 2018

Improve inling in ImmutableArray<T>.Builder (dotnet#28177)

ba7d461

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Improve inling in ImmutableArray<T>.Builder (dotnet/corefx#28177)

7b0722e

Commit migrated from dotnet/corefx@b5c1cee

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve inling in ImmutableArray<T>.Builder#28177

Improve inling in ImmutableArray<T>.Builder#28177
stephentoub merged 1 commit into
dotnet:masterfrom
svick:immutablearray-builder-inlining

svick commented Mar 18, 2018

Uh oh!

stephentoub left a comment

Uh oh!

stephentoub commented Mar 18, 2018

Uh oh!

svick commented Mar 18, 2018

Uh oh!

gfoidl Mar 18, 2018

Uh oh!

svick Mar 18, 2018

Uh oh!

gfoidl Mar 18, 2018

Uh oh!

svick Mar 18, 2018

Uh oh!

svick commented Mar 18, 2018

Uh oh!

karelz left a comment

Uh oh!

karelz commented Mar 18, 2018

Uh oh!

svick commented Mar 18, 2018

Uh oh!

jaredpar commented Mar 19, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

Method	Mean	Error	StdDev
SimpleAdd	7.935 us	0.0070 us	0.0055 us
TweakedAdd	3.274 us	0.0547 us	0.0512 us
SplitAdd	3.419 us	0.0322 us	0.0302 us

Method	Mean	Error	StdDev	Median
SimpleAdd	4.823 us	0.0891 us	0.0790 us	4.816 us
TweakedAdd	2.547 us	0.0521 us	0.1318 us	2.505 us
SplitAdd	2.232 us	0.0434 us	0.0426 us	2.228 us

Method	Mean	Error	StdDev
SimpleAdd	4.192 us	0.0240 us	0.0213 us
TweakedAdd	2.178 us	0.0203 us	0.0190 us
SplitAdd	1.950 us	0.0140 us	0.0124 us

Method	Mean	Error	StdDev	Median
SimpleAdd	8.267 us	0.1789 us	0.3888 us	8.103 us
TweakedAdd	3.358 us	0.0667 us	0.0935 us	3.381 us
SplitAdd	3.807 us	0.0816 us	0.2354 us	3.793 us

Conversation

svick commented Mar 18, 2018

Uh oh!

stephentoub left a comment

Choose a reason for hiding this comment

Uh oh!

stephentoub commented Mar 18, 2018

Uh oh!

svick commented Mar 18, 2018

Uh oh!

gfoidl Mar 18, 2018

Choose a reason for hiding this comment

Uh oh!

svick Mar 18, 2018

Choose a reason for hiding this comment

Uh oh!

gfoidl Mar 18, 2018

Choose a reason for hiding this comment

Uh oh!

svick Mar 18, 2018

Choose a reason for hiding this comment

Uh oh!

svick commented Mar 18, 2018

Uh oh!

karelz left a comment

Choose a reason for hiding this comment

Uh oh!

karelz commented Mar 18, 2018

Uh oh!

svick commented Mar 18, 2018

Uh oh!

jaredpar commented Mar 19, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants