Improve inlining in ImmutableArray<T>.Builder #28177
@dotnet-bot test Linux x64 Release Build please
{
    this.EnsureCapacity(this.Count + 1);
    _elements[_count++] = item;
    int newCount = _count + 1;
You can optimize Add even further by splitting it into a fast path and a cold path, like this:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Add(T item)
{
    int count = _count;
    T[] elements = _elements;
    if ((uint)count < (uint)elements.Length)
    {
        elements[count] = item;
        _count = count + 1;
    }
    else
    {
        AddWithResize(item);
    }
}
// Uncommon path, kept out of line to improve the code quality of Add
[MethodImpl(MethodImplOptions.NoInlining)]
private void AddWithResize(T item)
{
    int newCount = _count + 1;
    this.EnsureCapacity(newCount);
    _elements[_count] = item;
    _count = newCount;
}
Note that Add has to be attributed with AggressiveInlining because, interestingly, the JIT won't inline it otherwise, even though the asm size is smaller.
This also saves the bounds check in the fast path.
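To try the pattern in isolation, here is a minimal self-contained sketch. MiniBuilder<T>, its constructor, and the doubling growth policy in EnsureCapacity are hypothetical stand-ins chosen for illustration; they are not the actual ImmutableArray<T>.Builder implementation.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for ImmutableArray<T>.Builder; only what the pattern needs.
public sealed class MiniBuilder<T>
{
    private T[] _elements;
    private int _count;

    public MiniBuilder(int capacity = 4)
    {
        if (capacity < 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _elements = new T[capacity];
    }

    public int Count => _count;
    public int Capacity => _elements.Length;

    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_count) throw new IndexOutOfRangeException();
            return _elements[index];
        }
    }

    // Fast path: one comparison and one store, small enough to inline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Add(T item)
    {
        int count = _count;
        T[] elements = _elements;
        // The unsigned comparison against elements.Length lets the JIT
        // drop the bounds check on the store below.
        if ((uint)count < (uint)elements.Length)
        {
            elements[count] = item;
            _count = count + 1;
        }
        else
        {
            AddWithResize(item);
        }
    }

    // Cold path: only reached when the backing array is full.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private void AddWithResize(T item)
    {
        int newCount = _count + 1;
        EnsureCapacity(newCount);
        _elements[_count] = item;
        _count = newCount;
    }

    // Assumed growth policy (doubling); the real builder may grow differently.
    private void EnsureCapacity(int min)
    {
        if (_elements.Length >= min) return;
        int newCapacity = Math.Max(_elements.Length * 2, min);
        Array.Resize(ref _elements, newCapacity);
    }
}
```

Adding ten items to a builder created with capacity 2 goes through AddWithResize a few times and otherwise stays on the inlined fast path.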
@gfoidl That's a significant increase in complexity. Do you have something that shows that it's actually sufficiently faster to warrant that?
I made a quick benchmark and it seems your code is actually slower (SimpleAdd is the original code, TweakedAdd is code with my change, SplitAdd is your change):
| Method | Mean | Error | StdDev |
|---|---|---|---|
| SimpleAdd | 7.935 us | 0.0070 us | 0.0055 us |
| TweakedAdd | 3.274 us | 0.0547 us | 0.0512 us |
| SplitAdd | 3.419 us | 0.0322 us | 0.0302 us |
> Do you have something that shows that it's actually sufficiently faster to warrant that?
List.Add, Stack.Push, and similar methods use this pattern, and there it is faster. My suggestion is based on that experience.
When I run your benchmark the results look a bit different.
Linux:
BenchmarkDotNet=v0.10.11, OS=ubuntu 16.04
Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=4
.NET Core SDK=2.1.300-preview1-008174
[Host] : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT
| Method | Mean | Error | StdDev | Median |
|---|---|---|---|---|
| SimpleAdd | 4.823 us | 0.0891 us | 0.0790 us | 4.816 us |
| TweakedAdd | 2.547 us | 0.0521 us | 0.1318 us | 2.505 us |
| SplitAdd | 2.232 us | 0.0434 us | 0.0426 us | 2.228 us |
Windows:
BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.309)
Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=8
Frequency=2742189 Hz, Resolution=364.6722 ns, Timer=TSC
.NET Core SDK=2.1.300-preview1-008174
[Host] : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT
| Method | Mean | Error | StdDev |
|---|---|---|---|
| SimpleAdd | 4.192 us | 0.0240 us | 0.0213 us |
| TweakedAdd | 2.178 us | 0.0203 us | 0.0190 us |
| SplitAdd | 1.950 us | 0.0140 us | 0.0124 us |
Can you please re-check with your benchmark?
@gfoidl I'm still getting SplitAdd as slower on my machine:
BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.309)
Processor=Intel Core i5-2300 CPU 2.80GHz (Sandy Bridge), ProcessorCount=4
Frequency=2727538 Hz, Resolution=366.6310 ns, Timer=TSC
.NET Core SDK=2.1.300-preview1-008174
[Host] : .NET Core ? (Framework 4.6.26216.04), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT
| Method | Mean | Error | StdDev | Median |
|---|---|---|---|---|
| SimpleAdd | 8.267 us | 0.1789 us | 0.3888 us | 8.103 us |
| TweakedAdd | 3.358 us | 0.0667 us | 0.0935 us | 3.381 us |
| SplitAdd | 3.807 us | 0.0816 us | 0.2354 us | 3.793 us |
It could be caused by the different CPU, or maybe something else. Though even with your numbers, it's only about a 10% improvement, so I'm not sure it's worth that much additional complexity.
Because of that, I'm not going to change my PR to include your changes. Instead, you could open your own PR.
@dotnet-bot test OSX x64 Debug Build please
@karelz Like I said in my comment on the issue (https://github.com/dotnet/corefx/issues/28064#issuecomment-373950250), my opinion is that with this change, the proposed But I don't know if that means the issue should be closed.
CC @VSadov
The issue https://github.com/dotnet/corefx/issues/28064 is about a benchmark whose performance is so bad that a new dangerous method was considered to improve that situation. But almost the same effect can be achieved just by ensuring that the Add() method and the indexer setter on ImmutableArray<T>.Builder can be inlined (see https://github.com/dotnet/corefx/issues/28064#issuecomment-373950250 for more details). This PR does that.
I have only verified that extracting the throw is useful for the indexer setter. But the indexer getter and ItemRef are very similar, so I assumed it makes sense for them too.
Performance results using BenchmarkDotNet (source):
Before:
After:
Relevant portions of JIT dumps:
Before:
After:
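Separately from the JIT output, the throw extraction mentioned in the description can be sketched as follows. This is an illustrative sketch only: MiniIndexer<T>, the ThrowIndexOutOfRange helper name, and the choice of IndexOutOfRangeException are assumptions, not the code from this PR. The idea is that the accessor body keeps just the range check and the array access, while the throw lives in a separate method, which makes the accessor a better inlining candidate.

```csharp
using System;

// Hypothetical type used only to illustrate the throw-helper pattern.
public sealed class MiniIndexer<T>
{
    private readonly T[] _elements;
    private readonly int _count;

    public MiniIndexer(int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        _elements = new T[count];
        _count = count;
    }

    public T this[int index]
    {
        get
        {
            // The unsigned comparison also rejects negative indexes.
            if ((uint)index >= (uint)_count)
            {
                ThrowIndexOutOfRange();
            }

            return _elements[index];
        }
        set
        {
            if ((uint)index >= (uint)_count)
            {
                ThrowIndexOutOfRange();
            }

            _elements[index] = value;
        }
    }

    // The throw is moved out of the accessors so that their bodies stay small.
    private static void ThrowIndexOutOfRange()
    {
        throw new IndexOutOfRangeException();
    }
}
```

Callers see the same behavior as with an inline throw: an out-of-range index still raises the exception, only from the helper's frame.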