ImmutableArray<T>.Builder.Add split into fast and cold path #28184
gfoidl wants to merge 3 commits into
Conversation
@dotnet-bot test Windows x86 Release Build

cc: @stephentoub
/// Adds an item to the <see cref="ICollection{T}"/>.
/// </summary>
/// <param name="item">The object to add to the <see cref="ICollection{T}"/>.</param>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
How much of the improvement you're showing is due to AggressiveInlining vs due to the changes in the method body? I'm not convinced this should be AggressiveInlining.
Benchmark
Notes
| Method | Description |
|---|---|
| TweakedAdd | current implementation |
| SplitAdd | this PR |
_NoInline methods are attributed with [MethodImpl(MethodImplOptions.NoInlining)].
_Inline methods are attributed with [MethodImpl(MethodImplOptions.AggressiveInlining)].
Methods without a _Xxx suffix carry no attributes.
Results
BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.309)
Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=8
Frequency=2742189 Hz, Resolution=364.6722 ns, Timer=TSC
.NET Core SDK=2.1.300-preview3-008384
[Host] : .NET Core 2.1.0-preview2-26313-01 (Framework 4.6.26310.01), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0-preview2-26313-01 (Framework 4.6.26310.01), 64bit RyuJIT
| Method | Mean | Error | StdDev | Scaled | ScaledSD |
|---|---|---|---|---|---|
| TweakedAdd_NoInline | 4.532 us | 0.0900 us | 0.1668 us | 2.00 | 0.10 |
| TweakedAdd | 2.272 us | 0.0479 us | 0.0813 us | 1.00 | 0.00 |
| TweakedAdd_Inline | 2.317 us | 0.0464 us | 0.0824 us | 1.02 | 0.05 |
| SplitAdd_NoInline | 2.699 us | 0.0505 us | 0.0473 us | 1.19 | 0.04 |
| SplitAdd | 3.034 us | 0.0601 us | 0.0715 us | 1.34 | 0.05 |
| SplitAdd_Inline | 2.088 us | 0.0416 us | 0.0696 us | 0.92 | 0.04 |
Discussion
SplitAdd
The JIT won't inline SplitAdd due to [FAILED: unprofitable inline] Builder:SplitAdd(long):this, which seems strange to me because the dasm for this method is:
; Assembly listing for method Builder:SplitAdd(long):this
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
; V00 this [V00,T00] ( 8, 6.50) ref -> rdi this class-hnd
; V01 arg1 [V01,T01] ( 5, 3.50) long -> rsi
; V02 loc0 [V02,T02] ( 6, 4 ) int -> rax
; V03 loc1 [V03,T03] ( 5, 4 ) ref -> rdx class-hnd
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0
G_M50053_IG01:
G_M50053_IG02:
8B4710 mov eax, dword ptr [rdi+16]
488B5708 mov rdx, gword ptr [rdi+8]
394208 cmp dword ptr [rdx+8], eax
760E jbe SHORT G_M50053_IG04
4863C8 movsxd rcx, eax
488974CA10 mov qword ptr [rdx+8*rcx+16], rsi
FFC0 inc eax
894710 mov dword ptr [rdi+16], eax
G_M50053_IG03:
C3 ret
G_M50053_IG04:
48B8981431A3AC7F0000 mov rax, 0x7FACA3311498
G_M50053_IG05:
48FFE0 rex.jmp rax
; Total bytes of code 39, prolog size 0 for method Builder:SplitAdd(long):this
; ============================================================
Really not much code.
So SplitAdd isn't inlined; then why does SplitAdd_NoInline from the benchmark show different numbers? It's because of the different prolog and the rex.jmp (although I have to admit that I don't know what rex.jmp is (yeah, I could search for it) or where it comes from):
; Assembly listing for method Builder:SplitAdd(long):this
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 this [V00,T00] ( 8, 6.50) ref -> rdi this class-hnd
; V01 arg1 [V01,T01] ( 5, 3.50) long -> rsi
; V02 loc0 [V02,T02] ( 6, 4 ) int -> rax
; V03 loc1 [V03,T03] ( 5, 4 ) ref -> rdx class-hnd
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 8
G_M50053_IG01:
50 push rax
G_M50053_IG02:
8B4710 mov eax, dword ptr [rdi+16]
488B5708 mov rdx, gword ptr [rdi+8]
394208 cmp dword ptr [rdx+8], eax
7612 jbe SHORT G_M50053_IG04
4863C8 movsxd rcx, eax
488974CA10 mov qword ptr [rdx+8*rcx+16], rsi
FFC0 inc eax
894710 mov dword ptr [rdi+16], eax
G_M50053_IG03:
4883C408 add rsp, 8
C3 ret
G_M50053_IG04:
E8B4F9FFFF call Builder:AddWithResize(long):this
90 nop
G_M50053_IG05:
4883C408 add rsp, 8
C3 ret
; Total bytes of code 42, prolog size 1 for method Builder:SplitAdd(long):this
; ============================================================
Note: with AggressiveInlining the JIT emits a call and no rex.jmp instruction.
Side note: in #28177 (comment) it may be that there was no AggressiveInlining.
TweakedAdd
The JIT will inline this method by default, although the dasm is much larger than that of SplitAdd (the dasm shown here is from TweakedAdd_NoInline):
; Assembly listing for method Builder:TweakedAdd(long):this
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 this [V00,T00] ( 10, 10 ) ref -> rbx this class-hnd
; V01 arg1 [V01,T03] ( 4, 4 ) long -> r14
; V02 loc0 [V02,T04] ( 4, 4 ) int -> r15
; V03 tmp0 [V03,T01] ( 6, 12 ) ref -> rax
; V04 tmp1 [V04,T02] ( 6, 12 ) int -> rdi
;# V05 OutArgs [V05 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0
G_M41370_IG01:
4157 push r15
4156 push r14
53 push rbx
488BDF mov rbx, rdi
4C8BF6 mov r14, rsi
G_M41370_IG02:
8B7B10 mov edi, dword ptr [rbx+16]
448D7F01 lea r15d, [rdi+1]
488BFB mov rdi, rbx
418BF7 mov esi, r15d
E893FFFFFF call Builder:EnsureCapacity(int):this
488B4308 mov rax, gword ptr [rbx+8]
8B7B10 mov edi, dword ptr [rbx+16]
3B7808 cmp edi, dword ptr [rax+8]
7312 jae SHORT G_M41370_IG04
4863FF movsxd rdi, edi
4C8974F810 mov qword ptr [rax+8*rdi+16], r14
44897B10 mov dword ptr [rbx+16], r15d
G_M41370_IG03:
5B pop rbx
415E pop r14
415F pop r15
C3 ret
G_M41370_IG04:
E8B0080F79 call CORINFO_HELP_RNGCHKFAIL
CC int3
; Total bytes of code 65, prolog size 5 for method Builder:TweakedAdd(long):this
; ============================================================
Conclusion
The "raw implementation" (comparing the NoInline variants) of SplitAdd is way faster than TweakedAdd. Because the JIT won't inline SplitAdd, we could
- improve the JIT's inlining heuristics
- force the method to inline (AggressiveInlining)
List<T>.Add uses the same fast-path/cold-path pattern, and it also uses AggressiveInlining. So I don't see any reason why we shouldn't go with that.
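The fast-path/cold-path split under discussion can be sketched roughly as follows. This is a minimal stand-in, not the actual ImmutableArray<T>.Builder source: the class name SimpleBuilder, the initial capacity, and the doubling growth policy are illustrative assumptions.

```csharp
using System;
using System.Runtime.CompilerServices;

// Minimal stand-in illustrating the fast-path/cold-path split.
public sealed class SimpleBuilder<T>
{
    private T[] _elements = new T[4];
    private int _count;

    public int Count => _count;

    // Fast path: store without resizing. Small enough that
    // aggressive inlining into callers stays cheap.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Add(T item)
    {
        int count = _count;
        T[] elements = _elements;

        // The uint casts let the JIT elide the bounds check on
        // elements[count]: one unsigned compare proves the index valid.
        if ((uint)count < (uint)elements.Length)
        {
            elements[count] = item;
            _count = count + 1;
        }
        else
        {
            AddWithResize(item);
        }
    }

    // Cold path: grow, then store. NoInlining keeps this uncommon
    // code out of callers that inline Add.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private void AddWithResize(T item)
    {
        int count = _count;
        Array.Resize(ref _elements, _elements.Length * 2);
        _elements[count] = item;
        _count = count + 1;
    }
}
```

Keeping Add this small is what makes AggressiveInlining reasonable; the resize logic, which would otherwise bloat every inlined call site, lives behind the NoInlining cold path.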
You mentioned:
| Method | Description |
|---|---|
| TweakedAdd | current implementation |
| SplitAdd | this PR |
But if I look at just those two rows of your results table in the same comment, it looks like "this PR" slows down the scenario. Am I reading it right? If so, I don't know why we would take this PR.
These results are for benchmarks to answer the question about the influence of aggressive inlining.
So the suffixes like _Inline have to be taken into account when reading the results.
This PR, as implemented, corresponds to the SplitAdd_Inline row, and there is an improvement. Not a huge one, but still noticeably faster.
AArnott
left a comment
I'm hesitant to recommend accepting this PR. It's not clear that there's an improvement (that may just be a misunderstanding), but even if it is faster, is the perf increase important and significant enough to warrant the loss of servicing we'll take for the Add method?
int count = _count;
T[] elements = _elements;

if ((uint)count < (uint)elements.Length)
With the uint casts the JIT is able to eliminate the bounds check on elements[count]. Cf. dotnet/coreclr#9773
Can you add a code comment to that effect? That sounds significant, but it's not at all clear; I'd be afraid someone would just remove the casts and never know they eliminated the optimization.
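The cast folds two checks into one: reinterpreted as uint, a negative count becomes a value larger than any valid array length, so a single unsigned comparison covers both count >= 0 and count < elements.Length. A small standalone illustration (the names here are made up, not the PR's code):

```csharp
using System;

public static class BoundsCheckDemo
{
    // One unsigned compare replaces "index >= 0 && index < length":
    // (uint)(-1) == 4294967295, which fails the comparison as well.
    public static bool InRange(int index, int length) =>
        (uint)index < (uint)length;

    public static void Main()
    {
        Console.WriteLine(InRange(3, 10));   // True
        Console.WriteLine(InRange(-1, 10));  // False
        Console.WriteLine(InRange(10, 10));  // False
    }
}
```

When the JIT sees this shape guarding elements[count], it can prove the subsequent access in range and drop its own bounds check (cf. dotnet/coreclr#9773).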
    }
}

// Improve code quality as uncommon path
I don't understand what this code comment means. Is this a TODO item that we need to improve the code quality? What does "as uncommon path" have to do with it?
The AddWithResize path is called from Add and is on the cold / uncommon path. The comment is there to explain why NoInlining is added. Similar to dotnet/coreclr#9539 (comment)
Do you have a better text for the comment, to make it instantly clear what is meant?
How about:
// Specify NoInlining so that we are guaranteed an opportunity to service this method

/// Adds an item to the <see cref="ICollection{T}"/>.
/// </summary>
/// <param name="item">The object to add to the <see cref="ICollection{T}"/>.</param>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
You mentioned:
| Method | Description |
|---|---|
| TweakedAdd | current implementation |
| SplitAdd | this PR |
But if I look at just those two rows of your results table in the same comment, it looks like "this PR" slows down the scenario. Am I reading it right? If so, I don't know why we would take this PR.
It is. Please see #28184 (comment) and the PR description.
I can't judge this, but splitting into a fast path and a cold path is quite a common technique, employed in several collection types.
AArnott
left a comment
Thanks. I'm satisfied with the changes and the research you've done.
The perf improvement still seems small to me. It would be interesting to see the perf improvement that came from dotnet/coreclr#9539 (the change in List<T>), but I don't see any measurement comparisons (just an asm comparison) on that PR. (So thanks for including measurements in yours!)
I suspect @stephentoub is best equipped to make the call on the perf benefit being worthwhile. But the code change looks good.
    }
}

// Improve code quality as uncommon path
How about:
// Specify NoInlining so that we are guaranteed an opportunity to service this method

    }
}

// Specify NoInlining so that we are guaranteed an opportunity to service this method
@AArnott, what did you mean by "guaranteed an opportunity to service this method"? I thought the NoInlining was there to avoid including the slow path when the caller is aggressively inlined, bloating that caller unnecessarily.
I can see it has the effect you say. I thought that aggressive inlining limited our options to change the method later due to ngen. My proposed comment was focused on that.
@AArnott's comments raise a good question: The perf testing done here is on .NET Core, with the latest JIT. But this library is used on many previous versions of .NET Framework, right? @gfoidl, I assume you've not validated that this is actually an improvement across all of those? You asked why this library might be special when we've made similar changes to core collection types elsewhere in coreclr/corefx... to me, that's why.
I validated just for .NET Core, under the assumption that this code is used only by .NET Core and may eventually be backported to .NET Full. As I see now, this assumption is incorrect. For .NET Full I get these numbers:
BenchmarkDotNet=v0.10.11, OS=Windows 10.0.17134
Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=8
Frequency=2742190 Hz, Resolution=364.6720 ns, Timer=TSC
[Host] : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3110.0
DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3110.0
BenchmarkDotNet=v0.10.11, OS=Windows 10.0.17134
Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=8
Frequency=2742190 Hz, Resolution=364.6720 ns, Timer=TSC
.NET Core SDK=2.1.300
[Host] : .NET Core 2.1.0 (Framework 4.6.26515.07), 64bit RyuJIT
Clr : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3110.0
Core : .NET Core 2.1.0 (Framework 4.6.26515.07), 64bit RyuJIT
So on .NET Full it's not really an improvement; rather it stays on par (within noise). Dasm for .NET Full and .NET Core is equal.
Only the loop is shown. In total the dasm is similar.
; netcoreapp2.1
; ...
00007FFCB2011939 mov ecx,dword ptr [rsi+10h]
00007FFCB201193C mov rdx,qword ptr [rsi+8]
00007FFCB2011940 mov eax,dword ptr [rdx+8]
00007FFCB2011943 cmp eax,ecx
00007FFCB2011945 jbe 00007FFCB2011956
00007FFCB2011947 movsxd rax,ecx
00007FFCB201194A mov qword ptr [rdx+rax*8+10h],rdi
00007FFCB201194F inc ecx
00007FFCB2011951 mov dword ptr [rsi+10h],ecx
00007FFCB2011954 jmp 00007FFCB2011961
00007FFCB2011956 mov rcx,rsi
00007FFCB2011959 mov rdx,rdi
00007FFCB201195C call 00007FFCB2011310
00007FFCB2011961 inc rdi
00007FFCB2011964 mov rcx,qword ptr [rsi+8]
00007FFCB2011968 mov ecx,dword ptr [rcx+8]
00007FFCB201196B movsxd rcx,ecx
00007FFCB201196E cmp rcx,rdi
00007FFCB2011971 jg 00007FFCB2011939
; ...
;------------------------------------------------------------------------------
; net471
; ...
00007FFCDE5C0919 mov ecx,dword ptr [rsi+10h]
00007FFCDE5C091C mov rdx,qword ptr [rsi+8]
00007FFCDE5C0920 mov eax,dword ptr [rdx+8]
00007FFCDE5C0923 cmp eax,ecx
00007FFCDE5C0925 jbe 00007FFCDE5C0936
00007FFCDE5C0927 movsxd rax,ecx
00007FFCDE5C092A mov qword ptr [rdx+rax*8+10h],rdi
00007FFCDE5C092F inc ecx
00007FFCDE5C0931 mov dword ptr [rsi+10h],ecx
00007FFCDE5C0934 jmp 00007FFCDE5C0941
00007FFCDE5C0936 mov rcx,rsi
00007FFCDE5C0939 mov rdx,rdi
00007FFCDE5C093C call 00007FFCDE5C02F0
00007FFCDE5C0941 inc rdi
00007FFCDE5C0944 mov rcx,qword ptr [rsi+8]
00007FFCDE5C0948 mov ecx,dword ptr [rcx+8]
00007FFCDE5C094B movsxd rcx,ecx
00007FFCDE5C094E cmp rcx,rdi
00007FFCDE5C0951 jg 00007FFCDE5C0919
; ...
Didn't test other targets.
Is this indicated by --, or how can I see on which targets a library will be used (in order to take this into account next time)? So what should we do?
Thanks, @gfoidl. The potential improvement here is small, and this kind of refactoring complicates the code; special-casing it per target would require a lot more than just ifdef'ing (we'd need to ship multiple binaries, deal with the packaging, etc.). I appreciate your efforts here, but I think we should just leave it as-is.
Description
Based on #28177 (comment)
Add is split into a fast path without resizing the array, and a cold path that does the resize. On the fast path the bounds check for the array access is also eliminated.
Benchmarks
Code for benchmarks is taken from svick
SimpleAdd is the original code, i.e. before #28177
TweakedAdd is the code of #28177
SplitAdd is the code of this PR.
win-x64
linux-x64
linux-x64 (different CPU)
Notes
In #28177 (comment) @svick reports that this change decreases perf on his machine. That's why I tested on three different machines, and all show a perf improvement. List.Add, Stack.Push and similar classes use this pattern and all show an improvement.