This repository was archived by the owner on Jan 23, 2023. It is now read-only.

ImmutableArray<T>.Builder.Add splitted in fast- and cold-path#28184

Closed
gfoidl wants to merge 3 commits into dotnet:master from gfoidl:immutablearray-builder

Conversation

@gfoidl (Member) commented Mar 18, 2018

Description

Based on #28177 (comment)
Add is split into a fast path, which does not resize the array, and a cold path, which does the resize.
On the fast path the bounds check for the array access is also eliminated.

Benchmarks

The code for the benchmarks is taken from svick.

SimpleAdd is the original code, i.e. before #28177.
TweakedAdd is the code of #28177.
SplitAdd is the code of this PR.

win-x64

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.309)
Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=8
Frequency=2742189 Hz, Resolution=364.6722 ns, Timer=TSC
.NET Core SDK=2.1.300-preview1-008174
  [Host]     : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT

| Method     |     Mean |     Error |    StdDev |
|------------|---------:|----------:|----------:|
| SimpleAdd  | 4.192 us | 0.0240 us | 0.0213 us |
| TweakedAdd | 2.178 us | 0.0203 us | 0.0190 us |
| SplitAdd   | 1.950 us | 0.0140 us | 0.0124 us |

linux-x64

BenchmarkDotNet=v0.10.11, OS=ubuntu 16.04
Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=4
.NET Core SDK=2.1.300-preview1-008174
  [Host]     : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-preview1-26216-03 (Framework 4.6.26216.04), 64bit RyuJIT

| Method     |     Mean |     Error |    StdDev |   Median |
|------------|---------:|----------:|----------:|---------:|
| SimpleAdd  | 4.823 us | 0.0891 us | 0.0790 us | 4.816 us |
| TweakedAdd | 2.547 us | 0.0521 us | 0.1318 us | 2.505 us |
| SplitAdd   | 2.232 us | 0.0434 us | 0.0426 us | 2.228 us |

linux-x64 (different CPU)

BenchmarkDotNet=v0.10.11, OS=ubuntu 17.10
Processor=Intel Xeon CPU 2.60GHz, ProcessorCount=2
.NET Core SDK=2.1.4
  [Host]     : .NET Core 2.0.5 (Framework 4.6.0.0), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.5 (Framework 4.6.0.0), 64bit RyuJIT

| Method     |     Mean |     Error |    StdDev |   Median |
|------------|---------:|----------:|----------:|---------:|
| SimpleAdd  | 6.320 us | 0.1574 us | 0.2673 us | 6.182 us |
| TweakedAdd | 3.947 us | 0.0767 us | 0.0788 us | 3.933 us |
| SplitAdd   | 3.439 us | 0.2023 us | 0.2408 us | 3.306 us |

Notes

In #28177 (comment) @svick reports that this change decreases perf on his machine. That's why I tested on three different machines, all of which show a perf improvement. List<T>.Add, Stack<T>.Push, and similar collection types use this pattern and likewise show an improvement.

@gfoidl (Member, Author) commented Mar 19, 2018

Windows.10.Amd64.Open-Release-x86

System.Net.Http.HttpRequestException : An error occurred while sending the request.
---- System.Net.Http.WinHttpException : The server returned an invalid or unrecognized response

at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts) in D:\j\workspace\windows-TGrou---f8ac6754\src\System.Net.Http\src\System\Net\Http\HttpClient.cs:line 479
   at System.Net.Http.Functional.Tests.HttpClientHandler_Authentication_Test.<>c__DisplayClass13_0.<<PreAuthenticate_FirstRequestNoHeader_SecondRequestVariousStatusCodes_ThirdRequestPreauthenticates>b__0>d.MoveNext() in D:\j\workspace\windows-TGrou---f8ac6754\src\System.Net.Http\tests\FunctionalTests\HttpClientHandlerTest.Authentication.cs:line 333
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Test.Common.LoopbackServer.<>c__DisplayClass11_0.<<CreateClientAndServerAsync>b__0>d.MoveNext() in D:\j\workspace\windows-TGrou---f8ac6754\src\Common\tests\System\Net\Http\LoopbackServer.cs:line 84
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Test.Common.LoopbackServer.CreateServerAsync(Func`2 funcAsync, Options options) in D:\j\workspace\windows-TGrou---f8ac6754\src\Common\tests\System\Net\Http\LoopbackServer.cs:line 67
   at System.Net.Http.Functional.Tests.HttpClientHandler_Authentication_Test.PreAuthenticate_FirstRequestNoHeader_SecondRequestVariousStatusCodes_ThirdRequestPreauthenticates(HttpStatusCode statusCode) in D:\j\workspace\windows-TGrou---f8ac6754\src\System.Net.Http\tests\FunctionalTests\HttpClientHandlerTest.Authentication.cs:line 323
--- End of stack trace from previous location where exception was thrown ---
----- Inner Stack Trace -----
   at System.Threading.Tasks.RendezvousAwaitable`1.GetResult() in D:\j\workspace\windows-TGrou---f8ac6754\src\Common\src\System\Threading\Tasks\RendezvousAwaitable.cs:line 62
   at System.Net.Http.WinHttpHandler.StartRequest(WinHttpRequestState state) in D:\j\workspace\windows-TGrou---f8ac6754\src\System.Net.Http.WinHttpHandler\src\System\Net\Http\WinHttpHandler.cs:line 856

@dotnet-bot test Windows x86 Release Build

@karelz karelz added the tenet-performance Performance related issue label Mar 19, 2018
@safern (Member) commented Mar 19, 2018

cc: @stephentoub

/// Adds an item to the <see cref="ICollection{T}"/>.
/// </summary>
/// <param name="item">The object to add to the <see cref="ICollection{T}"/>.</param>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
A member commented:


How much of the improvement you're showing is due to AggressiveInlining vs due to the changes in the method body? I'm not convinced this should be AggressiveInlining.

@gfoidl (Member, Author) replied:


Benchmark

Notes

| Method     | Description            |
|------------|------------------------|
| TweakedAdd | current implementation |
| SplitAdd   | this PR                |

Methods with the _NoInline suffix are attributed with [MethodImpl(MethodImplOptions.NoInlining)].
Methods with the _Inline suffix are attributed with [MethodImpl(MethodImplOptions.AggressiveInlining)].
Methods without a suffix have no attribute.

Results

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.309)
Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=8
Frequency=2742189 Hz, Resolution=364.6722 ns, Timer=TSC
.NET Core SDK=2.1.300-preview3-008384
  [Host]     : .NET Core 2.1.0-preview2-26313-01 (Framework 4.6.26310.01), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-preview2-26313-01 (Framework 4.6.26310.01), 64bit RyuJIT

| Method              |     Mean |     Error |    StdDev | Scaled | ScaledSD |
|---------------------|---------:|----------:|----------:|-------:|---------:|
| TweakedAdd_NoInline | 4.532 us | 0.0900 us | 0.1668 us |   2.00 |     0.10 |
| TweakedAdd          | 2.272 us | 0.0479 us | 0.0813 us |   1.00 |     0.00 |
| TweakedAdd_Inline   | 2.317 us | 0.0464 us | 0.0824 us |   1.02 |     0.05 |
| SplitAdd_NoInline   | 2.699 us | 0.0505 us | 0.0473 us |   1.19 |     0.04 |
| SplitAdd            | 3.034 us | 0.0601 us | 0.0715 us |   1.34 |     0.05 |
| SplitAdd_Inline     | 2.088 us | 0.0416 us | 0.0696 us |   0.92 |     0.04 |

Discussion

SplitAdd

The JIT won't inline SplitAdd due to [FAILED: unprofitable inline] Builder:SplitAdd(long):this which seems strange to me, because the dasm for this method is:

; Assembly listing for method Builder:SplitAdd(long):this
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] (  8,  6.50)     ref  ->  rdi         this class-hnd
;  V01 arg1         [V01,T01] (  5,  3.50)    long  ->  rsi        
;  V02 loc0         [V02,T02] (  6,  4   )     int  ->  rax        
;  V03 loc1         [V03,T03] (  5,  4   )     ref  ->  rdx         class-hnd
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
;
; Lcl frame size = 0

G_M50053_IG01:

G_M50053_IG02:
       8B4710               mov      eax, dword ptr [rdi+16]
       488B5708             mov      rdx, gword ptr [rdi+8]
       394208               cmp      dword ptr [rdx+8], eax
       760E                 jbe      SHORT G_M50053_IG04
       4863C8               movsxd   rcx, eax
       488974CA10           mov      qword ptr [rdx+8*rcx+16], rsi
       FFC0                 inc      eax
       894710               mov      dword ptr [rdi+16], eax

G_M50053_IG03:
       C3                   ret      

G_M50053_IG04:
       48B8981431A3AC7F0000 mov      rax, 0x7FACA3311498

G_M50053_IG05:
       48FFE0               rex.jmp  rax

; Total bytes of code 39, prolog size 0 for method Builder:SplitAdd(long):this
; ============================================================

Really not much code.

So SplitAdd isn't inlined; why then does SplitAdd_NoInline from the benchmark show different numbers? It's because of the different prolog and the rex.jmp (although I have to admit that I don't know what rex.jmp is and where it comes from):

; Assembly listing for method Builder:SplitAdd(long):this
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] (  8,  6.50)     ref  ->  rdi         this class-hnd
;  V01 arg1         [V01,T01] (  5,  3.50)    long  ->  rsi        
;  V02 loc0         [V02,T02] (  6,  4   )     int  ->  rax        
;  V03 loc1         [V03,T03] (  5,  4   )     ref  ->  rdx         class-hnd
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
;
; Lcl frame size = 8

G_M50053_IG01:
       50                   push     rax

G_M50053_IG02:
       8B4710               mov      eax, dword ptr [rdi+16]
       488B5708             mov      rdx, gword ptr [rdi+8]
       394208               cmp      dword ptr [rdx+8], eax
       7612                 jbe      SHORT G_M50053_IG04
       4863C8               movsxd   rcx, eax
       488974CA10           mov      qword ptr [rdx+8*rcx+16], rsi
       FFC0                 inc      eax
       894710               mov      dword ptr [rdi+16], eax

G_M50053_IG03:
       4883C408             add      rsp, 8
       C3                   ret      

G_M50053_IG04:
       E8B4F9FFFF           call     Builder:AddWithResize(long):this
       90                   nop      

G_M50053_IG05:
       4883C408             add      rsp, 8
       C3                   ret      

; Total bytes of code 42, prolog size 1 for method Builder:SplitAdd(long):this
; ============================================================

Note: with AggressiveInlining the JIT emits a call and no rex.jmp instruction.

Side note: in #28177 (comment) it may be that there was no AggressiveInlining.

TweakedAdd

The JIT will inline this method by default, although its dasm is much larger than that of SplitAdd (the dasm shown here is from TweakedAdd_NoInline):

; Assembly listing for method Builder:TweakedAdd(long):this
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] ( 10, 10   )     ref  ->  rbx         this class-hnd
;  V01 arg1         [V01,T03] (  4,  4   )    long  ->  r14        
;  V02 loc0         [V02,T04] (  4,  4   )     int  ->  r15        
;  V03 tmp0         [V03,T01] (  6, 12   )     ref  ->  rax        
;  V04 tmp1         [V04,T02] (  6, 12   )     int  ->  rdi        
;# V05 OutArgs      [V05    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
;
; Lcl frame size = 0

G_M41370_IG01:
       4157                 push     r15
       4156                 push     r14
       53                   push     rbx
       488BDF               mov      rbx, rdi
       4C8BF6               mov      r14, rsi

G_M41370_IG02:
       8B7B10               mov      edi, dword ptr [rbx+16]
       448D7F01             lea      r15d, [rdi+1]
       488BFB               mov      rdi, rbx
       418BF7               mov      esi, r15d
       E893FFFFFF           call     Builder:EnsureCapacity(int):this
       488B4308             mov      rax, gword ptr [rbx+8]
       8B7B10               mov      edi, dword ptr [rbx+16]
       3B7808               cmp      edi, dword ptr [rax+8]
       7312                 jae      SHORT G_M41370_IG04
       4863FF               movsxd   rdi, edi
       4C8974F810           mov      qword ptr [rax+8*rdi+16], r14
       44897B10             mov      dword ptr [rbx+16], r15d

G_M41370_IG03:
       5B                   pop      rbx
       415E                 pop      r14
       415F                 pop      r15
       C3                   ret      

G_M41370_IG04:
       E8B0080F79           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3     

; Total bytes of code 65, prolog size 5 for method Builder:TweakedAdd(long):this
; ============================================================

Conclusion

The "raw implementation" (comparing the NoInline variants) of SplitAdd is way faster than TweakedAdd. Because the JIT won't inline SplitAdd, we could

  • improve the JIT's inlining heuristics, or
  • force the method to inline (AggressiveInlining).

In List<T>.Add a similar pattern with fast and cold paths is used, also with AggressiveInlining. So I don't see any reason why we shouldn't go with that.



You mentioned:

| Method     | Description            |
|------------|------------------------|
| TweakedAdd | current implementation |
| SplitAdd   | this PR                |

But if I look at just those two rows of your results table in the same comment, it looks like "this PR" slows down the scenario. Am I reading it right? If so, I don't know why we would take this PR.

@gfoidl (Member, Author) replied:


These results are from benchmarks meant to answer the question about the influence of aggressive inlining, so the suffixes like _Inline have to be taken into account when reading the results.

This PR, as implemented, corresponds to the SplitAdd_Inline row, and there is an improvement. Not a huge one, but still noticeably faster.

@karelz karelz modified the milestone: 2.1.0 Mar 27, 2018
@karelz karelz added this to the 2.2.0 milestone Apr 4, 2018
@stephentoub (Member) commented:

@safern, @ianhays, @AArnott, what do you want to happen with this PR? It's been sitting here since March.


@AArnott AArnott left a comment


I'm hesitant to recommend accepting this PR. It's not clear that there's an improvement (that may just be a misunderstanding), but even if it is faster, is the perf increase significant enough to warrant the loss of servicing we'll take for the Add method?

int count = _count;
T[] elements = _elements;

if ((uint)count < (uint)elements.Length)


What value do the (uint) casts add?

@gfoidl (Member, Author) replied:


With the uint casts the JIT is able to eliminate the bounds check for elements[count]. Cf. dotnet/coreclr#9773



Can you add a code comment to that effect? That sounds significant, but it's not at all clear; I'd be afraid someone would just remove the casts and never know they eliminated the optimization.

}
}

// Improve code quality as uncommon path


I don't understand what this code comment means. Is this a TODO item that we need to improve the code quality? What does "as uncommon path" have to do with it?

@gfoidl (Member, Author) commented Jun 21, 2018


The AddWithResize path is called from Add and is on the cold/uncommon path. The comment is meant to explain why NoInlining is applied, similar to dotnet/coreclr#9539 (comment).

Do you have better wording for the comment, to make it instantly clear what is meant?



How about:

// Specify NoInlining so that we are guaranteed an opportunity to service this method


@gfoidl (Member, Author) commented Jun 21, 2018

not clear that there's an improvement (that may just be a misunderstanding)

It is. Please see #28184 (comment) and the description of the PR.

is the perf increase important and significant enough

I can't judge this, but splitting into fast and cold paths is a common technique employed in several collection types.
The change is quite similar to (and actually based on) the change in dotnet/coreclr#9539.


@AArnott AArnott left a comment


Thanks. I'm satisfied with the changes and the research you've done.
The perf improvement still seems small to me. It would be interesting to see the perf improvement that came from dotnet/coreclr#9539 (the change in List<T>), but I don't see any measurement comparisons (just asm comparisons) on that PR, so thanks for including them in yours!
I suspect @stephentoub is best equipped to make the call on whether the perf benefit is worthwhile. But the code change looks good.


}
}

// Specify NoInlining so that we are guaranteed an opportunity to service this method
A member commented:


@AArnott, what did you mean by "guaranteed an opportunity to service this method"? I thought the NoInlining was here to avoid including the slow path as part of the caller getting aggressively inlined and bloating its caller unnecessarily.



I can see it has the effect you say. I thought that aggressive inlining limited our options to change the method later due to ngen. My proposed comment was focused on that.

@stephentoub (Member) commented:

@AArnott's comments raise a good question:

The perf testing done here is on .NET Core, with the latest JIT. But this library is used on many previous versions of .NET Framework, right? @gfoidl, I assume you've not validated that this is actually an improvement across all of those? You asked why this library might be special when we've made similar changes to core collection types elsewhere in coreclr/corefx... to me, that's why.

@gfoidl (Member, Author) commented Jun 24, 2018

@gfoidl, I assume you've not validated that this is actually an improvement across all of those?

I validated just for .NET Core, under the assumption that this code is used only by .NET Core and might eventually be backported to .NET Framework. As I see now, this assumption is incorrect.

For .NET Full I get these numbers:

BenchmarkDotNet=v0.10.11, OS=Windows 10.0.17134
Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=8
Frequency=2742190 Hz, Resolution=364.6720 ns, Timer=TSC
  [Host]     : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3110.0
  DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3110.0

| Method     |     Mean |     Error |    StdDev |
|------------|---------:|----------:|----------:|
| SimpleAdd  | 4.394 us | 0.0398 us | 0.0372 us |
| TweakedAdd | 2.035 us | 0.0353 us | 0.0363 us |
| SplitAdd   | 2.003 us | 0.0391 us | 0.0401 us |

BenchmarkDotNet=v0.10.11, OS=Windows 10.0.17134
Processor=Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), ProcessorCount=8
Frequency=2742190 Hz, Resolution=364.6720 ns, Timer=TSC
.NET Core SDK=2.1.300
  [Host] : .NET Core 2.1.0 (Framework 4.6.26515.07), 64bit RyuJIT
  Clr    : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3110.0
  Core   : .NET Core 2.1.0 (Framework 4.6.26515.07), 64bit RyuJIT

| Method          | Job  | Runtime |     Mean |     Error |    StdDev | Scaled | ScaledSD |
|-----------------|------|---------|---------:|----------:|----------:|-------:|---------:|
| TweakedAdd      | Clr  | Clr     | 1.991 us | 0.0080 us | 0.0058 us |   1.00 |     0.00 |
| SplitAdd_Inline | Clr  | Clr     | 2.012 us | 0.0248 us | 0.0232 us |   1.01 |     0.01 |
| TweakedAdd      | Core | Core    | 2.360 us | 0.0468 us | 0.0867 us |   1.00 |     0.00 |
| SplitAdd_Inline | Core | Core    | 2.128 us | 0.0390 us | 0.0365 us |   0.90 |     0.04 |

So on .NET Full it's not really an improvement; rather it stays on par (within noise).

The dasm for .NET Full and .NET Core is identical.

dasm

Only the loop is shown; overall the dasm is similar.

; netcoreapp2.1
; ...
00007FFCB2011939  mov         ecx,dword ptr [rsi+10h]  
00007FFCB201193C  mov         rdx,qword ptr [rsi+8]  
00007FFCB2011940  mov         eax,dword ptr [rdx+8]  
00007FFCB2011943  cmp         eax,ecx  
00007FFCB2011945  jbe         00007FFCB2011956  
00007FFCB2011947  movsxd      rax,ecx  
00007FFCB201194A  mov         qword ptr [rdx+rax*8+10h],rdi  
00007FFCB201194F  inc         ecx  
00007FFCB2011951  mov         dword ptr [rsi+10h],ecx  
00007FFCB2011954  jmp         00007FFCB2011961  
00007FFCB2011956  mov         rcx,rsi  
00007FFCB2011959  mov         rdx,rdi  
00007FFCB201195C  call        00007FFCB2011310  
00007FFCB2011961  inc         rdi  
00007FFCB2011964  mov         rcx,qword ptr [rsi+8]  
00007FFCB2011968  mov         ecx,dword ptr [rcx+8]  
00007FFCB201196B  movsxd      rcx,ecx  
00007FFCB201196E  cmp         rcx,rdi  
00007FFCB2011971  jg          00007FFCB2011939
; ...
;------------------------------------------------------------------------------
; net471
; ...
00007FFCDE5C0919  mov         ecx,dword ptr [rsi+10h]  
00007FFCDE5C091C  mov         rdx,qword ptr [rsi+8]  
00007FFCDE5C0920  mov         eax,dword ptr [rdx+8]  
00007FFCDE5C0923  cmp         eax,ecx  
00007FFCDE5C0925  jbe         00007FFCDE5C0936  
00007FFCDE5C0927  movsxd      rax,ecx  
00007FFCDE5C092A  mov         qword ptr [rdx+rax*8+10h],rdi  
00007FFCDE5C092F  inc         ecx  
00007FFCDE5C0931  mov         dword ptr [rsi+10h],ecx  
00007FFCDE5C0934  jmp         00007FFCDE5C0941  
00007FFCDE5C0936  mov         rcx,rsi  
00007FFCDE5C0939  mov         rdx,rdi  
00007FFCDE5C093C  call        00007FFCDE5C02F0  
00007FFCDE5C0941  inc         rdi  
00007FFCDE5C0944  mov         rcx,qword ptr [rsi+8]  
00007FFCDE5C0948  mov         ecx,dword ptr [rcx+8]  
00007FFCDE5C094B  movsxd      rcx,ecx  
00007FFCDE5C094E  cmp         rcx,rdi  
00007FFCDE5C0951  jg          00007FFCDE5C0919  
; ...

Didn't test other targets.

this library is used on many previous versions of .NET Framework, right?

Is this indicated by

<SupportedFramework>net45;netcore45;netcoreapp1.0;wp8;wpa81;$(AllXamarinFrameworks)</SupportedFramework>

Or how can I see on which targets a library will be used (so I can take this into account next time)?

So what to do?

  • #ifdef the .NET Core path? That makes the code harder to maintain, and for the small perf win I wouldn't do it.
  • Leave the code as is and close the PR?
  • ...?

@stephentoub (Member) commented:

So what to do?

Thanks, @gfoidl. The potential improvement here is small, and this kind of refactoring complicates the code; special-casing it would require a lot more than just ifdef'ing (we'd need to ship multiple binaries, deal with the packaging, etc.). I appreciate your efforts here, but I think we should just leave it as-is.

@gfoidl gfoidl deleted the immutablearray-builder branch June 27, 2018 13:09
