Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Faster Span#16222

Closed
benaadams wants to merge 4 commits into
dotnet:masterfrom
benaadams:span-slim
Closed

Faster Span#16222
benaadams wants to merge 4 commits into
dotnet:masterfrom
benaadams:span-slim

Conversation

@benaadams
Copy link
Copy Markdown
Member

@benaadams benaadams commented Feb 16, 2017

Slim the span hot methods .ctor+Slice as they are very hot and always inlined

Vectorized IndexOf for byte and SequenceEqual for byte and char

/cc @jkotas

// Check will be Jitted out for ValueTypes
if (SpanHelpers.IsReference<T>() && array.GetType() != typeof(T[]))
ThrowHelper.ThrowArrayTypeMismatchException_ArrayTypeMustBeExactMatch(typeof(T));
if ((uint)start > (uint)array.Length || (uint)length > (uint)(array.Length - start))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does changing ((uint)start > (uint)array.Length || (uint)length > (uint)(array.Length - start)) to (start | length) < 0 || array.Length - start < length) really help? What is the code before/after?

Copy link
Copy Markdown
Member Author

@benaadams benaadams Feb 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, does a little extra 😢

Before

G_M5589_IG02:
       4D85C0               test     r8, r8
       0F8400000000         je       G_M5589_IG05
       458B5008             mov      r10d, dword ptr [r8+8]
       453BD1               cmp      r10d, r9d
       0F8200000000         jb       G_M5589_IG05
       452BD1               sub      r10d, r9d
       443BD0               cmp      r10d, eax
       0F8200000000         jb       G_M5589_IG05

After

G_M5589_IG02:
       4D85C0               test     r8, r8
       0F8400000000         je       G_M5589_IG05
       458BD1               mov      r10d, r9d
       440BD0               or       r10d, eax
       4585D2               test     r10d, r10d
       0F8C00000000         jl       G_M5589_IG05
       458B5008             mov      r10d, dword ptr [r8+8]
       452BD1               sub      r10d, r9d
       443BD0               cmp      r10d, eax
       0F8C00000000         jl       G_M5589_IG05 

Will revert

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I did notice something strange about the rest of the coreclr method https://github.com/dotnet/coreclr/issues/9633

Copy link
Copy Markdown

@ahsonkhan ahsonkhan Feb 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change reverted, is the crux of the change here changing default(T)== null to

if (SpanHelpers.IsReference<T>() && array.GetType() != typeof(T[])) 

?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, its merging two exception throws into one and reducing the arguments passed

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is cutting the function calls; and exit points, that are brought in with the inline


public static bool IsReferenceCore(Type type)
{
if (type.GetTypeInfo().IsPrimitive) // This is hopefully the common case. All types that return true for this are value types w/out embedded references.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt that this is saving anything. It can just return type.GetTypeInfo().IsValueType.

//
public static readonly bool IsReferenceOrContainsReferences = IsReferenceOrContainsReferencesCore(typeof(T));

public static readonly bool IsReference = IsReferenceCore(typeof(T));
Copy link
Copy Markdown
Member

@jkotas jkotas Feb 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - why not call it IsValueType to make same as the reflection API?

public static readonly bool IsValueType = typeof(T).GetTypeInfo().IsValueType;

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Feb 16, 2017

@ahsonkhan Once this goes in, we will want to port the applicable changes to the fast span in CoreCLR as well.

Comment thread src/System.Memory/src/System/Span.cs Outdated
if (length == 0)
return;

// Branch will be Jit elminated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: spelling.

@mmitche
Copy link
Copy Markdown
Member

mmitche commented Feb 16, 2017

@dotnet-bot test this please (Feeds live again)

@@ -105,72 +105,67 @@ public static bool SequenceEqual(ref byte first, ref byte second, int length)
int index = 0;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the param is by ref ref byte first is it safe to use AsPointer?
e.g. Unsafe.AsPointer(ref first);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would have to pin it first. Unsafe.AsPointer is safe to use only if you know that the ref is pinned.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit I expected a compile error from fixed (byte* pSearchSpace = &searchSpace) but it works 😄

@benaadams benaadams changed the title Span slim Faster Span Feb 20, 2017
@benaadams benaadams force-pushed the span-slim branch 4 times, most recently from aaa5f6b to 186d3e6 Compare February 20, 2017 14:42
@benaadams
Copy link
Copy Markdown
Member Author

Vectorized IndexOf for byte and SequenceEqual for byte and char

Pre

<details>

```asm
 ; Total bytes of code 38, prolog size 4 for method SpanExtensions:SequenceEqual(struct,struct):bool
  ; ============================================================
  Successfully inlined Unsafe:AreSame(byref,byref):bool (5 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Add(byref,int):byref (12 IL bytes) (depth 1) [aggressive inline attribute]
  **************** Inline Tree
  Inlines into 06000018 SpanHelpers:SequenceEqual(byref,byref,int):bool
    [1 IL=0002 TR=000003 06000019] [aggressive inline attribute] Unsafe:AreSame(byref,byref):bool
    [2 IL=0020 TR=000026 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [3 IL=0028 TR=000034 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [4 IL=0044 TR=000056 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [5 IL=0052 TR=000064 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [6 IL=0068 TR=000086 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [7 IL=0076 TR=000094 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [8 IL=0092 TR=000116 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [9 IL=0100 TR=000124 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [10 IL=0116 TR=000146 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [11 IL=0124 TR=000154 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [12 IL=0140 TR=000176 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [13 IL=0148 TR=000184 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [14 IL=0164 TR=000206 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [15 IL=0172 TR=000214 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [16 IL=0188 TR=000236 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [17 IL=0196 TR=000244 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [18 IL=0226 TR=000312 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [19 IL=0234 TR=000320 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [20 IL=0250 TR=000342 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [21 IL=0258 TR=000350 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [22 IL=0274 TR=000372 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [23 IL=0282 TR=000380 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [24 IL=0298 TR=000402 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [25 IL=0306 TR=000410 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [26 IL=0333 TR=000462 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
    [27 IL=0341 TR=000470 06000012] [aggressive inline attribute] Unsafe:Add(byref,int):byref
  Budget: initialTime=1158, finalTime=1414, initialBudget=11580, currentBudget=11840
  Budget: increased by 260 because of force inlines
  Budget: initialSize=8476, finalSize=8476
  ; Assembly listing for method SpanHelpers:SequenceEqual(byref,byref,int):bool
  ; Emitting BLENDED_CODE for X64 CPU with SSE2
  ; optimized code
  ; rsp based frame
  ; fully interruptible
  ; Final local variable assignments
  ;
  ;  V00 arg0         [V00,T01] ( 16,  55  )   byref  ->  rcx
  ;  V01 arg1         [V01,T02] ( 16,  55  )   byref  ->  rdx
  ;  V02 arg2         [V02,T03] ( 14,  39.5)     int  ->   r8
  ;  V03 loc0         [V03,T00] ( 53, 208.5)     int  ->   r9
  ;  V04 tmp0         [V04,T05] (  2,  16  )     int  ->  rax
  ;  V05 tmp1         [V05,T06] (  2,  16  )     int  ->  rax
  ;  V06 tmp2         [V06,T07] (  2,  16  )     int  ->  rax
  ;  V07 tmp3         [V07,T08] (  2,  16  )     int  ->  rax
  ;  V08 tmp4         [V08,T09] (  2,  16  )     int  ->  rax
  ;  V09 tmp5         [V09,T10] (  2,  16  )     int  ->  rax
  ;  V10 tmp6         [V10,T11] (  2,  16  )     int  ->  rax
  ;  V11 tmp7         [V11,T12] (  2,  16  )     int  ->  rax
  ;  V12 tmp8         [V12,T13] (  2,  16  )     int  ->  rax
  ;  V13 tmp9         [V13,T14] (  2,  16  )     int  ->  rax
  ;  V14 tmp10        [V14,T15] (  2,  16  )     int  ->  rax
  ;  V15 tmp11        [V15,T16] (  2,  16  )     int  ->  rax
  ;  V16 tmp12        [V16,T17] (  2,  16  )     int  ->  rax
  ;  V17 tmp13        [V17,T04] ( 16,  17  )     int  ->  rax
  ;# V18 OutArgs      [V18    ] (  1,   1  )  lclBlk ( 0) [rsp+0x00]
  ;
  ; Lcl frame size = 0

  G_M26133_IG01:

  G_M26133_IG02:
         483BCA               cmp      rcx, rdx
         750A                 jne      SHORT G_M26133_IG03
         B801000000           mov      eax, 1
         E9D9010000           jmp      G_M26133_IG23

  G_M26133_IG03:
         4533C9               xor      r9d, r9d
         4183F808             cmp      r8d, 8
         0F8C0E010000         jl       G_M26133_IG13

  G_M26133_IG04:
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7409                 je       SHORT G_M26133_IG05
         33C0                 xor      eax, eax
         E9B1010000           jmp      G_M26133_IG23

  G_M26133_IG05:
         41FFC1               inc      r9d
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7409                 je       SHORT G_M26133_IG06
         33C0                 xor      eax, eax
         E991010000           jmp      G_M26133_IG23

  G_M26133_IG06:
         41FFC1               inc      r9d
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7409                 je       SHORT G_M26133_IG07
         33C0                 xor      eax, eax
         E971010000           jmp      G_M26133_IG23

  G_M26133_IG07:
         41FFC1               inc      r9d
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7409                 je       SHORT G_M26133_IG08
         33C0                 xor      eax, eax
         E951010000           jmp      G_M26133_IG23

  G_M26133_IG08:
         41FFC1               inc      r9d
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7409                 je       SHORT G_M26133_IG09
         33C0                 xor      eax, eax
         E931010000           jmp      G_M26133_IG23

  G_M26133_IG09:
         41FFC1               inc      r9d
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7409                 je       SHORT G_M26133_IG10
         33C0                 xor      eax, eax
         E911010000           jmp      G_M26133_IG23

  G_M26133_IG10:
         41FFC1               inc      r9d
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7409                 je       SHORT G_M26133_IG11
         33C0                 xor      eax, eax
         E9F1000000           jmp      G_M26133_IG23

  G_M26133_IG11:
         41FFC1               inc      r9d
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7409                 je       SHORT G_M26133_IG12
         33C0                 xor      eax, eax
         E9D1000000           jmp      G_M26133_IG23

  G_M26133_IG12:
         41FFC1               inc      r9d
         4183C0F8             add      r8d, -8
         4183F808             cmp      r8d, 8
         0F8D02FFFFFF         jge      G_M26133_IG04

  G_M26133_IG13:
         4183F804             cmp      r8d, 4
         0F8C85000000         jl       G_M26133_IG19

  G_M26133_IG14:
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7409                 je       SHORT G_M26133_IG15
         33C0                 xor      eax, eax
         E999000000           jmp      G_M26133_IG23

  G_M26133_IG15:
         41FFC1               inc      r9d
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7406                 je       SHORT G_M26133_IG16
         33C0                 xor      eax, eax
         EB7C                 jmp      SHORT G_M26133_IG23

  G_M26133_IG16:
         41FFC1               inc      r9d
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7406                 je       SHORT G_M26133_IG17
         33C0                 xor      eax, eax
         EB5F                 jmp      SHORT G_M26133_IG23

  G_M26133_IG17:
         41FFC1               inc      r9d
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7406                 je       SHORT G_M26133_IG18
         33C0                 xor      eax, eax
         EB42                 jmp      SHORT G_M26133_IG23

  G_M26133_IG18:
         41FFC1               inc      r9d
         4183C0FC             add      r8d, -4
         4183F804             cmp      r8d, 4
         7D87                 jge      SHORT G_M26133_IG14

  G_M26133_IG19:
         4585C0               test     r8d, r8d
         7E25                 jle      SHORT G_M26133_IG22

  G_M26133_IG20:
         4963C1               movsxd   rax, r9d
         0FB60401             movzx    rax, byte  ptr [rcx+rax]
         4D63D1               movsxd   r10, r9d
         460FB61412           movzx    r10, byte  ptr [rdx+r10]
         443BD0               cmp      r10d, eax
         7406                 je       SHORT G_M26133_IG21
         33C0                 xor      eax, eax
         EB12                 jmp      SHORT G_M26133_IG23

  G_M26133_IG21:
         41FFC1               inc      r9d
         41FFC8               dec      r8d
         4585C0               test     r8d, r8d
         7FDD                 jg       SHORT G_M26133_IG20

  G_M26133_IG22:
         B801000000           mov      eax, 1

  G_M26133_IG23:
         0FB6C0               movzx    rax, al

  G_M26133_IG24:
         C3                   ret

  ; Total bytes of code 462, prolog size 0 for method SpanHelpers:SequenceEqual(byref,byref,int):bool
```

</details>

Post

<details>

```asm
Successfully inlined Unsafe:Read(long):struct (7 IL bytes) (depth 1) [aggressive inline attribute]
  Successfully inlined Unsafe:Read(long):struct (7 IL bytes) (depth 1) [aggressive inline attribute]
  **************** Inline Tree
  Inlines into 06000018 SpanHelpers:SequenceEqual(byref,byref,int):bool
    [1 IL=0043 TR=000071 06000001] [aggressive inline attribute] Unsafe:Read(long):struct
    [2 IL=0055 TR=000083 06000001] [aggressive inline attribute] Unsafe:Read(long):struct
  Budget: initialTime=714, finalTime=714, initialBudget=7140, currentBudget=7140
  Budget: initialSize=5101, finalSize=5101
  ; Assembly listing for method SpanHelpers:SequenceEqual(byref,byref,int):bool
  ; Emitting BLENDED_CODE for X64 CPU with SSE2
  ; optimized code
  ; rsp based frame
  ; fully interruptible
  ; Final local variable assignments
  ;
  ;  V00 arg0         [V00,T10] (  4,   3  )   byref  ->  rcx
  ;  V01 arg1         [V01,T11] (  4,   3  )   byref  ->  rdx
  ;  V02 arg2         [V02,T09] ( 11,   7  )     int  ->   r8
  ;  V03 loc0         [V03,T12] (  7,   4.5)    bool  ->  rax
  ;  V04 loc1         [V04    ] (  2,   1  )   byref  ->  [rsp+0x08]   must-init pinned
  ;  V05 loc2         [V05    ] (  2,   1  )   byref  ->  [rsp+0x00]   must-init pinned
  ;  V06 loc3         [V06,T03] ( 10,  22.5)    long  ->  rcx
  ;  V07 loc4         [V07,T04] ( 10,  22.5)    long  ->  rdx
  ;  V08 loc5         [V08,T00] ( 31,  64.5)     int  ->   r9
  ;  V09 loc6         [V09,T05] (  3,  12  )  simd16  ->  mm0         ld-addr-op
  ;  V10 loc7         [V10,T06] (  3,  12  )  simd16  ->  mm1
  ;* V11 tmp0         [V11    ] (  0,   0  )    long  ->  zero-ref
  ;* V12 tmp1         [V12    ] (  0,   0  )    long  ->  zero-ref
  ;  V13 tmp2         [V13,T13] (  2,   2  )    long  ->  rcx
  ;  V14 tmp3         [V14,T14] (  2,   2  )    long  ->  rdx
  ;# V15 OutArgs      [V15    ] (  1,   1  )  lclBlk ( 0) [rsp+0x00]
  ;  V16 cse0         [V16,T07] (  8,  11  )     int  ->  r10
  ;  V17 cse1         [V17,T08] (  6,  10  )     int  ->  r10
  ;  V18 cse2         [V18,T02] (  6,  24  )    long  ->  r11
  ;  V19 cse3         [V19,T01] (  8,  32  )    long  ->  r11
  ;
  ; Lcl frame size = 16

  G_M26108_IG01:
         56                   push     rsi
         4883EC10             sub      rsp, 16
         33C0                 xor      rax, rax
         4889442408           mov      qword ptr [rsp+08H], rax
         48890424             mov      qword ptr [rsp], rax

  G_M26108_IG02:
         B801000000           mov      eax, 1
         4585C0               test     r8d, r8d
         0F84E2000000         je       G_M26108_IG14
         48894C2408           mov      bword ptr [rsp+08H], rcx
         48891424             mov      bword ptr [rsp], rdx
         483BCA               cmp      rcx, rdx
         0F84C5000000         je       G_M26108_IG13
         4533C9               xor      r9d, r9d
         458D50F0             lea      r10d, [r8-16]
         4585D2               test     r10d, r10d
         7C33                 jl       SHORT G_M26108_IG05

  G_M26108_IG03:
         4D63D9               movsxd   r11, r9d
         420F100419           movups   xmm0, xmmword ptr [rcx+r11]
         420F100C1A           movups   xmm1, xmmword ptr [rdx+r11]
         4183C110             add      r9d, 16
         0F28D0               movaps   xmm2, xmm0
         660F76D1             pcmpeqd  xmm2, xmm1
         66440FD7DA           pmovmskb  r11d, xmm2
         4181FBFFFF0000       cmp      r11d, 0xFFFF
         7408                 je       SHORT G_M26108_IG04
         33C0                 xor      eax, eax
         E98C000000           jmp      G_M26108_IG13

  G_M26108_IG04:
         453BD1               cmp      r10d, r9d
         7DCE                 jge      SHORT G_M26108_IG03

  G_M26108_IG05:
         458D50F8             lea      r10d, [r8-8]
         453BD1               cmp      r10d, r9d
         7C1B                 jl       SHORT G_M26108_IG08

  G_M26108_IG06:
         4D63D9               movsxd   r11, r9d
         4A8B3419             mov      rsi, qword ptr [rcx+r11]
         4A3B341A             cmp      rsi, qword ptr [rdx+r11]
         7405                 je       SHORT G_M26108_IG07
         33C0                 xor      eax, eax
         EB6C                 jmp      SHORT G_M26108_IG13

  G_M26108_IG07:
         4183C108             add      r9d, 8
         453BD1               cmp      r10d, r9d
         7DE6                 jge      SHORT G_M26108_IG06

  G_M26108_IG08:
         458D50FC             lea      r10d, [r8-4]
         453BD1               cmp      r10d, r9d
         7C1A                 jl       SHORT G_M26108_IG10
         4D63D1               movsxd   r10, r9d
         468B1411             mov      r10d, dword ptr [rcx+r10]
         4D63D9               movsxd   r11, r9d
         463B141A             cmp      r10d, dword ptr [rdx+r11]
         7406                 je       SHORT G_M26108_IG09
         33C0                 xor      eax, eax
         EB45                 jmp      SHORT G_M26108_IG13

  G_M26108_IG09:
         4183C104             add      r9d, 4

  G_M26108_IG10:
         458D50FE             lea      r10d, [r8-2]
         453BD1               cmp      r10d, r9d
         7C1C                 jl       SHORT G_M26108_IG12
         4D63D1               movsxd   r10, r9d
         4E0FBF1411           movsx    r10, word  ptr [rcx+r10]
         4D63D9               movsxd   r11, r9d
         66463B141A           cmp      r10w, word  ptr [rdx+r11]
         7406                 je       SHORT G_M26108_IG11
         33C0                 xor      eax, eax
         EB20                 jmp      SHORT G_M26108_IG13

  G_M26108_IG11:
         4183C102             add      r9d, 2

  G_M26108_IG12:
         453BC1               cmp      r8d, r9d
         7E15                 jle      SHORT G_M26108_IG13
         4D63C1               movsxd   r8, r9d
         420FB60C01           movzx    rcx, byte  ptr [rcx+r8]
         4D63C1               movsxd   r8, r9d
         423A0C02             cmp      cl, byte  ptr [rdx+r8]
         7404                 je       SHORT G_M26108_IG13
         33C0                 xor      eax, eax

  G_M26108_IG13:
         33D2                 xor      rdx, rdx
         48891424             mov      bword ptr [rsp], rdx
         4889542408           mov      bword ptr [rsp+08H], rdx

  G_M26108_IG14:
         4883C410             add      rsp, 16
         5E                   pop      rsi
         C3                   ret

  ; Total bytes of code 254, prolog size 16 for method SpanHelpers:SequenceEqual(byref,byref,int):bool
```

</details>
@benaadams
Copy link
Copy Markdown
Member Author

Can't seem to get the project references right for System.Numerics.Vectors as System.Memory wants a netstandard ref but it only has netfx,uap,netcoreapp

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Feb 20, 2017

I think you will want to avoid any netstandard references in netcoreapp build.

Related to #16244 (comment)

private static bool IsReferenceOrContainsReferencesCore(Type type)
{
if (type.GetTypeInfo().IsPrimitive) // This is hopefully the common case. All types that return true for this are value types w/out embedded references.
if (type.IsPrimitive) // This is hopefully the common case. All types that return true for this are value types w/out embedded references.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not build in the netstandard1.0 build configuration

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

benaadams added a commit to benaadams/corefxlab that referenced this pull request Feb 21, 2017
KrzysztofCwalina pushed a commit to dotnet/corefxlab that referenced this pull request Feb 21, 2017
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int IndexOf(this Span<byte> span, byte value)
{
if (!BitConverter.IsLittleEndian)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we care about endianness here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The locate byte doesn't work on big-endian as it highlights the wrong byte (due to order reversal) aspnet/KestrelHttpServer#1138 (comment); don't know if its worth the effort to come up with something that does work on big-endian though /cc @halter73

Copy link
Copy Markdown
Member

@KrzysztofCwalina KrzysztofCwalina Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense. Do you know how much this branch adds (if any) to "early" searches? i.e. search that finds the byte at index 0.

Copy link
Copy Markdown
Member Author

@benaadams benaadams Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may suffer from a static base initalization check if its first use; and not do branch elimination. Subsequent uses (in different function) will branch eliminate.

@jkotas would there me any mileage in adding a use of BitConverter.IsLittleEndian to something like CommonlyUsedGenericInstantiations as they are runtime fixed and mainly used for branch elimination; so the check is pre-paid?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already issue for it https://github.com/dotnet/coreclr/issues/9701 serendipity 😺

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added comment to dotnet/coreclr#9701 on what would be needed to fix it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 For keeping a big-endian path working. It would be nice if we could test on big-endian architectures.

In Kestrel, we declared big-endian bankruptcy a while ago, so Kestrel now throws on startup if it detects that it's running on a big-endian machine. I don't think that's really an option for corefx APIs.

var a = pFirst;
var b = pSecond;

if (a == b)
Copy link
Copy Markdown
Member

@KrzysztofCwalina KrzysztofCwalina Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we call values related to the first parameter: first, a, and v0. It would be better if the names were all related, e.b. first, pFirst, cFirst ("cursor to first"), and vFirst ("value from first").

if (Unsafe.Add(ref first, index) != Unsafe.Add(ref second, index))
return false;
index++;
fixed (byte* pHaystack = &searchSpace)
Copy link
Copy Markdown
Member

@KrzysztofCwalina KrzysztofCwalina Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call it pSearchSpace?

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static Vector<byte> GetVector(byte vectorByte)
{
#if !NETCOREAPP1_2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will that work with System.Memory package? i.e. do we now need three different dlls in the package: one for below 1.2, one for 1.2, and one for 2.0?

cc: @ahsonkhan

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming NETCOREAPP1_2 would be defined for NETCOREAPP2_0? Or could set it to NETCOREAPP2_0. The difference between the two is a small improvement; so it doesn't matter hugley if it uses the other path; however new Vector<byte>(...) where the jit doesn't have the fix is quite a major regression.

Copy link
Copy Markdown
Member

@jkotas jkotas Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming NETCOREAPP1_2 would be defined for NETCOREAPP2_0

It should be netcoreapp (without version). Define naming convention in https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/project-guidelines.md

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its only valid with the coreclr jit > 1.2 though?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, netcoreapp means latest .NET Core


if (a == b)
{
goto exitFixed;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we just return isMatch here instead of having a goto?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bloats the asm by inserting all the undo fixed asm and stack pops in place, for a rare case (both terms are the identical memory location); goto keeps it nice and tight. Using goto out of fixed equally does a lot of code bloat; which is why two labels. I see this as being a very hot method so trying to minimise instructions.

Will post before and after asm.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @ EdsgerDijkstra

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jit converts every thing to gotos anyway; just helping it along 😉

@pakrym
Copy link
Copy Markdown

pakrym commented Feb 22, 2017

@benaadams @davidfowl @KrzysztofCwalina @halter73 Do we want IndexOfAny(byte b1,..) as part of this PR? We would need to duplicate this code in Kestrel/Pipelines if we won't have it in core.

@benaadams
Copy link
Copy Markdown
Member Author

@pakrym I'm doing some experiments in corefxlab; might be better to churn there for a bit; then bring forward changes to corefx?

@karelz
Copy link
Copy Markdown
Member

karelz commented Mar 30, 2017

@benaadams what are your plans with this PR here? Did you move your work to corefxlab?
I don't see progress on this PR for last 1.5 month, so I wonder what to do with it ...

@benaadams
Copy link
Copy Markdown
Member Author

Think the best bits have returned to corefx by a circular route :)

@benaadams benaadams closed this Mar 30, 2017
@karelz
Copy link
Copy Markdown
Member

karelz commented Mar 30, 2017

Thanks @benaadams!

@karelz karelz modified the milestone: 2.0.0 Mar 31, 2017
@benaadams benaadams deleted the span-slim branch January 11, 2019 21:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.