The Vector<byte>(b) constructor is not inlined and does a lot of extra work. For example:
var v = new Vector<byte>((byte)'\n');
This call is not inlined. The JIT reports:
[0 IL=0003 TR=000004 06000022] [FAILED: too many il bytes] Vector`1:.ctor(ubyte):this
As a result, the best way to use broadcast byte vectors is to create them as statics and pass them by ref (as in https://github.com/dotnet/coreclr/issues/7386), but statics aren't guaranteed to be aligned.
The constructor generates the following asm:
**************** Inline Tree
Inlines into 06000022 Vector`1:.ctor(ubyte):this
Budget: initialTime=7566, finalTime=7566, initialBudget=75660, currentBudget=75660
Budget: initialSize=57176, finalSize=57176
; Assembly listing for method Vector`1:.ctor(ubyte):this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
; V00 this [V00,T03] ( 4, 4 ) byref -> rcx this
; V01 arg1 [V01,T02] ( 3, 6 ) ubyte -> rdx
; V02 loc0 [V02 ] ( 3, 6 ) byref -> [rsp+0x20] must-init pinned
; V03 loc1 [V03,T00] ( 5, 17 ) int -> rax
;* V04 loc2 [V04 ] ( 0, 0 ) byref -> zero-ref pinned
;* V05 loc3 [V05 ] ( 0, 0 ) int -> zero-ref
;* V06 loc4 [V06 ] ( 0, 0 ) byref -> zero-ref pinned
;* V07 loc5 [V07 ] ( 0, 0 ) int -> zero-ref
;* V08 loc6 [V08 ] ( 0, 0 ) byref -> zero-ref pinned
;* V09 loc7 [V09 ] ( 0, 0 ) int -> zero-ref
;* V10 loc8 [V10 ] ( 0, 0 ) byref -> zero-ref pinned
;* V11 loc9 [V11 ] ( 0, 0 ) int -> zero-ref
;* V12 loc10 [V12 ] ( 0, 0 ) byref -> zero-ref pinned
;* V13 loc11 [V13 ] ( 0, 0 ) int -> zero-ref
;* V14 loc12 [V14 ] ( 0, 0 ) byref -> zero-ref pinned
;* V15 loc13 [V15 ] ( 0, 0 ) int -> zero-ref
;* V16 loc14 [V16 ] ( 0, 0 ) byref -> zero-ref pinned
;* V17 loc15 [V17 ] ( 0, 0 ) int -> zero-ref
;* V18 loc16 [V18 ] ( 0, 0 ) byref -> zero-ref pinned
;* V19 loc17 [V19 ] ( 0, 0 ) int -> zero-ref
;* V20 loc18 [V20 ] ( 0, 0 ) byref -> zero-ref pinned
;* V21 loc19 [V21 ] ( 0, 0 ) int -> zero-ref
;* V22 tmp0 [V22 ] ( 0, 0 ) ref -> zero-ref
;* V23 tmp1 [V23 ] ( 0, 0 ) ref -> zero-ref
;* V24 tmp2 [V24 ] ( 0, 0 ) ref -> zero-ref
;* V25 tmp3 [V25 ] ( 0, 0 ) ref -> zero-ref
;* V26 tmp4 [V26 ] ( 0, 0 ) ref -> zero-ref
;* V27 tmp5 [V27 ] ( 0, 0 ) ref -> zero-ref
;* V28 tmp6 [V28 ] ( 0, 0 ) ref -> zero-ref
;* V29 tmp7 [V29 ] ( 0, 0 ) ref -> zero-ref
;* V30 tmp8 [V30 ] ( 0, 0 ) ref -> zero-ref
;* V31 tmp9 [V31 ] ( 0, 0 ) ref -> zero-ref
; V32 tmp10 [V32,T01] ( 2, 16 ) long -> rcx
;* V33 tmp11 [V33 ] ( 0, 0 ) long -> zero-ref
;* V34 tmp12 [V34 ] ( 0, 0 ) long -> zero-ref
;* V35 tmp13 [V35 ] ( 0, 0 ) long -> zero-ref
;* V36 tmp14 [V36 ] ( 0, 0 ) long -> zero-ref
;* V37 tmp15 [V37 ] ( 0, 0 ) long -> zero-ref
;* V38 tmp16 [V38 ] ( 0, 0 ) long -> zero-ref
;* V39 tmp17 [V39 ] ( 0, 0 ) long -> zero-ref
;* V40 tmp18 [V40 ] ( 0, 0 ) long -> zero-ref
;* V41 tmp19 [V41 ] ( 0, 0 ) long -> zero-ref
; V42 OutArgs [V42 ] ( 1, 1 ) lclBlk (32) [rsp+0x00]
;
; Lcl frame size = 40
G_M27877_IG01:
4883EC28 sub rsp, 40
33C0 xor rax, rax
4889442420 mov qword ptr [rsp+20H], rax
G_M27877_IG02:
33C0 xor rax, rax
660F57C0 xorpd xmm0, xmm0
F30F7F01 movdqu qword ptr [rcx], xmm0
48894C2420 mov bword ptr [rsp+20H], rcx
G_M27877_IG03:
488B4C2420 mov rcx, bword ptr [rsp+20H]
4C63C0 movsxd r8, eax
42881401 mov byte ptr [rcx+r8], dl
FFC0 inc eax
83F810 cmp eax, 16
7CED jl SHORT G_M27877_IG03
G_M27877_IG04:
33C0 xor rax, rax
4889442420 mov bword ptr [rsp+20H], rax
G_M27877_IG05:
4883C428 add rsp, 40
C3 ret
; Total bytes of code 57, prolog size 11 for method Vector`1:.ctor(ubyte):this
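For reference, the static-plus-byref workaround mentioned above might look roughly like the sketch below. The class and method names (`BroadcastDemo`, `ContainsNewline`, `ContainsAny`) are made up for illustration and do not come from the linked issue. The static is deliberately not `readonly`, since a `static readonly` field cannot be passed with `ref` outside a static constructor.

```csharp
using System;
using System.Numerics;

public static class BroadcastDemo
{
    // Built once by the type initializer, so the non-inlined
    // Vector<byte>(byte) ctor is not paid on every call.
    // Not readonly: a static readonly field cannot be passed by ref.
    private static Vector<byte> s_newline = new Vector<byte>((byte)'\n');

    // Hypothetical helper: true if data contains a '\n' byte.
    public static bool ContainsNewline(byte[] data)
    {
        return ContainsAny(data, ref s_newline);
    }

    // The broadcast vector is passed by ref to avoid copying it.
    private static bool ContainsAny(byte[] data, ref Vector<byte> needle)
    {
        int width = Vector<byte>.Count;
        int i = 0;

        // Vectorized part: compare one full vector of input at a time.
        for (; i <= data.Length - width; i += width)
        {
            var block = new Vector<byte>(data, i);
            if (Vector.EqualsAny(block, needle))
            {
                return true;
            }
        }

        // Scalar tail for the remaining bytes.
        byte target = needle[0];
        for (; i < data.Length; i++)
        {
            if (data[i] == target)
            {
                return true;
            }
        }

        return false;
    }
}
```

This avoids constructing the vector per call, but as noted above, nothing here guarantees that the static is aligned.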