RyuJIT SIMD: Poor code gen new Vector<byte>(b) #6757

The `Vector<byte>(b)` ctor is not inlined and does a lot of extra work, e.g.

``` csharp
var v = new Vector<byte>((byte)'\n');
```

The call isn't inlined; the JIT reports:

```
[0 IL=0003 TR=000004 06000022] [FAILED: too many il bytes] Vector`1:.ctor(ubyte):this
```

As a result, the best available way to use broadcast `byte` vectors is to create them as statics and pass them by ref (as in https://github.com/dotnet/coreclr/issues/7386), but statics aren't guaranteed to be aligned.
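For illustration, the static-plus-byref pattern looks roughly like the sketch below. The type and member names (`NewlineScanner`, `s_newline`, `CountMatches`) are hypothetical and not taken from any real code base; the point is only that the broadcast happens once and the hot path receives the vector by reference instead of calling the constructor.

``` csharp
using System;
using System.Numerics;

public static class NewlineScanner
{
    // Hypothetical example: broadcast once at type initialization.
    // Not readonly, so it can be passed with `ref`.
    private static Vector<byte> s_newline = new Vector<byte>((byte)'\n');

    public static int CountNewlines(byte[] buffer)
    {
        return CountMatches(buffer, ref s_newline);
    }

    private static int CountMatches(byte[] buffer, ref Vector<byte> needle)
    {
        int count = 0;
        int i = 0;
        int width = Vector<byte>.Count;

        // Vectorized part: compare one full vector of input at a time.
        for (; i <= buffer.Length - width; i += width)
        {
            var block = new Vector<byte>(buffer, i);
            Vector<byte> eq = Vector.Equals(block, needle); // 0xFF in matching lanes
            if (!eq.Equals(Vector<byte>.Zero))
            {
                for (int j = 0; j < width; j++)
                {
                    if (eq[j] != 0) count++;
                }
            }
        }

        // Scalar tail for the bytes that don't fill a whole vector.
        byte target = needle[0];
        for (; i < buffer.Length; i++)
        {
            if (buffer[i] == target) count++;
        }
        return count;
    }
}
```

This avoids the per-call constructor cost, but the static's storage is wherever the runtime places it, so loads from it may be unaligned.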

The constructor itself generates the following asm:

``` asm
**************** Inline Tree
Inlines into 06000022 Vector`1:.ctor(ubyte):this
Budget: initialTime=7566, finalTime=7566, initialBudget=75660, currentBudget=75660
Budget: initialSize=57176, finalSize=57176
; Assembly listing for method Vector`1:.ctor(ubyte):this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 this         [V00,T03] (  4,   4  )   byref  ->  rcx         this
;  V01 arg1         [V01,T02] (  3,   6  )   ubyte  ->  rdx        
;  V02 loc0         [V02    ] (  3,   6  )   byref  ->  [rsp+0x20]   must-init pinned
;  V03 loc1         [V03,T00] (  5,  17  )     int  ->  rax        
;* V04 loc2         [V04    ] (  0,   0  )   byref  ->  zero-ref    pinned
;* V05 loc3         [V05    ] (  0,   0  )     int  ->  zero-ref   
;* V06 loc4         [V06    ] (  0,   0  )   byref  ->  zero-ref    pinned
;* V07 loc5         [V07    ] (  0,   0  )     int  ->  zero-ref   
;* V08 loc6         [V08    ] (  0,   0  )   byref  ->  zero-ref    pinned
;* V09 loc7         [V09    ] (  0,   0  )     int  ->  zero-ref   
;* V10 loc8         [V10    ] (  0,   0  )   byref  ->  zero-ref    pinned
;* V11 loc9         [V11    ] (  0,   0  )     int  ->  zero-ref   
;* V12 loc10        [V12    ] (  0,   0  )   byref  ->  zero-ref    pinned
;* V13 loc11        [V13    ] (  0,   0  )     int  ->  zero-ref   
;* V14 loc12        [V14    ] (  0,   0  )   byref  ->  zero-ref    pinned
;* V15 loc13        [V15    ] (  0,   0  )     int  ->  zero-ref   
;* V16 loc14        [V16    ] (  0,   0  )   byref  ->  zero-ref    pinned
;* V17 loc15        [V17    ] (  0,   0  )     int  ->  zero-ref   
;* V18 loc16        [V18    ] (  0,   0  )   byref  ->  zero-ref    pinned
;* V19 loc17        [V19    ] (  0,   0  )     int  ->  zero-ref   
;* V20 loc18        [V20    ] (  0,   0  )   byref  ->  zero-ref    pinned
;* V21 loc19        [V21    ] (  0,   0  )     int  ->  zero-ref   
;* V22 tmp0         [V22    ] (  0,   0  )     ref  ->  zero-ref   
;* V23 tmp1         [V23    ] (  0,   0  )     ref  ->  zero-ref   
;* V24 tmp2         [V24    ] (  0,   0  )     ref  ->  zero-ref   
;* V25 tmp3         [V25    ] (  0,   0  )     ref  ->  zero-ref   
;* V26 tmp4         [V26    ] (  0,   0  )     ref  ->  zero-ref   
;* V27 tmp5         [V27    ] (  0,   0  )     ref  ->  zero-ref   
;* V28 tmp6         [V28    ] (  0,   0  )     ref  ->  zero-ref   
;* V29 tmp7         [V29    ] (  0,   0  )     ref  ->  zero-ref   
;* V30 tmp8         [V30    ] (  0,   0  )     ref  ->  zero-ref   
;* V31 tmp9         [V31    ] (  0,   0  )     ref  ->  zero-ref   
;  V32 tmp10        [V32,T01] (  2,  16  )    long  ->  rcx        
;* V33 tmp11        [V33    ] (  0,   0  )    long  ->  zero-ref   
;* V34 tmp12        [V34    ] (  0,   0  )    long  ->  zero-ref   
;* V35 tmp13        [V35    ] (  0,   0  )    long  ->  zero-ref   
;* V36 tmp14        [V36    ] (  0,   0  )    long  ->  zero-ref   
;* V37 tmp15        [V37    ] (  0,   0  )    long  ->  zero-ref   
;* V38 tmp16        [V38    ] (  0,   0  )    long  ->  zero-ref   
;* V39 tmp17        [V39    ] (  0,   0  )    long  ->  zero-ref   
;* V40 tmp18        [V40    ] (  0,   0  )    long  ->  zero-ref   
;* V41 tmp19        [V41    ] (  0,   0  )    long  ->  zero-ref   
;  V42 OutArgs      [V42    ] (  1,   1  )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 40

G_M27877_IG01:
       4883EC28             sub      rsp, 40
       33C0                 xor      rax, rax
       4889442420           mov      qword ptr [rsp+20H], rax

G_M27877_IG02:
       33C0                 xor      rax, rax
       660F57C0             xorpd    xmm0, xmm0
       F30F7F01             movdqu   qword ptr [rcx], xmm0
       48894C2420           mov      bword ptr [rsp+20H], rcx

G_M27877_IG03:
       488B4C2420           mov      rcx, bword ptr [rsp+20H]
       4C63C0               movsxd   r8, eax
       42881401             mov      byte  ptr [rcx+r8], dl
       FFC0                 inc      eax
       83F810               cmp      eax, 16
       7CED                 jl       SHORT G_M27877_IG03

G_M27877_IG04:
       33C0                 xor      rax, rax
       4889442420           mov      bword ptr [rsp+20H], rax

G_M27877_IG05:
       4883C428             add      rsp, 40
       C3                   ret      

; Total bytes of code 57, prolog size 11 for method Vector`1:.ctor(ubyte):this
```
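In the listing above the destination is zeroed and then filled by a 16-iteration loop (`G_M27877_IG03`) that stores `dl` one byte at a time, reloading the pinned pointer from the stack on every pass. One possible workaround, sketched below, is to replicate the byte into a wider integer and broadcast that instead, then reinterpret the result as bytes. This assumes the JIT handles the `Vector<uint>(uint)` ctor better than the `byte` one on the runtime in use, which should be checked against the actual disassembly; the helper name `ByteBroadcast.Create` is hypothetical.

``` csharp
using System;
using System.Numerics;

public static class ByteBroadcast
{
    // Hypothetical helper: broadcast a byte via a uint vector.
    public static Vector<byte> Create(byte value)
    {
        // value * 0x01010101 copies the byte into all four bytes of a uint
        // (at most 0xFFFFFFFF, so the multiply cannot overflow).
        // Because every byte is identical, the result does not depend on endianness.
        return Vector.AsVectorByte(new Vector<uint>(value * 0x01010101u));
    }
}
```

The result is equal to `new Vector<byte>(value)`; whether it is actually cheaper depends on what the JIT emits for the `uint` ctor.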

