Add lane construction and composition APIs #127690

Open
hez2010 wants to merge 23 commits into dotnet:main from hez2010:vector-lanes

@hez2010 commented May 3, 2026

This PR adds the lane construction and composition APIs approved in #122557, together with the corresponding JIT intrinsics.

The JIT now recognizes the new vector APIs and expands them using existing SIMD nodes. The managed implementation decomposes an operation into smaller vector widths when hardware support for the wider width is unavailable.

The xarch lowering uses fixed shuffle forms where profitable:

  • vpbroadcast* for sequence and alternating construction
  • vshufps for 128-bit concat/unzip patterns
  • full-width unpack plus vperm2i128 for 256-bit zip/unzip
  • fixed immediate shuffles for reverse

The ARM64 lowering avoids table-lookup forms for small fixed concat/reverse operations and uses direct element moves where applicable, such as ins and rev64.

CreateCauchySequence needs the JIT to constant-fold sqrt before it can produce optimal code. I would like to leave that for now, as it is out of scope for this PR.

Codegen:

Vector128
; Assembly listing for method Tests:Geo128(int):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vpbroadcastd xmm0, edx
       vpmulld  xmm0, xmm0, xmmword ptr [reloc @RWD00]
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0016
       ret      
 
RWD00  	dq	0000000300000001h, 0000001B00000009h

; Total bytes of code 23
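The `RWD00` constant above packs four 32-bit lanes into two little-endian quadwords. The following is a minimal Python sketch of how those lanes decode and what the `vpbroadcastd` + `vpmulld` pair computes. The helper names `decode_dq` and `geo128` are illustrative only, and the ratio of 3 is inferred from the constant, since the test source is not shown here.

```python
MASK32 = 0xFFFFFFFF

def decode_dq(*qwords):
    """Split little-endian 64-bit data words into 32-bit lanes, low lane first."""
    lanes = []
    for q in qwords:
        lanes.append(q & MASK32)
        lanes.append((q >> 32) & MASK32)
    return lanes

# RWD00 from the Geo128 listing.
POWERS = decode_dq(0x0000000300000001, 0x0000001B00000009)

def geo128(x):
    """Model vpbroadcastd (splat x) followed by vpmulld (low 32 bits of each product)."""
    return [(x * p) & MASK32 for p in POWERS]

print(POWERS)
print(geo128(2))
```

The same decoding applies to the wider `RWD00` tables in the Vector256 and Vector512 listings, which simply continue the sequence.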

; Assembly listing for method Tests:Alt128(int,int):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovd    xmm0, edx
       vpinsrd  xmm0, xmm0, r8d, 1
       vpbroadcastq xmm0, xmm0
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0016
       ret      
 
; Total bytes of code 23

; Assembly listing for method Tests:Sign128():System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovddup xmm0, qword ptr [reloc @RWD00]
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x000F
       ret      
 
RWD00  	dq	FFFFFFFF00000001h

; Total bytes of code 16
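Both the Alt128 and Sign128 listings build one 64-bit pair and then replicate it. This is a small Python model of that pattern, assuming the usual semantics of `vmovd`, `vpinsrd`, `vpbroadcastq` and `vmovddup`. The function names are hypothetical and only mirror the test method names.

```python
def to_signed32(v):
    """Reinterpret the low 32 bits of v as a signed integer."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def broadcast_low_qword(lanes, width):
    """Repeat the low two 32-bit lanes across `width` lanes (vpbroadcastq / vmovddup)."""
    return [lanes[i % 2] for i in range(width)]

def alt128(a, b):
    lanes = [a, 0, 0, 0]                   # vmovd    xmm0, edx
    lanes[1] = b                           # vpinsrd  xmm0, xmm0, r8d, 1
    return broadcast_low_qword(lanes, 4)   # vpbroadcastq xmm0, xmm0

def sign128():
    q = 0xFFFFFFFF00000001                 # RWD00 from the Sign128 listing
    pair = [to_signed32(q), to_signed32(q >> 32)]
    return broadcast_low_qword(pair, 4)    # vmovddup xmm0, qword ptr [RWD00]

print(alt128(7, 9))
print(sign128())
```

Passing a larger `width` to `broadcast_low_qword` models the 256-bit and 512-bit forms, which differ only in the broadcast instruction chosen.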

; Assembly listing for method Tests:Harmonic128(float,float):System.Runtime.Intrinsics.Vector128`1[float] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vbroadcastss xmm0, xmm2
       vmulps   xmm0, xmm0, xmmword ptr [reloc @RWD00]
       vbroadcastss xmm1, xmm1
       vaddps   xmm0, xmm1, xmm0
       vbroadcastss xmm1, dword ptr [reloc @RWD16]
       vdivps   xmm0, xmm1, xmm0
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x002A
       ret      
 
RWD00  	dq	3F80000000000000h, 4040000040000000h
RWD16  	dd	3F800000h		;         1

; Total bytes of code 43

; Assembly listing for method Tests:Cauchy128(float,float):System.Runtime.Intrinsics.Vector128`1[float] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vbroadcastss xmm0, xmm2
       vmulps   xmm0, xmm0, xmmword ptr [reloc @RWD00]
       vbroadcastss xmm1, xmm1
       vaddps   xmm0, xmm1, xmm0
       vsqrtps  xmm0, xmm0
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0021
       ret      
 
RWD00  	dq	3F80000000000000h, 4040000040000000h

; Total bytes of code 34
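The Harmonic128 and Cauchy128 listings share the same `vmulps`/`vaddps` prefix over an index table and differ only in the final instruction. This is a hedged Python sketch of the arithmetic they appear to perform. The parameter names `start` and `step` are assumptions (the listings only show that the second argument is multiplied by the index and the first is added), and Python floats are double precision, so results can differ from the float32 hardware values in the last bits.

```python
import math
import struct

def decode_floats(*qwords):
    """Split little-endian 64-bit data words into float32 lanes, low lane first."""
    lanes = []
    for q in qwords:
        lanes.extend(struct.unpack("<2f", struct.pack("<Q", q)))
    return lanes

# RWD00 shared by the Harmonic128 and Cauchy128 listings.
INDICES = decode_floats(0x3F80000000000000, 0x4040000040000000)

def harmonic128(start, step):
    # vmulps and vaddps, then vdivps with a splatted 1.0 as the dividend
    return [1.0 / (start + step * i) for i in INDICES]

def cauchy128(start, step):
    # same prefix, then vsqrtps
    return [math.sqrt(start + step * i) for i in INDICES]

print(INDICES)
print(harmonic128(1.0, 1.0))
print(cauchy128(0.0, 1.0))
```

When both arguments are constants, everything up to the `vsqrtps` input is a compile-time value, which is why folding sqrt would let the Cauchy case collapse to a single constant load.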

; Assembly listing for method Tests:ConcatLowerLower128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vshufps  xmm0, xmm0, xmmword ptr [r8], 68
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0011
       ret      
 
; Total bytes of code 18

; Assembly listing for method Tests:ConcatLowerUpper128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vshufps  xmm0, xmm0, xmmword ptr [r8], -28
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0011
       ret      
 
; Total bytes of code 18

; Assembly listing for method Tests:ConcatUpperLower128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vshufps  xmm0, xmm0, xmmword ptr [r8], 78
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0011
       ret      
 
; Total bytes of code 18

; Assembly listing for method Tests:ConcatUpperUpper128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vshufps  xmm0, xmm0, xmmword ptr [r8], -18
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0011
       ret      
 
; Total bytes of code 18
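The four concat listings above differ only in the `vshufps` immediate, which the disassembler prints as a signed byte. This short Python sketch decodes those immediates, assuming the standard `vshufps` lane selection (two lanes from the first source, then two from the second). The `shufps` helper and the symbolic lane names are illustrative.

```python
def shufps(a, b, imm):
    """Model vshufps on four 32-bit lanes: two lanes from a, then two from b."""
    imm &= 0xFF  # the listings print the immediate as a signed byte
    return [
        a[imm & 3],
        a[(imm >> 2) & 3],
        b[(imm >> 4) & 3],
        b[(imm >> 6) & 3],
    ]

A = ["a0", "a1", "a2", "a3"]
B = ["b0", "b1", "b2", "b3"]

CONCAT_IMMEDIATES = {
    "ConcatLowerLower128": 68,
    "ConcatLowerUpper128": -28,
    "ConcatUpperLower128": 78,
    "ConcatUpperUpper128": -18,
}

for name, imm in CONCAT_IMMEDIATES.items():
    print(f"{name}: imm={imm & 0xFF:#04x} -> {shufps(A, B, imm)}")
```

Read as unsigned bytes, the immediates are 0x44, 0xE4, 0x4E and 0xEE, one per combination of lower and upper halves.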

; Assembly listing for method Tests:ZipLower128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vpunpckldq xmm0, xmm0, xmmword ptr [r8]
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0010
       ret      
 
; Total bytes of code 17

; Assembly listing for method Tests:ZipUpper128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vpunpckhdq xmm0, xmm0, xmmword ptr [r8]
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0010
       ret      
 
; Total bytes of code 17

; Assembly listing for method Tests:Zip128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.ValueTuple`2[System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vmovups  xmm1, xmmword ptr [r8]
       vpunpckldq xmm2, xmm0, xmm1
       vpunpckhdq xmm0, xmm0, xmm1
       vmovups  xmmword ptr [rcx], xmm2
       vmovups  xmmword ptr [rcx+0x10], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x001D
       ret      
 
; Total bytes of code 30

; Assembly listing for method Tests:UnzipEven128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vshufps  xmm0, xmm0, xmmword ptr [r8], -120
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0011
       ret      
 
; Total bytes of code 18

; Assembly listing for method Tests:UnzipOdd128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vshufps  xmm0, xmm0, xmmword ptr [r8], -35
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0011
       ret      
 
; Total bytes of code 18

; Assembly listing for method Tests:Unzip128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.ValueTuple`2[System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vmovups  xmm1, xmmword ptr [r8]
       vshufps  xmm2, xmm0, xmm1, -120
       vshufps  xmm0, xmm0, xmm1, -35
       vmovups  xmmword ptr [rcx], xmm2
       vmovups  xmmword ptr [rcx+0x10], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x001F
       ret      
 
; Total bytes of code 32
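One way to sanity-check the 128-bit zip and unzip sequences is to model them and confirm that they are inverses. The sketch below does that in Python, assuming the standard semantics of `vpunpckldq`, `vpunpckhdq` and `vshufps`. The function names `zip128` and `unzip128` mirror the test method names and are not the API names.

```python
def punpckldq(a, b):
    """Interleave the lower two 32-bit lanes of a and b."""
    return [a[0], b[0], a[1], b[1]]

def punpckhdq(a, b):
    """Interleave the upper two 32-bit lanes of a and b."""
    return [a[2], b[2], a[3], b[3]]

def shufps(a, b, imm):
    """Model vshufps: two lanes selected from a, then two from b."""
    imm &= 0xFF
    return [a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]]

def zip128(a, b):
    return punpckldq(a, b), punpckhdq(a, b)

def unzip128(a, b):
    # -120 is 0x88 (even lanes), -35 is 0xDD (odd lanes)
    return shufps(a, b, -120), shufps(a, b, -35)

a, b = [0, 1, 2, 3], [10, 11, 12, 13]
lower, upper = zip128(a, b)
print(lower, upper)
print(unzip128(lower, upper))
```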

; Assembly listing for method Tests:Reverse128(System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vpshufd  xmm0, xmmword ptr [rdx], 27
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x000C
       ret      
 
; Total bytes of code 13
Vector256
; Assembly listing for method Tests:Geo256(int):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vpbroadcastd ymm0, edx
       vpmulld  ymm0, ymm0, ymmword ptr [reloc @RWD00]
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0016
       vzeroupper 
       ret      
 
RWD00  	dq	0000000300000001h, 0000001B00000009h, 000000F300000051h, 0000088B000002D9h

; Total bytes of code 26

; Assembly listing for method Tests:Alt256(int,int):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovd    xmm0, edx
       vpinsrd  xmm0, xmm0, r8d, 1
       vpbroadcastq ymm0, ymm0
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0016
       vzeroupper 
       ret      
 
; Total bytes of code 26

; Assembly listing for method Tests:Sign256():System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vbroadcastsd ymm0, qword ptr [reloc @RWD00]
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0010
       vzeroupper 
       ret      
 
RWD00  	dq	FFFFFFFF00000001h

; Total bytes of code 20

; Assembly listing for method Tests:Harmonic256(float,float):System.Runtime.Intrinsics.Vector256`1[float] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vbroadcastss ymm0, ymm2
       vmulps   ymm0, ymm0, ymmword ptr [reloc @RWD00]
       vbroadcastss ymm1, ymm1
       vaddps   ymm0, ymm1, ymm0
       vbroadcastss ymm1, dword ptr [reloc @RWD32]
       vdivps   ymm0, ymm1, ymm0
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x002A
       vzeroupper 
       ret      
 
RWD00  	dq	3F80000000000000h, 4040000040000000h, 40A0000040800000h, 40E0000040C00000h
RWD32  	dd	3F800000h		;         1

; Total bytes of code 46

; Assembly listing for method Tests:Cauchy256(float,float):System.Runtime.Intrinsics.Vector256`1[float] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vbroadcastss ymm0, ymm2
       vmulps   ymm0, ymm0, ymmword ptr [reloc @RWD00]
       vbroadcastss ymm1, ymm1
       vaddps   ymm0, ymm1, ymm0
       vsqrtps  ymm0, ymm0
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0021
       vzeroupper 
       ret      
 
RWD00  	dq	3F80000000000000h, 4040000040000000h, 40A0000040800000h, 40E0000040C00000h

; Total bytes of code 37

; Assembly listing for method Tests:ConcatLowerLower256(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovdqu  xmm0, xmmword ptr [rdx]
       vmovdqu  xmm1, xmmword ptr [r8]
       vinserti128 ymm0, ymm0, xmm1
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0016
       vzeroupper 
       ret      
 
; Total bytes of code 26

; Assembly listing for method Tests:ConcatLowerUpper256(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovdqu  xmm0, xmmword ptr [rdx]
       vmovups  ymm1, ymmword ptr [r8]
       vextracti128 xmm1, ymm1
       vinserti128 ymm0, ymm0, xmm1
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x001C
       vzeroupper 
       ret      
 
; Total bytes of code 32

; Assembly listing for method Tests:ConcatUpperLower256(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  ymm0, ymmword ptr [rdx]
       vextracti128 xmm0, ymm0
       vmovdqu  xmm1, xmmword ptr [r8]
       vinserti128 ymm0, ymm0, xmm1
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x001C
       vzeroupper 
       ret      
 
; Total bytes of code 32

; Assembly listing for method Tests:ConcatUpperUpper256(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  ymm0, ymmword ptr [rdx]
       vextracti128 xmm0, ymm0
       vmovups  ymm1, ymmword ptr [r8]
       vextracti128 xmm1, ymm1
       vinserti128 ymm0, ymm0, xmm1
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0022
       vzeroupper 
       ret      
 
; Total bytes of code 38

; Assembly listing for method Tests:ZipLower256(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovdqu  xmm0, xmmword ptr [rdx]
       vmovdqu  xmm1, xmmword ptr [r8]
       vpunpckldq xmm2, xmm0, xmm1
       vpunpckhdq xmm0, xmm0, xmm1
       vinserti128 ymm0, ymm2, xmm0
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x001E
       vzeroupper 
       ret      
 
; Total bytes of code 34

; Assembly listing for method Tests:ZipUpper256(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  ymm0, ymmword ptr [rdx]
       vextracti128 xmm0, ymm0
       vmovups  ymm1, ymmword ptr [r8]
       vextracti128 xmm1, ymm1
       vpunpckldq xmm2, xmm0, xmm1
       vpunpckhdq xmm0, xmm0, xmm1
       vinserti128 ymm0, ymm2, xmm0
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x002A
       vzeroupper 
       ret      
 
; Total bytes of code 46

; Assembly listing for method Tests:Zip256(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):System.ValueTuple`2[System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  ymm0, ymmword ptr [rdx]
       vmovups  ymm1, ymmword ptr [r8]
       vpunpckldq ymm2, ymm0, ymm1
       vpunpckhdq ymm0, ymm0, ymm1
       vperm2i128 ymm1, ymm2, ymm0, 32
       vperm2i128 ymm0, ymm2, ymm0, 49
       vmovups  ymmword ptr [rcx], ymm1
       vmovups  ymmword ptr [rcx+0x20], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0029
       vzeroupper 
       ret      
 
; Total bytes of code 45
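The 256-bit unpack instructions interleave within each 128-bit lane, so Zip256 needs the two `vperm2i128` steps to put the lanes back in order. This is a minimal Python model of that sequence, assuming the documented AVX2 semantics. The helper names are illustrative, and the zeroing bits of the `vperm2i128` immediate are ignored because the listing does not use them.

```python
def unpack_dq(a, b, high):
    """Model vpunpckldq/vpunpckhdq on ymm: interleave within each 128-bit lane."""
    out = []
    for base in (0, 4):                  # two 128-bit lanes of four dwords each
        lo = base + (2 if high else 0)
        out += [a[lo], b[lo], a[lo + 1], b[lo + 1]]
    return out

def perm2i128(src1, src2, imm):
    """Model vperm2i128: pick one 128-bit half for each half of the result."""
    halves = [src1[:4], src1[4:], src2[:4], src2[4:]]
    return halves[imm & 3] + halves[(imm >> 4) & 3]

def zip256(a, b):
    lo = unpack_dq(a, b, high=False)     # vpunpckldq ymm2, ymm0, ymm1
    hi = unpack_dq(a, b, high=True)      # vpunpckhdq ymm0, ymm0, ymm1
    return perm2i128(lo, hi, 32), perm2i128(lo, hi, 49)

a, b = list(range(8)), list(range(10, 18))
print(zip256(a, b))
```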

; Assembly listing for method Tests:UnzipEven256(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vbroadcastf128 ymm0, xmmword ptr [reloc @RWD00]
       vpermd   ymm1, ymm0, ymmword ptr [rdx]
       vpermd   ymm0, ymm0, ymmword ptr [r8]
       vinserti128 ymm0, ymm1, xmm0
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0020
       vzeroupper 
       ret      
 
RWD00  	dq	0000000200000000h, 0000000600000004h

; Total bytes of code 36

; Assembly listing for method Tests:UnzipOdd256(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vbroadcastf128 ymm0, xmmword ptr [reloc @RWD00]
       vpermd   ymm1, ymm0, ymmword ptr [rdx]
       vpermd   ymm0, ymm0, ymmword ptr [r8]
       vinserti128 ymm0, ymm1, xmm0
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0020
       vzeroupper 
       ret      
 
RWD00  	dq	0000000300000001h, 0000000700000005h

; Total bytes of code 36

; Assembly listing for method Tests:Unzip256(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):System.ValueTuple`2[System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vpshufd  ymm0, ymmword ptr [rdx], -40
       vpermq   ymm0, ymm0, -40
       vpshufd  ymm1, ymmword ptr [r8], -40
       vpermq   ymm1, ymm1, -40
       vperm2i128 ymm2, ymm0, ymm1, 32
       vperm2i128 ymm0, ymm0, ymm1, 49
       vmovups  ymmword ptr [rcx], ymm2
       vmovups  ymmword ptr [rcx+0x20], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x002F
       vzeroupper 
       ret      
 
; Total bytes of code 51
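Unzip256 first sorts each input into evens followed by odds, then recombines halves. The Python sketch below traces that sequence under the documented semantics of `vpshufd`, `vpermq` and `vperm2i128`. Helper names are illustrative, and the immediate -40 is the signed print of 0xD8.

```python
def pshufd(v, imm):
    """Model vpshufd on ymm: the same 4-lane shuffle in each 128-bit lane."""
    imm &= 0xFF
    out = []
    for base in (0, 4):
        out += [v[base + ((imm >> (2 * i)) & 3)] for i in range(4)]
    return out

def permq(v, imm):
    """Model vpermq: permute the four 64-bit elements (pairs of dwords)."""
    imm &= 0xFF
    out = []
    for i in range(4):
        q = (imm >> (2 * i)) & 3
        out += v[2 * q: 2 * q + 2]
    return out

def perm2i128(src1, src2, imm):
    """Model vperm2i128 without the zeroing bits."""
    halves = [src1[:4], src1[4:], src2[:4], src2[4:]]
    return halves[imm & 3] + halves[(imm >> 4) & 3]

def unzip256(a, b):
    a = permq(pshufd(a, -40), -40)       # [even lanes of a | odd lanes of a]
    b = permq(pshufd(b, -40), -40)       # [even lanes of b | odd lanes of b]
    return perm2i128(a, b, 32), perm2i128(a, b, 49)

print(unzip256(list(range(8)), list(range(8, 16))))
```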

; Assembly listing for method Tests:Reverse256(System.Runtime.Intrinsics.Vector256`1[int]):System.Runtime.Intrinsics.Vector256`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vpshufd  ymm0, ymmword ptr [rdx], 27
       vperm2i128 ymm0, ymm0, ymm0, 1
       vmovups  ymmword ptr [rcx], ymm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0012
       vzeroupper 
       ret      
 
; Total bytes of code 22
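The reverse listings use the fixed immediate 27 (0x1B), which reverses four dwords within a 128-bit lane. The 256-bit form then swaps the two lanes. This is a small Python model of both steps, assuming the documented instruction semantics; `reverse128` and `reverse256` are hypothetical names that mirror the test methods.

```python
def pshufd_lane(v, imm):
    """Model vpshufd on one 128-bit lane of four dwords."""
    imm &= 0xFF
    return [v[(imm >> (2 * i)) & 3] for i in range(4)]

def perm2i128(src1, src2, imm):
    """Model vperm2i128 without the zeroing bits."""
    halves = [src1[:4], src1[4:], src2[:4], src2[4:]]
    return halves[imm & 3] + halves[(imm >> 4) & 3]

def reverse128(v):
    return pshufd_lane(v, 27)                          # vpshufd xmm0, [rdx], 27

def reverse256(v):
    t = pshufd_lane(v[:4], 27) + pshufd_lane(v[4:], 27)  # vpshufd ymm0, [rdx], 27
    return perm2i128(t, t, 1)                          # vperm2i128 ymm0, ymm0, ymm0, 1

print(reverse128([0, 1, 2, 3]))
print(reverse256(list(range(8))))
```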
Vector512
; Assembly listing for method Tests:Geo512(int):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vpbroadcastd zmm0, edx
       vpmulld  zmm0, zmm0, zmmword ptr [reloc @RWD00]
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0019
       vzeroupper 
       ret      
 
RWD00  	dq	0000000300000001h, 0000001B00000009h, 000000F300000051h, 0000088B000002D9h, 00004CE3000019A1h, 0002B3FB0000E6A9h, 001853D300081BF1h, 00DAF26B0048FB79h

; Total bytes of code 29

; Assembly listing for method Tests:Alt512(int,int):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovd    xmm0, edx
       vpinsrd  xmm0, xmm0, r8d, 1
       vbroadcasti32x2 zmm0, zmm0
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0019
       vzeroupper 
       ret      
 
; Total bytes of code 29

; Assembly listing for method Tests:Sign512():System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vbroadcastsd zmm0, qword ptr [reloc @RWD00]
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0013
       vzeroupper 
       ret      
 
RWD00  	dq	FFFFFFFF00000001h

; Total bytes of code 23

; Assembly listing for method Tests:Harmonic512(float,float):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vbroadcastss zmm0, zmm2
       vmulps   zmm0, zmm0, zmmword ptr [reloc @RWD00]
       vbroadcastss zmm1, zmm1
       vaddps   zmm0, zmm1, zmm0
       vbroadcastss zmm1, dword ptr [reloc @RWD64]
       vdivps   zmm0, zmm1, zmm0
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0035
       vzeroupper 
       ret      
 
RWD00  	dq	3F80000000000000h, 4040000040000000h, 40A0000040800000h, 40E0000040C00000h, 4110000041000000h, 4130000041200000h, 4150000041400000h, 4170000041600000h
RWD64  	dd	3F800000h		;         1

; Total bytes of code 57

; Assembly listing for method Tests:Cauchy512(float,float):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vbroadcastss zmm0, zmm2
       vmulps   zmm0, zmm0, zmmword ptr [reloc @RWD00]
       vbroadcastss zmm1, zmm1
       vaddps   zmm0, zmm1, zmm0
       vsqrtps  zmm0, zmm0
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x002B
       vzeroupper 
       ret      
 
RWD00  	dq	3F80000000000000h, 4040000040000000h, 40A0000040800000h, 40E0000040C00000h, 4110000041000000h, 4130000041200000h, 4150000041400000h, 4170000041600000h

; Total bytes of code 47

; Assembly listing for method Tests:ConcatLowerLower512(System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovdqu  ymm0, ymmword ptr [rdx]
       vmovdqu  ymm1, ymmword ptr [r8]
       vinserti32x8 zmm0, zmm0, ymm1, 1
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0019
       vzeroupper 
       ret      
 
; Total bytes of code 29

; Assembly listing for method Tests:ConcatLowerUpper512(System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovdqu  ymm0, ymmword ptr [rdx]
       vmovups  zmm1, zmmword ptr [r8]
       vextracti32x8 ymm1, zmm1, 1
       vinserti32x8 zmm0, zmm0, ymm1, 1
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0021
       vzeroupper 
       ret      
 
; Total bytes of code 37

; Assembly listing for method Tests:ConcatUpperLower512(System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  zmm0, zmmword ptr [rdx]
       vextracti32x8 ymm0, zmm0, 1
       vmovdqu  ymm1, ymmword ptr [r8]
       vinserti32x8 zmm0, zmm0, ymm1, 1
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0022
       vzeroupper 
       ret      
 
; Total bytes of code 38

; Assembly listing for method Tests:ConcatUpperUpper512(System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  zmm0, zmmword ptr [rdx]
       vextracti32x8 ymm0, zmm0, 1
       vmovups  zmm1, zmmword ptr [r8]
       vextracti32x8 ymm1, zmm1, 1
       vinserti32x8 zmm0, zmm0, ymm1, 1
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x002A
       vzeroupper 
       ret      
 
; Total bytes of code 46

; Assembly listing for method Tests:ZipLower512(System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  zmm0, zmmword ptr [rdx]
       vmovups  zmm1, zmmword ptr [r8]
       vpunpckldq zmm2, zmm0, zmm1
       vpunpckhdq zmm0, zmm0, zmm1
       vshufi32x4 zmm0, zmm2, zmm0, 68
       vshufi32x4 zmm0, zmm0, zmm0, -40
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x002F
       vzeroupper 
       ret      
 
; Total bytes of code 51

; Assembly listing for method Tests:ZipUpper512(System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  zmm0, zmmword ptr [rdx]
       vmovups  zmm1, zmmword ptr [r8]
       vpunpckldq zmm2, zmm0, zmm1
       vpunpckhdq zmm0, zmm0, zmm1
       vshufi32x4 zmm0, zmm2, zmm0, -18
       vshufi32x4 zmm0, zmm0, zmm0, -40
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x002F
       vzeroupper 
       ret      
 
; Total bytes of code 51

; Assembly listing for method Tests:Zip512(System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):System.ValueTuple`2[System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  zmm0, zmmword ptr [rdx]
       vmovups  zmm1, zmmword ptr [r8]
       vpunpckhdq zmm2, zmm0, zmm1
       vpunpckldq zmm0, zmm0, zmm1
       vshufi32x4 zmm1, zmm0, zmm2, 68
       vshufi32x4 zmm1, zmm1, zmm1, -40
       vshufi32x4 zmm0, zmm0, zmm2, -18
       vshufi32x4 zmm0, zmm0, zmm0, -40
       vmovups  zmmword ptr [rcx], zmm1
       vmovups  zmmword ptr [rcx+0x40], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0044
       vzeroupper 
       ret      
 
; Total bytes of code 72
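For 512-bit zip, the lane fix-up after the per-lane unpack is done with two `vshufi32x4` steps per result. The sketch below models that in Python, assuming the documented AVX-512 semantics, where the first two result lanes come from the first source and the last two from the second. Helper names are illustrative.

```python
def unpack_dq(a, b, high):
    """Model vpunpckldq/vpunpckhdq on zmm: interleave within each 128-bit lane."""
    out = []
    for base in range(0, 16, 4):
        lo = base + (2 if high else 0)
        out += [a[lo], b[lo], a[lo + 1], b[lo + 1]]
    return out

def shufi32x4(src1, src2, imm):
    """Model vshufi32x4 on zmm: select whole 128-bit lanes."""
    imm &= 0xFF
    l1 = [src1[i:i + 4] for i in range(0, 16, 4)]
    l2 = [src2[i:i + 4] for i in range(0, 16, 4)]
    return (l1[imm & 3] + l1[(imm >> 2) & 3]
            + l2[(imm >> 4) & 3] + l2[(imm >> 6) & 3])

def zip512(a, b):
    lo = unpack_dq(a, b, high=False)
    hi = unpack_dq(a, b, high=True)
    lower = shufi32x4(lo, hi, 68)             # gather lanes 0-1 of lo and hi
    lower = shufi32x4(lower, lower, -40)      # 0xD8: reorder lanes to 0, 2, 1, 3
    upper = shufi32x4(lo, hi, -18)            # gather lanes 2-3 of lo and hi
    upper = shufi32x4(upper, upper, -40)
    return lower, upper

print(zip512(list(range(16)), list(range(100, 116))))
```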

; Assembly listing for method Tests:UnzipEven512(System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  zmm0, zmmword ptr [rdx]
       vmovups  zmm1, zmmword ptr [reloc @RWD00]
       vpermt2d zmm0, zmm1, zmmword ptr [r8]
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x001F
       vzeroupper 
       ret      
 
RWD00  	dq	0000000200000000h, 0000000600000004h, 0000000A00000008h, 0000000E0000000Ch, 0000001200000010h, 0000001600000014h, 0000001A00000018h, 0000001E0000001Ch

; Total bytes of code 35

; Assembly listing for method Tests:UnzipOdd512(System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  zmm0, zmmword ptr [rdx]
       vmovups  zmm1, zmmword ptr [reloc @RWD00]
       vpermt2d zmm0, zmm1, zmmword ptr [r8]
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x001F
       vzeroupper 
       ret      
 
RWD00  	dq	0000000300000001h, 0000000700000005h, 0000000B00000009h, 0000000F0000000Dh, 0000001300000011h, 0000001700000015h, 0000001B00000019h, 0000001F0000001Dh

; Total bytes of code 35

; Assembly listing for method Tests:Unzip512(System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):System.ValueTuple`2[System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  zmm0, zmmword ptr [rdx]
       vmovups  zmm1, zmmword ptr [reloc @RWD00]
       vmovups  zmm2, zmmword ptr [r8]
       vmovaps  zmm3, zmm2
       vpermt2d zmm3, zmm1, zmm0
       vmovups  zmm1, zmmword ptr [reloc @RWD64]
       vpermt2d zmm2, zmm1, zmm0
       vmovups  zmmword ptr [rcx], zmm2
       vmovups  zmmword ptr [rcx+0x40], zmm3
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0042
       vzeroupper 
       ret      
 
RWD00  	dq	0000001300000011h, 0000001700000015h, 0000001B00000019h, 0000001F0000001Dh, 0000000300000001h, 0000000700000005h, 0000000B00000009h, 0000000F0000000Dh
RWD64  	dq	0000001200000010h, 0000001600000014h, 0000001A00000018h, 0000001E0000001Ch, 0000000200000000h, 0000000600000004h, 0000000A00000008h, 0000000E0000000Ch

; Total bytes of code 70
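
The two index tables in the `Unzip512` listing above can be sanity-checked with a small scalar model. This Python sketch is not part of the PR. `decode_dq` and `vpermt2d` are hypothetical helper names, and the model assumes the documented `vpermt2d` behaviour in which bit 4 of each index picks between the destination register (first table) and the third operand (second table).

```python
def decode_dq(*qwords):
    """Split little-endian 64-bit constants into 32-bit lanes (low dword first)."""
    lanes = []
    for q in qwords:
        lanes.append(q & 0xFFFFFFFF)
        lanes.append(q >> 32)
    return lanes


def vpermt2d(dest, idx, src):
    """Scalar model of a 512-bit vpermt2d over 16 dword lanes."""
    out = []
    for i in idx:
        table = src if (i >> 4) & 1 else dest  # bit 4 selects the table
        out.append(table[i & 0xF])             # low 4 bits select the lane
    return out


RWD00 = decode_dq(0x0000001300000011, 0x0000001700000015,
                  0x0000001B00000019, 0x0000001F0000001D,
                  0x0000000300000001, 0x0000000700000005,
                  0x0000000B00000009, 0x0000000F0000000D)
RWD64 = decode_dq(0x0000001200000010, 0x0000001600000014,
                  0x0000001A00000018, 0x0000001E0000001C,
                  0x0000000200000000, 0x0000000600000004,
                  0x0000000A00000008, 0x0000000E0000000C)

left = list(range(100, 116))   # stands in for zmm0, loaded from [rdx]
right = list(range(200, 216))  # stands in for zmm2/zmm3, loaded from [r8]

odd = vpermt2d(dest=right, idx=RWD00, src=left)   # stored at [rcx+0x40]
even = vpermt2d(dest=right, idx=RWD64, src=left)  # stored at [rcx]
```

Because the destination table is the right operand in this listing, the indices with bit 4 set come first in each table, so the left operand's lanes still land in the lower half of each result.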

; Assembly listing for method Tests:Reverse512(System.Runtime.Intrinsics.Vector512`1[int]):System.Runtime.Intrinsics.Vector512`1[int] (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX + EVEX on Windows

G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       vmovups  zmm0, zmmword ptr [reloc @RWD00]
       vpermd   zmm0, zmm0, zmmword ptr [rdx]
       vmovups  zmmword ptr [rcx], zmm0
       mov      rax, rcx
 
G_M000_IG03:                ;; offset=0x0019
       vzeroupper 
       ret      
 
RWD00  	dq	0000000E0000000Fh, 0000000C0000000Dh, 0000000A0000000Bh, 0000000800000009h, 0000000600000007h, 0000000400000005h, 0000000200000003h, 0000000000000001h

; Total bytes of code 29

ARM64 (Vector64 + Vector128)
; Assembly listing for method Tests:Geo64(int):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            dup     v0.2s, w0
            ldr     d16, [@RWD00]
            mul     v0.2s, v0.2s, v16.2s
 
G_M000_IG03:                ;; offset=0x0014
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
RWD00  	dq	0000000300000001h

; Total bytes of code 28

; Assembly listing for method Tests:Alt64(int,int):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            dup     v0.2s, w0
            dup     v16.2s, w1
            zip1    v0.2s, v0.2s, v16.2s
 
G_M000_IG03:                ;; offset=0x0014
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 28

; Assembly listing for method Tests:Sign64():System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ldr     d0, [@RWD00]
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
RWD00  	dq	FFFFFFFF00000001h

; Total bytes of code 20

; Assembly listing for method Tests:Harmonic64(float,float):System.Runtime.Intrinsics.Vector64`1[float] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ldr     d16, [@RWD00]
            fmul    v16.2s, v16.2s, v1.s[0]
            dup     v0.2s, v0.s[0]
            fadd    v0.2s, v16.2s, v0.2s
            ldr     d16, [@RWD08]
            fdiv    v0.2s, v16.2s, v0.2s
 
G_M000_IG03:                ;; offset=0x0020
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
RWD00  	dq	3F80000000000000h
RWD08  	dq	3F8000003F800000h

; Total bytes of code 40

; Assembly listing for method Tests:Cauchy64(float,float):System.Runtime.Intrinsics.Vector64`1[float] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows
; FullOpts code

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ldr     d16, [@RWD00]
            fmul    v16.2s, v16.2s, v1.s[0]
            dup     v0.2s, v0.s[0]
            fadd    v0.2s, v16.2s, v0.2s
            fsqrt   v0.2s, v0.2s
 
G_M000_IG03:                ;; offset=0x001C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
RWD00  	dq	3F80000000000000h

; Total bytes of code 36

; Assembly listing for method Tests:ConcatLowerLower64(System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ins     v0.s[1], v1.s[0]
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:ConcatLowerUpper64(System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ins     v0.s[1], v1.s[1]
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:ConcatUpperLower64(System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ins     v0.s[0], v0.s[1]
            ins     v0.s[1], v1.s[0]
 
G_M000_IG03:                ;; offset=0x0010
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 24

; Assembly listing for method Tests:ConcatUpperUpper64(System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ins     v0.s[0], v0.s[1]
            ins     v0.s[1], v1.s[1]
 
G_M000_IG03:                ;; offset=0x0010
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 24

; Assembly listing for method Tests:ZipLower64(System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            zip1    v0.2s, v0.2s, v1.2s
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:ZipUpper64(System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            zip2    v0.2s, v0.2s, v1.2s
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:Zip64(System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]):System.ValueTuple`2[System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            zip1    v16.2s, v0.2s, v1.2s
            zip2    v1.2s, v0.2s, v1.2s
            mov     v0.8b, v16.8b
 
G_M000_IG03:                ;; offset=0x0014
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 28

; Assembly listing for method Tests:UnzipEven64(System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            uzp1    v0.2s, v0.2s, v1.2s
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:UnzipOdd64(System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            uzp2    v0.2s, v0.2s, v1.2s
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:Unzip64(System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]):System.ValueTuple`2[System.Runtime.Intrinsics.Vector64`1[int],System.Runtime.Intrinsics.Vector64`1[int]] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            uzp1    v16.2s, v0.2s, v1.2s
            uzp2    v1.2s, v0.2s, v1.2s
            mov     v0.8b, v16.8b
 
G_M000_IG03:                ;; offset=0x0014
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 28

; Assembly listing for method Tests:Reverse64(System.Runtime.Intrinsics.Vector64`1[int]):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            rev64   v0.2s, v0.2s
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:Geo128(int):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            dup     v0.4s, w0
            ldr     q16, [@RWD00]
            mul     v0.4s, v0.4s, v16.4s
 
G_M000_IG03:                ;; offset=0x0014
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
RWD00  	dq	0000000300000001h, 0000001B00000009h

; Total bytes of code 28

; Assembly listing for method Tests:Alt128(int,int):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            dup     v0.4s, w0
            dup     v16.4s, w1
            zip1    v0.4s, v0.4s, v16.4s
 
G_M000_IG03:                ;; offset=0x0014
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 28
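
The `dup`/`dup`/`zip1` sequence in the `Alt128` listing above can be read as interleaving two broadcasts. A minimal Python sketch of that reading follows. `broadcast`, `zip1` and `alternating` are hypothetical names used only for illustration, and the assumption that `Alt128(a, b)` yields `a, b, a, b` is inferred from the listing rather than taken from the API definition.

```python
def broadcast(value, lanes):
    """Model of `dup`: every lane holds the same scalar."""
    return [value] * lanes


def zip1(a, b):
    """Model of `zip1`: interleave the lower halves of a and b."""
    half = len(a) // 2
    out = []
    for x, y in zip(a[:half], b[:half]):
        out += [x, y]
    return out


def alternating(first, second, lanes=4):
    """Interleave two broadcasts, as the listing does."""
    return zip1(broadcast(first, lanes), broadcast(second, lanes))


result = alternating(5, 7)
```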

; Assembly listing for method Tests:Sign128():System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ldr     q0, [@RWD00]
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
RWD00  	dq	FFFFFFFF00000001h, FFFFFFFF00000001h

; Total bytes of code 20

; Assembly listing for method Tests:Harmonic128(float,float):System.Runtime.Intrinsics.Vector128`1[float] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ldr     q16, [@RWD00]
            fmul    v16.4s, v16.4s, v1.s[0]
            dup     v0.4s, v0.s[0]
            fadd    v0.4s, v16.4s, v0.4s
            ldr     q16, [@RWD16]
            fdiv    v0.4s, v16.4s, v0.4s
 
G_M000_IG03:                ;; offset=0x0020
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
RWD00  	dq	3F80000000000000h, 4040000040000000h
RWD16  	dq	3F8000003F800000h, 3F8000003F800000h

; Total bytes of code 40

; Assembly listing for method Tests:Cauchy128(float,float):System.Runtime.Intrinsics.Vector128`1[float] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ldr     q16, [@RWD00]
            fmul    v16.4s, v16.4s, v1.s[0]
            dup     v0.4s, v0.s[0]
            fadd    v0.4s, v16.4s, v0.4s
            fsqrt   v0.4s, v0.4s
 
G_M000_IG03:                ;; offset=0x001C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
RWD00  	dq	3F80000000000000h, 4040000040000000h

; Total bytes of code 36
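
Reading the `Harmonic128` and `Cauchy128` listings above, both start from the constant lanes `0, 1, 2, 3`, multiply by the second argument, and add the broadcast first argument. The harmonic form then divides `1.0` by that value and the Cauchy form takes its square root. The Python sketch below models that arithmetic at single precision. The parameter names `start` and `step` are assumptions made for this example, not the names used by the approved API.

```python
import math
import struct


def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def affine(start, step, lanes):
    """start + i * step per lane, rounded after each operation like fmul/fadd."""
    return [f32(f32(i * step) + start) for i in range(lanes)]


def harmonic(start, step, lanes=4):
    """Model of the fdiv form: 1 / (start + i * step)."""
    return [f32(1.0 / d) for d in affine(start, step, lanes)]


def cauchy(start, step, lanes=4):
    """Model of the fsqrt form: sqrt(start + i * step)."""
    return [f32(math.sqrt(d)) for d in affine(start, step, lanes)]


h = harmonic(1.0, 2.0)  # 1/1, 1/3, 1/5, 1/7
c = cauchy(1.0, 3.0)    # sqrt(1), sqrt(4), sqrt(7), sqrt(10)
```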

; Assembly listing for method Tests:ConcatLowerLower128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ins     v0.d[1], v1.d[0]
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:ConcatLowerUpper128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ins     v0.d[1], v1.d[1]
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:ConcatUpperLower128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ins     v0.d[0], v0.d[1]
            ins     v0.d[1], v1.d[0]
 
G_M000_IG03:                ;; offset=0x0010
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 24

; Assembly listing for method Tests:ConcatUpperUpper128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ins     v0.d[0], v0.d[1]
            ins     v0.d[1], v1.d[1]
 
G_M000_IG03:                ;; offset=0x0010
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 24
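
The four `ins`-based listings above suggest a simple reading of the concat operations: each result takes one half from the left operand and one half from the right. The Python sketch below writes that reading down. The function names mirror the test method names in the listings and are illustrative only, since the semantics are inferred from the generated code rather than quoted from the API review.

```python
def lower(v):
    """Lower half of a vector modelled as a list of lanes."""
    return v[: len(v) // 2]


def upper(v):
    """Upper half of a vector modelled as a list of lanes."""
    return v[len(v) // 2:]


def concat_lower_lower(left, right):
    return lower(left) + lower(right)   # ins v0.d[1], v1.d[0]


def concat_lower_upper(left, right):
    return lower(left) + upper(right)   # ins v0.d[1], v1.d[1]


def concat_upper_lower(left, right):
    return upper(left) + lower(right)   # ins v0.d[0], v0.d[1]; ins v0.d[1], v1.d[0]


def concat_upper_upper(left, right):
    return upper(left) + upper(right)   # ins v0.d[0], v0.d[1]; ins v0.d[1], v1.d[1]


a = [0, 1, 2, 3]
b = [10, 11, 12, 13]
results = {
    "LowerLower": concat_lower_lower(a, b),
    "LowerUpper": concat_lower_upper(a, b),
    "UpperLower": concat_upper_lower(a, b),
    "UpperUpper": concat_upper_upper(a, b),
}
```

The two `Upper*` variants need an extra `ins` because the left operand's upper half first has to move into the lower half of the result register.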

; Assembly listing for method Tests:ZipLower128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            zip1    v0.4s, v0.4s, v1.4s
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:ZipUpper128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            zip2    v0.4s, v0.4s, v1.4s
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:Zip128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.ValueTuple`2[System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            zip1    v16.4s, v0.4s, v1.4s
            zip2    v1.4s, v0.4s, v1.4s
            mov     v0.16b, v16.16b
 
G_M000_IG03:                ;; offset=0x0014
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 28

; Assembly listing for method Tests:UnzipEven128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            uzp1    v0.4s, v0.4s, v1.4s
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:UnzipOdd128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            uzp2    v0.4s, v0.4s, v1.4s
 
G_M000_IG03:                ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 20

; Assembly listing for method Tests:Unzip128(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):System.ValueTuple`2[System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            uzp1    v16.4s, v0.4s, v1.4s
            uzp2    v1.4s, v0.4s, v1.4s
            mov     v0.16b, v16.16b
 
G_M000_IG03:                ;; offset=0x0014
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
; Total bytes of code 28
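
The zip and unzip listings above map directly onto `zip1`/`zip2` and `uzp1`/`uzp2`. A scalar Python model of those four instructions is sketched below to show that unzip undoes zip. The function names are hypothetical and follow the test method names, and the lane semantics are the standard Advanced SIMD ones rather than anything specific to this PR.

```python
def zip_lower(a, b):
    """zip1: interleave the lower halves of a and b."""
    half = len(a) // 2
    return [x for pair in zip(a[:half], b[:half]) for x in pair]


def zip_upper(a, b):
    """zip2: interleave the upper halves of a and b."""
    half = len(a) // 2
    return [x for pair in zip(a[half:], b[half:]) for x in pair]


def unzip_even(a, b):
    """uzp1: even-indexed lanes of the concatenation a ++ b."""
    return (a + b)[0::2]


def unzip_odd(a, b):
    """uzp2: odd-indexed lanes of the concatenation a ++ b."""
    return (a + b)[1::2]


a = [0, 1, 2, 3]
b = [10, 11, 12, 13]
lo, hi = zip_lower(a, b), zip_upper(a, b)
round_trip = (unzip_even(lo, hi), unzip_odd(lo, hi))
```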

; Assembly listing for method Tests:Reverse128(System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[int] (FullOpts)
; Emitting BLENDED_CODE for arm64 on Windows

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
 
G_M000_IG02:                ;; offset=0x0008
            ldr     q16, [@RWD00]
            tbl     v0.16b, {v0.16b}, v16.16b
 
G_M000_IG03:                ;; offset=0x0010
            ldp     fp, lr, [sp], #0x10
            ret     lr
 
RWD00  	dq	0B0A09080F0E0D0Ch, 0302010007060504h

; Total bytes of code 24
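
Unlike `Reverse64`, the `Reverse128` listing above goes through `tbl` with a byte-index table. The Python sketch below decodes `RWD00` and applies a scalar model of single-register `tbl` to confirm that the table reverses the four 32-bit lanes. `decode_bytes` and `tbl` are hypothetical helper names for this example.

```python
def decode_bytes(*qwords):
    """Expand little-endian 64-bit constants into a flat list of bytes."""
    out = []
    for q in qwords:
        out.extend(q.to_bytes(8, "little"))
    return out


def tbl(table, indices):
    """Single-register tbl: out-of-range indices produce zero."""
    return [table[i] if i < len(table) else 0 for i in indices]


RWD00 = decode_bytes(0x0B0A09080F0E0D0C, 0x0302010007060504)

src_lanes = [0x03020100, 0x07060504, 0x0B0A0908, 0x0F0E0D0C]
src_bytes = list(b"".join(x.to_bytes(4, "little") for x in src_lanes))

out_bytes = bytes(tbl(src_bytes, RWD00))
out_lanes = [int.from_bytes(out_bytes[i:i + 4], "little") for i in range(0, 16, 4)]
```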

Codegen for constant input:

Vector256:

; Vector256.CreateGeometricSequence<int>(2, 3)
vmovups  ymm0, ymmword ptr [reloc @RWD00]
vmovups  ymmword ptr [rcx], ymm0
mov      rax, rcx
vzeroupper
ret
RWD00 dq 0000000600000002h, 0000003600000012h, 000001E6000000A2h, 00001116000005B2h

; Vector256.CreateAlternatingSequence<int>(5, 7)
vbroadcastsd ymm0, qword ptr [reloc @RWD00]
vmovups  ymmword ptr [rcx], ymm0
mov      rax, rcx
vzeroupper
ret
RWD00 dq 0000000700000005h

; Vector256.CreateHarmonicSequence<float>(1.0f, 2.0f)
vmovups  ymm0, ymmword ptr [reloc @RWD00]
vmovups  ymmword ptr [rcx], ymm0
mov      rax, rcx
vzeroupper
ret
RWD00 dq 3EAAAAAB3F800000h, 3E1249253E4CCCCDh, 3DBA2E8C3DE38E39h, 3D8888893D9D89D9h

; Vector256.CreateCauchySequence<float>(1.0f, 2.0f)
vsqrtps  ymm0, ymmword ptr [reloc @RWD00]
vmovups  ymmword ptr [rcx], ymm0
mov      rax, rcx
vzeroupper
ret
RWD00 dq 404000003F800000h, 40E0000040A00000h, 4130000041100000h, 4170000041500000h
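
The Vector256 constant tables above can be decoded and compared against closed forms. The Python sketch below does that under the assumption, inferred from the listings, that the sequences are `start * ratio^i`, `1 / (start + i * step)` and `sqrt(start + i * step)`. The names `decode_dq` and `f32_bits` are hypothetical. Note that the Cauchy table holds the values under the square root, which matches the PR description's remark that `sqrt` is not yet constant-folded.

```python
import struct


def decode_dq(*qwords):
    """Split little-endian 64-bit constants into 32-bit lanes (low dword first)."""
    lanes = []
    for q in qwords:
        lanes.append(q & 0xFFFFFFFF)
        lanes.append(q >> 32)
    return lanes


def f32_bits(x):
    """Bit pattern of x after rounding to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


geometric = decode_dq(0x0000000600000002, 0x0000003600000012,
                      0x000001E6000000A2, 0x00001116000005B2)
alternating = decode_dq(0x0000000700000005)
harmonic = decode_dq(0x3EAAAAAB3F800000, 0x3E1249253E4CCCCD,
                     0x3DBA2E8C3DE38E39, 0x3D8888893D9D89D9)
cauchy_operand = decode_dq(0x404000003F800000, 0x40E0000040A00000,
                           0x4130000041100000, 0x4170000041500000)

expected_geometric = [2 * 3 ** i for i in range(8)]
expected_harmonic = [f32_bits(1.0 / (1.0 + 2.0 * i)) for i in range(8)]
expected_cauchy_operand = [f32_bits(1.0 + 2.0 * i) for i in range(8)]
```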

Vector512:

; Vector512.CreateGeometricSequence<int>(2, 3)
vmovups  zmm0, zmmword ptr [reloc @RWD00]
vmovups  zmmword ptr [rcx], zmm0
mov      rax, rcx
vzeroupper
ret
RWD00 dq 0000000600000002h, 0000003600000012h, 000001E6000000A2h, 00001116000005B2h, 000099C600003342h, 000567F60001CD52h, 0030A7A6001037E2h, 01B5E4D60091F6F2h

; Vector512.CreateAlternatingSequence<int>(5, 7)
vbroadcastsd zmm0, qword ptr [reloc @RWD00]
vmovups  zmmword ptr [rcx], zmm0
mov      rax, rcx
vzeroupper
ret
RWD00 dq 0000000700000005h

; Vector512.CreateHarmonicSequence<float>(1.0f, 2.0f)
vmovups  zmm0, zmmword ptr [reloc @RWD00]
vmovups  zmmword ptr [rcx], zmm0
mov      rax, rcx
vzeroupper
ret
RWD00 dq 3EAAAAAB3F800000h, 3E1249253E4CCCCDh, 3DBA2E8C3DE38E39h, 3D8888893D9D89D9h, 3D5794363D70F0F1h, 3D3216433D430C31h, 3D17B4263D23D70Ah, 3D0421083D0D3DCBh

; Vector512.CreateCauchySequence<float>(1.0f, 2.0f)
vsqrtps  zmm0, zmmword ptr [reloc @RWD00]
vmovups  zmmword ptr [rcx], zmm0
mov      rax, rcx
vzeroupper
ret
RWD00 dq 404000003F800000h, 40E0000040A00000h, 4130000041100000h, 4170000041500000h, 4198000041880000h, 41B8000041A80000h, 41D8000041C80000h, 41F8000041E80000h

Vector512 (without AVX512 - Vector256 decomposition path):

; Vector512.CreateGeometricSequence<int>(2, 3)
vmovups  ymm0, ymmword ptr [reloc @RWD00]
vmovups  ymmword ptr [rcx], ymm0
vmovups  ymm0, ymmword ptr [reloc @RWD32]
vmovups  ymmword ptr [rcx+0x20], ymm0
mov      rax, rcx
vzeroupper
ret
RWD00 dq 0000000600000002h, 0000003600000012h, 000001E6000000A2h, 00001116000005B2h
RWD32 dq 000099C600003342h, 000567F60001CD52h, 0030A7A6001037E2h, 01B5E4D60091F6F2h

; Vector512.CreateAlternatingSequence<int>(5, 7)
vbroadcastsd ymm0, qword ptr [reloc @RWD00]
vmovups  ymmword ptr [rcx], ymm0
vmovups  ymmword ptr [rcx+0x20], ymm0
mov      rax, rcx
vzeroupper
ret
RWD00 dq 0000000700000005h

; Vector512.CreateHarmonicSequence<float>(1.0f, 2.0f)
vmovups  ymm0, ymmword ptr [reloc @RWD00]
vmovups  ymmword ptr [rcx], ymm0
vmovups  ymm0, ymmword ptr [reloc @RWD32]
vmovups  ymmword ptr [rcx+0x20], ymm0
mov      rax, rcx
vzeroupper
ret

; Vector512.CreateCauchySequence<float>(1.0f, 2.0f)
vsqrtps  ymm0, ymmword ptr [reloc @RWD00]
vsqrtps  ymm1, ymmword ptr [reloc @RWD32]
vmovups  ymmword ptr [rcx], ymm0
vmovups  ymmword ptr [rcx+0x20], ymm1
mov      rax, rcx
vzeroupper
ret
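
The decomposition path above splits each 16-lane constant into two 8-lane halves. The Python sketch below checks that the `RWD00` and `RWD32` tables of the geometric listing are the two halves of `2 * 3^i`, and shows why the alternating case can store one broadcast register twice. It is an illustration under the same inferred closed form as before, and `decode_dq` is a hypothetical helper.

```python
def decode_dq(*qwords):
    """Split little-endian 64-bit constants into 32-bit lanes (low dword first)."""
    lanes = []
    for q in qwords:
        lanes.append(q & 0xFFFFFFFF)
        lanes.append(q >> 32)
    return lanes


RWD00 = decode_dq(0x0000000600000002, 0x0000003600000012,
                  0x000001E6000000A2, 0x00001116000005B2)
RWD32 = decode_dq(0x000099C600003342, 0x000567F60001CD52,
                  0x0030A7A6001037E2, 0x01B5E4D60091F6F2)

full_geometric = [2 * 3 ** i for i in range(16)]
geo_lower, geo_upper = full_geometric[:8], full_geometric[8:]

full_alternating = [5 if i % 2 == 0 else 7 for i in range(16)]
alt_lower, alt_upper = full_alternating[:8], full_alternating[8:]
```

Since the half width is an even number of lanes, both alternating halves are identical, which is consistent with the single `vbroadcastsd` followed by two stores in the listing.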

Closes #122557

cc: @tannergooding

Copilot AI review requested due to automatic review settings May 3, 2026 14:32
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 3, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label May 3, 2026
@hez2010 hez2010 marked this pull request as draft May 3, 2026 14:36
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds new vector sequence-generation helpers (geometric/alternating/harmonic/Cauchy), sign-sequence helpers, and lane-manipulation operations (zip/unzip/concat/reverse) across Vector<T> and Vector{64,128,256,512}<T>, including JIT recognition and test coverage.

Changes:

  • Introduces new public APIs in the ref assemblies for sequence creation + lane operations and SignSequence.
  • Implements the APIs in CoreLib for Vector<T> and Vector{64,128,256,512}<T>, with some JIT fast-paths.
  • Adds unit tests validating the new behaviors across vector widths.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.

File Description
src/libraries/System.Runtime.Intrinsics/tests/Vectors/Vector64Tests.cs Adds tests for new Vector64 sequence + lane APIs
src/libraries/System.Runtime.Intrinsics/tests/Vectors/Vector128Tests.cs Adds tests for new Vector128 sequence + lane APIs
src/libraries/System.Runtime.Intrinsics/tests/Vectors/Vector256Tests.cs Adds tests for new Vector256 sequence + lane APIs
src/libraries/System.Runtime.Intrinsics/tests/Vectors/Vector512Tests.cs Adds tests for new Vector512 sequence + lane APIs
src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs Exposes new Vector{64,128,256,512} APIs in the reference contract
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64_1.cs Adds Vector64<T>.SignSequence
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64.cs Implements Vector64 sequence + lane APIs
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128_1.cs Adds Vector128<T>.SignSequence
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs Implements Vector128 sequence + lane APIs
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256_1.cs Adds Vector256<T>.SignSequence
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs Implements Vector256 sequence + lane APIs
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512_1.cs Adds Vector512<T>.SignSequence
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs Implements Vector512 sequence + lane APIs + AVX-512 special-cases
src/libraries/System.Private.CoreLib/src/System/Numerics/Vector_1.cs Adds Vector<T>.SignSequence
src/libraries/System.Private.CoreLib/src/System/Numerics/Vector.cs Implements Vector sequence + lane APIs
src/libraries/System.Numerics.Vectors/tests/GenericVectorTests.cs Adds tests for new System.Numerics.Vector APIs
src/libraries/System.Numerics.Vectors/ref/System.Numerics.Vectors.cs Exposes new System.Numerics.Vector APIs in the reference contract
src/coreclr/jit/hwintrinsicxarch.cpp Adds xarch JIT special-import support for new intrinsics
src/coreclr/jit/hwintrinsiclistxarch.h Registers new xarch HW intrinsic IDs
src/coreclr/jit/hwintrinsicarm64.cpp Adds arm64 JIT special-import support for new intrinsics
src/coreclr/jit/hwintrinsiclistarm64.h Registers new arm64 HW intrinsic IDs
src/coreclr/jit/compiler.h Declares new SIMD IR node builders used by importer/lowering

Comment thread src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128_1.cs Outdated
Comment thread src/coreclr/jit/hwintrinsicxarch.cpp Outdated
Copilot AI review requested due to automatic review settings May 3, 2026 14:50
@hez2010 hez2010 marked this pull request as ready for review May 3, 2026 14:51
@teo-tsirpanis teo-tsirpanis added area-System.Runtime.Intrinsics and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels May 3, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (4)

src/coreclr/jit/gentree.cpp:1

  • Several simdCount==1 special-cases use gtWrapWithSideEffects in a way that can reverse the required left-to-right argument evaluation order (op1 then op2) for intrinsics like Concat/Zip/Unzip, and for CreateAlternatingSequence. This is observable for Vector64/Vector64 (and similar) where the vector length is 1 and arguments may have side-effects. Consider materializing op1 into a temp before sequencing op2, or otherwise constructing the tree so op1 is evaluated before op2 while still returning op1 (or the correct constant for UnzipOdd).
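
The ordering concern described in this comment can be illustrated with a toy model. The Python sketch below does not model `gtWrapWithSideEffects` itself; both builder functions are hypothetical and only contrast a shape that runs the second operand's side effects first with the suggested shape that saves the first operand in a temp before sequencing the second.

```python
def run(builder):
    """Evaluate builder with two logging operands; return (value, evaluation order)."""
    log = []

    def op1():
        log.append("op1")
        return "left"

    def op2():
        log.append("op2")
        return "right"

    return builder(op1, op2), log


def side_effects_then_value(op1, op2):
    # Hypothetical problematic shape: op2 is kept only for its side effects
    # but ends up running before op1.
    op2()
    return op1()


def temp_then_side_effects(op1, op2):
    # Suggested shape: materialize op1 into a temp, then sequence op2.
    tmp = op1()
    op2()
    return tmp


bad_value, bad_order = run(side_effects_then_value)
good_value, good_order = run(temp_then_side_effects)
```

Both shapes return the first operand's value, so the difference is only observable when the operands have side effects.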

Comment thread src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64_1.cs Outdated
Comment thread src/coreclr/jit/hwintrinsicxarch.cpp Outdated
Comment thread src/coreclr/jit/hwintrinsicxarch.cpp Outdated
Comment thread src/coreclr/jit/hwintrinsicxarch.cpp Outdated
Comment thread src/coreclr/jit/hwintrinsicarm64.cpp Outdated
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Copilot AI review requested due to automatic review settings May 3, 2026 15:17
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Comment thread src/libraries/System.Runtime.Intrinsics/tests/Vectors/Vector64Tests.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs Outdated
Comment thread src/libraries/System.Numerics.Vectors/tests/GenericVectorTests.cs
Copilot AI review requested due to automatic review settings May 3, 2026 16:36
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Comment thread src/coreclr/jit/hwintrinsicxarch.cpp
Comment thread src/coreclr/jit/hwintrinsicxarch.cpp
Comment thread src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs Outdated
Copilot AI review requested due to automatic review settings May 3, 2026 16:49
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/coreclr/jit/gentree.cpp:1

  • This simdCount == 1 handling uses a GT_COMMA + fgMakeMultiUse + gtWrapWithSideEffects combination to preserve argument side effects while returning the op1-derived value. This pattern is non-obvious and repeats in other helpers in this diff (concat/zip/unzip). Consider centralizing a small utility for “return X but also evaluate Y for side effects” (or adding a short comment here explaining why both GT_COMMA and fgMakeMultiUse are required), to make the side-effect preservation strategy easier to audit and less error-prone.
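
One way to picture the centralized utility suggested above is a toy expression tree with a comma-style sequencing node. This is a minimal sketch under stated assumptions: `Node`, `EvalState`, `leaf`, `comma`, `returnFirstEvalBoth`, and `eval` are hypothetical and do not correspond to the JIT's GenTree API; the single temp slot stands in for whatever temp mechanism the real code would use.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression tree; not the JIT's real node type.
struct Node
{
    enum class Kind { Leaf, Comma, StoreTemp, LoadTemp };
    Kind kind = Kind::Leaf;
    std::string name;          // Leaf: label recorded when evaluated
    int value = 0;             // Leaf: value produced
    std::shared_ptr<Node> op1; // Comma: evaluated first; StoreTemp: source
    std::shared_ptr<Node> op2; // Comma: evaluated second, gives the result
};

using NodePtr = std::shared_ptr<Node>;

struct EvalState
{
    std::vector<std::string> order; // order in which leaves were evaluated
    int temp = 0;                   // single temp slot, enough for this sketch
};

NodePtr leaf(std::string name, int value)
{
    auto n = std::make_shared<Node>();
    n->kind = Node::Kind::Leaf;
    n->name = std::move(name);
    n->value = value;
    return n;
}

NodePtr comma(NodePtr first, NodePtr second)
{
    auto n = std::make_shared<Node>();
    n->kind = Node::Kind::Comma;
    n->op1 = std::move(first);
    n->op2 = std::move(second);
    return n;
}

// "Return x, but also evaluate y for its side effects", keeping x before y.
// Builds COMMA(STORE(tmp, x), COMMA(y, LOAD(tmp))).
NodePtr returnFirstEvalBoth(NodePtr x, NodePtr y)
{
    auto store = std::make_shared<Node>();
    store->kind = Node::Kind::StoreTemp;
    store->op1 = std::move(x);
    auto load = std::make_shared<Node>();
    load->kind = Node::Kind::LoadTemp;
    return comma(store, comma(std::move(y), load));
}

int eval(const NodePtr& n, EvalState& st)
{
    switch (n->kind)
    {
        case Node::Kind::Leaf:
            st.order.push_back(n->name);
            return n->value;
        case Node::Kind::Comma:
            (void)eval(n->op1, st); // first operand: side effects only
            return eval(n->op2, st);
        case Node::Kind::StoreTemp:
            st.temp = eval(n->op1, st);
            return 0;
        case Node::Kind::LoadTemp:
            return st.temp;
    }
    return 0;
}
```

Keeping the tree shape in one helper means each call site (concat, zip, unzip in the comment above) would only need to be audited for which operand it passes as x.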

Comment thread src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs Outdated
Copilot AI review requested due to automatic review settings May 3, 2026 18:32
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.

Comment thread src/libraries/System.Numerics.Vectors/tests/GenericVectorTests.cs
Labels

area-System.Runtime.Intrinsics, community-contribution (indicates that the PR has been added by a community member)

Development

Successfully merging this pull request may close these issues.

[API Proposal]: More Vector<T> sequence and lane APIs

3 participants