Add lane construction and composition APIs #127690
hez2010 wants to merge 23 commits into dotnet:main
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds new vector sequence-generation helpers (geometric/alternating/harmonic/cauchy), sign-sequence helpers, and lane-manipulation operations (zip/unzip/concat/reverse) across Vector<T> and Vector{64,128,256,512}<T>, including JIT recognition and test coverage.
Changes:
- Introduces new public APIs in the ref assemblies for sequence creation + lane operations and `SignSequence`.
- Implements the APIs in CoreLib for `Vector<T>` and `Vector{64,128,256,512}<T>`, with some JIT fast-paths.
- Adds unit tests validating the new behaviors across vector widths.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/libraries/System.Runtime.Intrinsics/tests/Vectors/Vector64Tests.cs | Adds tests for new Vector64 sequence + lane APIs |
| src/libraries/System.Runtime.Intrinsics/tests/Vectors/Vector128Tests.cs | Adds tests for new Vector128 sequence + lane APIs |
| src/libraries/System.Runtime.Intrinsics/tests/Vectors/Vector256Tests.cs | Adds tests for new Vector256 sequence + lane APIs |
| src/libraries/System.Runtime.Intrinsics/tests/Vectors/Vector512Tests.cs | Adds tests for new Vector512 sequence + lane APIs |
| src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs | Exposes new Vector{64,128,256,512} APIs in the reference contract |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64_1.cs | Adds Vector64<T>.SignSequence |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64.cs | Implements Vector64 sequence + lane APIs |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128_1.cs | Adds Vector128<T>.SignSequence |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs | Implements Vector128 sequence + lane APIs |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256_1.cs | Adds Vector256<T>.SignSequence |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs | Implements Vector256 sequence + lane APIs |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512_1.cs | Adds Vector512<T>.SignSequence |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs | Implements Vector512 sequence + lane APIs + AVX-512 special-cases |
| src/libraries/System.Private.CoreLib/src/System/Numerics/Vector_1.cs | Adds Vector<T>.SignSequence |
| src/libraries/System.Private.CoreLib/src/System/Numerics/Vector.cs | Implements Vector sequence + lane APIs |
| src/libraries/System.Numerics.Vectors/tests/GenericVectorTests.cs | Adds tests for new System.Numerics.Vector APIs |
| src/libraries/System.Numerics.Vectors/ref/System.Numerics.Vectors.cs | Exposes new System.Numerics.Vector APIs in the reference contract |
| src/coreclr/jit/hwintrinsicxarch.cpp | Adds xarch JIT special-import support for new intrinsics |
| src/coreclr/jit/hwintrinsiclistxarch.h | Registers new xarch HW intrinsic IDs |
| src/coreclr/jit/hwintrinsicarm64.cpp | Adds arm64 JIT special-import support for new intrinsics |
| src/coreclr/jit/hwintrinsiclistarm64.h | Registers new arm64 HW intrinsic IDs |
| src/coreclr/jit/compiler.h | Declares new SIMD IR node builders used by importer/lowering |
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (4)
src/coreclr/jit/gentree.cpp:1
- Several `simdCount == 1` special cases use `gtWrapWithSideEffects` in a way that can reverse the required left-to-right argument evaluation order (`op1`, then `op2`) for intrinsics like `Concat`/`Zip`/`Unzip`, and for `CreateAlternatingSequence`. This is observable for single-element `Vector64<T>` instantiations (such as `Vector64<long>` or `Vector64<double>`, and similar) when the arguments have side effects. Consider materializing `op1` into a temp before sequencing `op2`, or otherwise constructing the tree so that `op1` is evaluated before `op2` while still returning `op1` (or the correct constant for `UnzipOdd`).
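The ordering concern above can be illustrated at the C# level with a minimal sketch. `FirstOf` is a hypothetical stand-in for a one-lane operation whose result depends only on its first argument; it is not one of the APIs in this PR, and the logging helper is likewise illustrative. C# evaluates arguments left to right, so any JIT expansion that drops `op2` from the result still has to run its side effects after those of `op1`.

```csharp
using System;
using System.Collections.Generic;

public static class EvalOrderSketch
{
    private static readonly List<string> Log = new List<string>();

    // Records that an argument expression was evaluated, then yields its value.
    private static int Next(string name, int value)
    {
        Log.Add(name);
        return value;
    }

    // Hypothetical one-lane operation: the result only depends on op1,
    // but op2 is still an argument and must be evaluated by the caller.
    public static int FirstOf(int op1, int op2) => op1;

    public static string Run()
    {
        Log.Clear();
        int result = FirstOf(Next("op1", 1), Next("op2", 2));
        return $"{result}: {string.Join(", ", Log)}";
    }

    public static void Main() => Console.WriteLine(Run()); // prints "1: op1, op2"
}
```

An expansion that evaluated `op2` first would produce the log `op2, op1`, which is the observable difference the review comment describes.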
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/coreclr/jit/gentree.cpp:1
- This `simdCount == 1` handling uses a `GT_COMMA` + `fgMakeMultiUse` + `gtWrapWithSideEffects` combination to preserve argument side effects while returning the `op1`-derived value. This pattern is non-obvious and repeats in other helpers in this diff (concat/zip/unzip). Consider centralizing a small utility for "return X but also evaluate Y for side effects" (or adding a short comment here explaining why both `GT_COMMA` and `fgMakeMultiUse` are required), to make the side-effect preservation strategy easier to audit and less error-prone.
This PR adds lane construction and composition APIs approved in #122557, and the corresponding JIT intrinsics.
The JIT now recognizes the new vector APIs and expands them using existing SIMD nodes. The managed implementation allows decomposition through smaller vector widths when wider hardware support is unavailable.
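As a rough illustration of what decomposition through a smaller width looks like, the sketch below reverses a `Vector256<int>` using only existing 128-bit operations (`Vector128.Shuffle`, `GetLower`, `GetUpper`, `Vector256.Create`). The helper names `Reverse128` and `Reverse256` are hypothetical and this is not the implementation in this PR; it only shows the general shape of the fallback.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class DecompositionSketch
{
    // Reverses the four lanes of a 128-bit vector with the existing Shuffle API.
    public static Vector128<int> Reverse128(Vector128<int> v) =>
        Vector128.Shuffle(v, Vector128.Create(3, 2, 1, 0));

    // Reverses eight lanes by reversing each 128-bit half and swapping the halves:
    // the reversed upper half becomes the new lower half, and vice versa.
    public static Vector256<int> Reverse256(Vector256<int> v) =>
        Vector256.Create(Reverse128(v.GetUpper()), Reverse128(v.GetLower()));

    public static void Main()
    {
        Vector256<int> v = Vector256.Create(0, 1, 2, 3, 4, 5, 6, 7);
        Console.WriteLine(Reverse256(v));
    }
}
```

The same split-and-recombine pattern extends to 512-bit vectors over two 256-bit halves.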
The xarch lowering uses fixed shuffle forms where profitable:
- `vpbroadcast*` for sequence and alternating construction
- `vshufps` for 128-bit concat/unzip patterns
- `vperm2i128` for 256-bit zip/unzip

The ARM64 lowering avoids table-lookup forms for small fixed concat/reverse operations and uses direct element moves where applicable, such as `ins` and `rev64`.

`CreateCauchySequence` requires constant folding `sqrt` in the JIT to produce optimal code, but I would like to leave it for now as it's out-of-scope for this PR.

Codegen:
Vector128
Vector256
Vector512
ARM64 (Vector64 + Vector128)
Codegen for constant input:
Vector256:
Vector512:
Vector512 (without AVX512 - Vector256 decomposition path):
Closes #122557
cc: @tannergooding