
Ensure that AVX512_MoveMask and other key mask internal mask intrinsics are hoistable #126910

Merged: tannergooding merged 2 commits into dotnet:main from tannergooding:improve-kmask128-256 on Apr 15, 2026



@tannergooding (Member)

Previously, these intrinsics were marked as HW_Category_Special, which blocks CSE and therefore loop hoisting. However, they are simple, invariant operations, so they should be allowed to be CSE'd and hoisted.

So we change the category to HW_Category_SimpleSIMD, matching what the other equivalent APIs use, and instead mark them with HW_Flag_SpecialCodeGen so that they still go through their specialized code generation.
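To make the split between the category and the flag concrete, here is a toy Python model. Only the names HW_Category_Special, HW_Category_SimpleSIMD, and HW_Flag_SpecialCodeGen come from this PR; the `IntrinsicInfo` record and the `is_cse_candidate` / `needs_custom_codegen` rules are hypothetical simplifications, not the JIT's actual logic.

```python
from dataclasses import dataclass
from enum import Enum, Flag


class HWCategory(Enum):
    SPECIAL = 1      # stands in for HW_Category_Special
    SIMPLE_SIMD = 2  # stands in for HW_Category_SimpleSIMD


class HWFlag(Flag):
    NONE = 0
    SPECIAL_CODEGEN = 1  # stands in for HW_Flag_SpecialCodeGen


@dataclass
class IntrinsicInfo:
    name: str
    category: HWCategory
    flags: HWFlag = HWFlag.NONE


def is_cse_candidate(info: IntrinsicInfo) -> bool:
    # Toy rule (assumption): Special intrinsics are opaque to CSE, and
    # therefore to loop hoisting; every other category may be CSE'd.
    return info.category is not HWCategory.SPECIAL


def needs_custom_codegen(info: IntrinsicInfo) -> bool:
    # Toy rule (assumption): custom emission comes either from the Special
    # category or, for other categories, from the SpecialCodeGen flag.
    return (info.category is HWCategory.SPECIAL
            or bool(info.flags & HWFlag.SPECIAL_CODEGEN))


before = IntrinsicInfo("AVX512_MoveMask", HWCategory.SPECIAL)
after = IntrinsicInfo("AVX512_MoveMask", HWCategory.SIMPLE_SIMD,
                      HWFlag.SPECIAL_CODEGEN)

for info in (before, after):
    print(info.category.name,
          "cse:", is_cse_candidate(info),
          "custom codegen:", needs_custom_codegen(info))
```

In this sketch the recategorized MoveMask becomes a CSE candidate while still reporting that it needs custom codegen, which is the combination the PR aims for.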

This particularly improves codegen for V128/V256 when ExtractMostSignificantBits or a similar API is used in a scenario where CSE is possible: it ensures we don't unnecessarily use EVEX-encoded instructions when a VEX encoding would be smaller and faster.
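For a 16-byte vector, `ExtractMostSignificantBits` produces an integer whose bit i is the most significant bit of byte i, and both the EVEX sequence (vpmovb2m into a k-register, then kmovw) and the VEX vpmovmskb compute that same value. The Python reference models below are an illustration with hypothetical function names, not emulations taken from the JIT.

```python
def vpmovmskb(vec: bytes) -> int:
    """VEX path model: bit i of the result is the most significant bit of byte i."""
    if len(vec) != 16:
        raise ValueError("expected a 16-byte (V128) vector")
    result = 0
    for i, b in enumerate(vec):
        result |= ((b >> 7) & 1) << i
    return result


def vpmovb2m_kmovw(vec: bytes) -> int:
    """EVEX path model: build a k-mask from byte MSBs, then copy 16 bits to a GPR."""
    if len(vec) != 16:
        raise ValueError("expected a 16-byte (V128) vector")
    k_register = [bool(b & 0x80) for b in vec]  # one mask bit per byte lane
    gpr = 0
    for i, bit in enumerate(k_register):       # kmovw: low 16 mask bits
        gpr |= int(bit) << i
    return gpr


sample = bytes([0x80, 0x01, 0xFF, 0x7F] * 4)
print(hex(vpmovmskb(sample)))       # 0x5555
print(hex(vpmovb2m_kmovw(sample)))  # 0x5555
```

Since the results match, the single VEX vpmovmskb is the better choice: it skips the round trip through a mask register and, as noted above, has a smaller encoding.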

Copilot AI review requested due to automatic review settings on April 14, 2026 21:24
@github-actions (bot) added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Apr 14, 2026
@dotnet-policy-service (Contributor)

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding marked this pull request as ready for review on April 14, 2026 21:27
Copilot AI (Contributor) left a comment


Pull request overview

Note: Copilot was unable to run its full agentic suite in this review.

This PR updates the classification of several AVX512 mask-related intrinsics so they can participate in CSE and be loop-hoisted, while preserving their special handling during code generation.

Changes:

  • Reclassified AVX512.MoveMask from HW_Category_Special to HW_Category_SimpleSIMD and added special handling flags.
  • Reclassified the AVX512 mask-operation intrinsics (e.g., AddMask, AndMask, NotMask, XorMask) from HW_Category_Special to HW_Category_SimpleSIMD.
  • Added HW_Flag_SpecialCodeGen to ensure specialized lowering/codegen remains in place after re-categorization.

Comment threads (4) on src/coreclr/jit/hwintrinsiclistxarch.h
@tannergooding requested review from EgorBo and kg on April 14, 2026 21:27
@tannergooding (Member, Author)

CC @dotnet/jit-contrib, @EgorBo, @kg for review.

This handles cases like the following:

```diff
- vpmovb2m  k1, xmm0
- kmovw     ecx, k1
+ vpmovmskb ecx, xmm0
  test      ecx, ecx
```

This works because the change ensures that MoveMask and ConvertVectorToMask are hoisted together, which lets rationalization see that the input can be consumed directly as a vector:

```text
*  HWINTRINSIC int    16 ubyte MoveMask <l:$181, c:$182>
\--*  HWINTRINSIC mask   16 ubyte ConvertVectorToMask <l:$300, c:$301>
   \--*  IND       simd16 <l:$240, c:$280>
      \--*  LCL_VAR   byref  V00 arg0         u:1 $80
```
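The following toy sketch illustrates why hoisting the pair together matters. It assumes a simplified loop-invariant hoister that moves the largest invariant subtree containing no CSE-blocked op, plus a hypothetical `fold_movemask` rewrite standing in for the rationalization step; none of these helpers (`Node`, `hoist`, `fold_movemask`) are JIT code.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    op: str
    kids: List["Node"] = field(default_factory=list)
    invariant: bool = True  # assumption: loop-invariance already computed

    def __str__(self) -> str:
        if not self.kids:
            return self.op
        return f"{self.op}({', '.join(str(k) for k in self.kids)})"


def hoistable(node: Node, blocked: Set[str]) -> bool:
    # A subtree can leave the loop only if every node in it is invariant
    # and none of its ops is excluded from CSE.
    return (node.invariant and node.op not in blocked
            and all(hoistable(k, blocked) for k in node.kids))


def hoist(node: Node, blocked: Set[str], preheader: List[Node]) -> Node:
    # Move the largest hoistable subtrees to the preheader, leaving a temp.
    if hoistable(node, blocked):
        preheader.append(node)
        return Node(f"tmp{len(preheader) - 1}")
    return Node(node.op, [hoist(k, blocked, preheader) for k in node.kids],
                node.invariant)


def fold_movemask(node: Node) -> Node:
    # Hypothetical rewrite: MoveMask(ConvertVectorToMask(x)) becomes a vector
    # MoveMask(x), i.e. vpmovmskb instead of vpmovb2m + kmovw.
    kids = [fold_movemask(k) for k in node.kids]
    if (node.op == "MoveMask" and len(kids) == 1
            and kids[0].op == "ConvertVectorToMask"):
        return Node("MoveMaskVector", kids[0].kids, node.invariant)
    return Node(node.op, kids, node.invariant)


def loop_invariant_tree() -> Node:
    # Mirrors the dump above: MoveMask(ConvertVectorToMask(IND(LCL_VAR V00))).
    return Node("MoveMask", [Node("ConvertVectorToMask",
                                  [Node("IND", [Node("LCL_VAR_V00")])])])


for label, blocked in (("before", {"MoveMask"}), ("after", set())):
    preheader: List[Node] = []
    body = hoist(loop_invariant_tree(), blocked, preheader)
    folded = [str(fold_movemask(n)) for n in preheader]
    print(label, "| preheader:", folded, "| loop body:", str(fold_movemask(body)))
```

With MoveMask blocked, only ConvertVectorToMask(IND(LCL_VAR_V00)) reaches the preheader and the loop keeps MoveMask(tmp0), so the fold never sees the pair. Without the block, the whole tree is hoisted and the preheader holds MoveMaskVector(IND(LCL_VAR_V00)).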

@tannergooding merged commit 8e0096f into dotnet:main on Apr 15, 2026 (136 of 139 checks passed)
@tannergooding deleted the improve-kmask128-256 branch on April 15, 2026 06:59

3 participants