Ensure that AVX512_MoveMask and other key mask internal mask intrinsics are hoistable #126910
Merged
tannergooding merged 2 commits into dotnet:main on Apr 15, 2026
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates the classification of several AVX512 mask-related intrinsics so they can participate in CSE and be loop-hoisted, while preserving their special handling during code generation.
Changes:
- Reclassified `AVX512.MoveMask` from `HW_Category_Special` to `HW_Category_SimpleSIMD` and added special handling flags.
- Reclassified AVX512 mask logical intrinsics (e.g., `AddMask`, `AndMask`, `NotMask`, `XorMask`) from `HW_Category_Special` to `HW_Category_SimpleSIMD`.
- Added `HW_Flag_SpecialCodeGen` to ensure specialized lowering/codegen remains in place after re-categorization.
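The effect of the re-categorization can be pictured with a toy model. The sketch below is not the JIT's actual data structure or logic: `Category`, `Flags`, `IntrinsicInfo`, `can_cse`, and `needs_special_codegen` are hypothetical names invented for illustration. It only assumes what the PR states, namely that the `Special` category blocks CSE while a separate flag can still request custom codegen.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the JIT's intrinsic category and flag values.
enum class Category { Special, SimpleSIMD };

enum Flags : std::uint32_t {
    None           = 0,
    SpecialCodeGen = 1u << 0,
};

// Hypothetical table entry describing one intrinsic.
struct IntrinsicInfo {
    const char*   name;
    Category      category;
    std::uint32_t flags;
};

// Assumption taken from the PR text: the Special category blocks CSE
// (and therefore loop hoisting); other categories do not.
constexpr bool can_cse(const IntrinsicInfo& info) {
    return info.category != Category::Special;
}

// Custom codegen is requested through a flag, independent of the category.
constexpr bool needs_special_codegen(const IntrinsicInfo& info) {
    return (info.flags & SpecialCodeGen) != 0;
}

// The same intrinsic, modeled before and after the change.
constexpr IntrinsicInfo move_mask_before{"MoveMask", Category::Special, None};
constexpr IntrinsicInfo move_mask_after{"MoveMask", Category::SimpleSIMD, SpecialCodeGen};
```

In this model the "after" entry is eligible for CSE while still asking for its specialized codegen path, which is the combination the change list describes.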
CC. @dotnet/jit-contrib, @EgorBo, @kg for review

This handles a few cases like this:

    - vpmovb2m k1, xmm0
    - kmovw ecx, k1
    + vpmovmskb ecx, xmm0
      test ecx, ecx

As it ensures the
EgorBo approved these changes on Apr 14, 2026
Previously these intrinsics were marked as `HW_Category_Special`, which blocks CSE and therefore loop hoisting. However, these are simple operations that are invariant and so should be allowed to be CSE'd and hoisted.

So we correct the category to be `HW_Category_SimpleSIMD`, matching what's used for the other equivalent APIs, and mark them as `HW_Flag_SpecialCodeGen` instead to ensure that everything works as expected.

This improves the codegen particularly for V128/V256 when `ExtractMostSignificantBits` or similar is used in a scenario where CSE was possible, as it ensures we don't unnecessarily use EVEX instructions when a VEX-encoded instruction would've been a smaller and faster choice.
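To make the hoisting scenario concrete, here is a minimal, portable C++ sketch of the transformation written out by hand. It is a scalar model, not the JIT's output: `extract_most_significant_bits`, `count_selected_naive`, and `count_selected_hoisted` are hypothetical helpers. The model assumes a 128-bit vector of 16 bytes, where bit `i` of the result is the top bit of byte `i`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of a byte-wise "extract most significant bits" on 16 bytes:
// bit i of the result is the most significant bit of byte i.
std::uint32_t extract_most_significant_bits(const Bytes16& v) {
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        mask |= static_cast<std::uint32_t>(v[i] >> 7) << i;
    }
    return mask;
}

// Naive form: the mask is recomputed on every iteration even though `v`
// never changes inside the loop.
std::size_t count_selected_naive(const Bytes16& v,
                                 const std::vector<std::uint32_t>& lanes) {
    std::size_t count = 0;
    for (std::uint32_t lane : lanes) {
        if ((extract_most_significant_bits(v) >> (lane & 15u)) & 1u) {
            ++count;
        }
    }
    return count;
}

// Hoisted form: the loop-invariant mask is computed once before the loop,
// which is the shape that hoisting aims to produce automatically.
std::size_t count_selected_hoisted(const Bytes16& v,
                                   const std::vector<std::uint32_t>& lanes) {
    const std::uint32_t mask = extract_most_significant_bits(v);
    std::size_t count = 0;
    for (std::uint32_t lane : lanes) {
        if ((mask >> (lane & 15u)) & 1u) {
            ++count;
        }
    }
    return count;
}
```

Both functions return the same result for any input; the hoisted form simply avoids repeating the invariant mask extraction on each iteration.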