JIT: Handle edge cases around BlendVariableMask rewrite#126062
Merged
tannergooding merged 6 commits intodotnet:mainfrom Apr 24, 2026
Merged
JIT: Handle edge cases around BlendVariableMask rewrite#126062tannergooding merged 6 commits intodotnet:mainfrom
tannergooding merged 6 commits intodotnet:mainfrom
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates JIT rationalization for xarch NI_AVX512_BlendVariableMask to avoid keeping the mask-form blend when the mask operand originates from a vector-to-mask conversion, preventing a deoptimization where a mask is created only to be embedded.
Changes:
- Extend
RewriteHWIntrinsicBlendvto detect when the blend mask is produced viaNI_AVX512_ConvertVectorToMask. - Avoid the “keep embedded mask” early-return in that scenario so the blend can be rewritten back to the non-mask form.
Comments suppressed due to low confidence (1)
src/coreclr/jit/rationalize.cpp:685
op3is now a localGenTree*, but it is later passed by address toRewriteHWIntrinsicToNonMask(&op3, ...).RewriteHWIntrinsicToNonMaskexpectsuseto be the actual operand edge so it can replace/remove nodes (e.g., it removesNI_AVX512_ConvertVectorToMaskand updates the parent viaReplaceOperand). Passing a local pointer meansnode->Op(3)will not be updated, leaving the blend node still pointing at the removed intrinsic (dangling operand / miscompile). UseGenTree*& op3 = node->Op(3);(or otherwise pass&node->Op(3)/ an operand reference) when callingRewriteHWIntrinsicToNonMask.
GenTree* op2 = node->Op(2);
GenTree* op3 = node->Op(3);
// We're in the post-order visit and are traversing in execution order, so
// everything between op2 and node will have already been rewritten to LIR
// form and doing the IsInvariantInRange check is safe. This allows us to
// catch cases where something is embedded masking compatible but where we
// could never actually contain it and so we want to rewrite it to the non-mask
// variant
SideEffectSet scratchSideEffects;
if (scratchSideEffects.IsLirInvariantInRange(m_compiler, op2, node))
{
unsigned tgtMaskSize = simdSize / genTypeSize(simdBaseType);
var_types tgtSimdBaseType = TYP_UNDEF;
if (op2->isEmbeddedMaskingCompatible(m_compiler, tgtMaskSize, tgtSimdBaseType))
{
// Make sure we had a mask to begin with. We don't want to create a mask
// solely for the purpose of embedding it.
if (!op3->OperIsHWIntrinsic() ||
(op3->AsHWIntrinsic()->GetHWIntrinsicId() != NI_AVX512_ConvertVectorToMask))
{
// We are going to utilize the embedded mask, so we don't need to rewrite. However,
// we want to fixup the simdBaseType here since it simplifies lowering and allows
// both embedded broadcast and the mask to be live simultaneously.
if (tgtSimdBaseType != TYP_UNDEF)
{
op2->AsHWIntrinsic()->SetSimdBaseType(tgtSimdBaseType);
}
return;
}
}
}
if (!ShouldRewriteToNonMaskHWIntrinsic(op3))
{
return;
}
parents.Push(op3);
RewriteHWIntrinsicToNonMask(&op3, parents);
(void)parents.Pop();
Member
Author
|
cc @dotnet/jit-contrib SPMI doesn't show any diffs, but this does fix up my motivating case. Something like: static Vector128<float> AddToNegative(Vector128<float> v1, Vector128<float> v2)
=> Sse41.BlendVariable(v1, v1 + v2, v1); vmovups xmm0, xmmword ptr [rdx]
- vpmovd2m k1, xmm0
- vaddps xmm0 {k1}, xmm0, xmmword ptr [r8]
+ vaddps xmm1, xmm0, xmmword ptr [r8]
+ vblendvps xmm0, xmm0, xmm1, xmm0
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
ret
-; Total bytes of code: 24
+; Total bytes of code: 23 |
Member
|
@tannergooding, @kg PTAL. |
Member
Author
|
Few diffs now with the bug fix included. |
This was referenced Apr 23, 2026
tannergooding
approved these changes
Apr 23, 2026
Member
EgorBo
approved these changes
Apr 24, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #127260
This deals with some edge cases related to
BlendVariablebeing 'upgraded' toBlendVariableMaskon import and then possibly 'downgraded' to its original form inrationalize.First, we were improperly rewriting some blends back to
BlendVariablewhen the mask granularity was incompatible with thepblendvbinstruction.Second, there is logic in place that checks whether the blend could be used as an EVEX embedded mask and rewrites back to
BlendVariableif not. However, it misses cases where the mask is created from a vector anyway, and creating the mask just to embed it is a deoptimization.