Skip to content

JIT: Handle edge cases around BlendVariableMask rewrite#126062

Merged
tannergooding merged 6 commits intodotnet:mainfrom
saucecontrol:less-mask
Apr 24, 2026
Merged

JIT: Handle edge cases around BlendVariableMask rewrite#126062
tannergooding merged 6 commits intodotnet:mainfrom
saucecontrol:less-mask

Conversation

@saucecontrol
Copy link
Copy Markdown
Member

@saucecontrol saucecontrol commented Mar 24, 2026

Fixes #127260

This deals with some edge cases related to BlendVariable being 'upgraded' to BlendVariableMask on import and then possibly 'downgraded' to its original form in rationalize.

First, we were improperly rewriting some blends back to BlendVariable when the mask granularity was incompatible with the pblendvb instruction.

Second, there is logic in place that checks whether the blend could be used as an EVEX embedded mask and rewrites back to BlendVariable if not. However, it misses cases where the mask is created from a vector anyway, and creating the mask just to embed it is a deoptimization.

Copilot AI review requested due to automatic review settings March 24, 2026 23:11
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 24, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Mar 24, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates JIT rationalization for xarch NI_AVX512_BlendVariableMask to avoid keeping the mask-form blend when the mask operand originates from a vector-to-mask conversion, preventing a deoptimization where a mask is created only to be embedded.

Changes:

  • Extend RewriteHWIntrinsicBlendv to detect when the blend mask is produced via NI_AVX512_ConvertVectorToMask.
  • Avoid the “keep embedded mask” early-return in that scenario so the blend can be rewritten back to the non-mask form.
Comments suppressed due to low confidence (1)

src/coreclr/jit/rationalize.cpp:685

  • op3 is now a local GenTree*, but it is later passed by address to RewriteHWIntrinsicToNonMask(&op3, ...). RewriteHWIntrinsicToNonMask expects use to be the actual operand edge so it can replace/remove nodes (e.g., it removes NI_AVX512_ConvertVectorToMask and updates the parent via ReplaceOperand). Passing a local pointer means node->Op(3) will not be updated, leaving the blend node still pointing at the removed intrinsic (dangling operand / miscompile). Use GenTree*& op3 = node->Op(3); (or otherwise pass &node->Op(3) / an operand reference) when calling RewriteHWIntrinsicToNonMask.
    GenTree* op2 = node->Op(2);
    GenTree* op3 = node->Op(3);

    // We're in the post-order visit and are traversing in execution order, so
    // everything between op2 and node will have already been rewritten to LIR
    // form and doing the IsInvariantInRange check is safe. This allows us to
    // catch cases where something is embedded masking compatible but where we
    // could never actually contain it and so we want to rewrite it to the non-mask
    // variant
    SideEffectSet scratchSideEffects;

    if (scratchSideEffects.IsLirInvariantInRange(m_compiler, op2, node))
    {
        unsigned  tgtMaskSize     = simdSize / genTypeSize(simdBaseType);
        var_types tgtSimdBaseType = TYP_UNDEF;

        if (op2->isEmbeddedMaskingCompatible(m_compiler, tgtMaskSize, tgtSimdBaseType))
        {
            // Make sure we had a mask to begin with. We don't want to create a mask
            // solely for the purpose of embedding it.

            if (!op3->OperIsHWIntrinsic() ||
                (op3->AsHWIntrinsic()->GetHWIntrinsicId() != NI_AVX512_ConvertVectorToMask))
            {
                // We are going to utilize the embedded mask, so we don't need to rewrite. However,
                // we want to fixup the simdBaseType here since it simplifies lowering and allows
                // both embedded broadcast and the mask to be live simultaneously.

                if (tgtSimdBaseType != TYP_UNDEF)
                {
                    op2->AsHWIntrinsic()->SetSimdBaseType(tgtSimdBaseType);
                }
                return;
            }
        }
    }

    if (!ShouldRewriteToNonMaskHWIntrinsic(op3))
    {
        return;
    }

    parents.Push(op3);
    RewriteHWIntrinsicToNonMask(&op3, parents);
    (void)parents.Pop();

@saucecontrol
Copy link
Copy Markdown
Member Author

cc @dotnet/jit-contrib

SPMI doesn't show any diffs, but this does fix up my motivating case. Something like:

static Vector128<float> AddToNegative(Vector128<float> v1, Vector128<float> v2)
    => Sse41.BlendVariable(v1, v1 + v2, v1);
    vmovups  xmm0, xmmword ptr [rdx]
-   vpmovd2m k1, xmm0
-   vaddps   xmm0 {k1}, xmm0, xmmword ptr [r8]
+   vaddps   xmm1, xmm0, xmmword ptr [r8]
+   vblendvps xmm0, xmm0, xmm1, xmm0
    vmovups  xmmword ptr [rcx], xmm0
    mov      rax, rcx
    ret      
-; Total bytes of code: 24
+; Total bytes of code: 23

@JulieLeeMSFT
Copy link
Copy Markdown
Member

@tannergooding, @kg PTAL.

@JulieLeeMSFT JulieLeeMSFT requested a review from kg April 20, 2026 14:51
Copilot AI review requested due to automatic review settings April 20, 2026 14:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/jit/rationalize.cpp Outdated
Comment thread src/coreclr/jit/rationalize.cpp Outdated
Comment thread src/coreclr/jit/rationalize.cpp Outdated
@dotnet-policy-service dotnet-policy-service Bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 23, 2026
@saucecontrol saucecontrol changed the title JIT: Rewrite BlendVariableMask when mask is created from vector JIT: Handle edge cases around BlendVariableMask rewrite Apr 23, 2026
@saucecontrol
Copy link
Copy Markdown
Member Author

Few diffs now with the bug fix included.

Comment thread src/coreclr/jit/rationalize.cpp Outdated
Copilot AI review requested due to automatic review settings April 23, 2026 20:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/tests/JIT/Regression/JitBlue/Runtime_127260/Runtime_127260.cs Outdated
Comment thread src/tests/JIT/Regression/JitBlue/Runtime_127260/Runtime_127260.cs
@tannergooding
Copy link
Copy Markdown
Member

CC. @dotnet/jit-contrib, @EgorBo, @kg this should be ready for secondary review. Fixes an issue with the BlendVariableMask -> BlendVariable rewriting and ensures its used in cases where it would be more optimal.

@tannergooding tannergooding merged commit a75526a into dotnet:main Apr 24, 2026
137 of 141 checks passed
@saucecontrol saucecontrol deleted the less-mask branch April 24, 2026 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JIT: Invalid codegen for Avx512F.VL.BlendVariable and Avx512BW.VL.BlendVariable for integral types larger than byte

6 participants