@kunalspathak
Contributor

  • For FMA, handle the case where falseReg == embMaskOp1Reg
  • Work around `mov` being emitted as an alias for `sel`, because predicateRegister/vectorRegister are the same
  • When an intrinsic is wrapped in ConditionalSelect, check the intrinsic's flag to see whether it needs the low register mask
  • Various APIs were missing the HW_Flag_LowMaskedOperation flag
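
To illustrate the low register mask bullets: many predicated SVE instructions encode their governing predicate in a 3-bit field, so only `p0`-`p7` are usable there. The following is a minimal sketch of the kind of check described, not the JIT's actual code. `IntrinsicDesc`, `predicateCandidates`, and the mask constants are hypothetical names invented for this example; only the flag name `HW_Flag_LowMaskedOperation` comes from the PR.

```cpp
#include <cstdint>

// Hypothetical register-mask type: bit i set means predicate register p<i> is allowed.
using RegMask = std::uint16_t;
constexpr RegMask kAllPredicateRegs = 0xFFFF; // p0..p15
constexpr RegMask kLowPredicateRegs = 0x00FF; // p0..p7

// Hypothetical, simplified description of a hardware intrinsic node.
struct IntrinsicDesc
{
    bool isConditionalSelect; // node is a ConditionalSelect
    bool lowMaskedOperation;  // stands in for HW_Flag_LowMaskedOperation
};

// Returns the predicate registers that may hold the mask for `node`.
// `wrapped` models ConditionalSelect's op2 and is null when op2 is not
// a hardware intrinsic (assumption made for this sketch).
RegMask predicateCandidates(const IntrinsicDesc& node, const IntrinsicDesc* wrapped)
{
    if (node.lowMaskedOperation)
    {
        return kLowPredicateRegs;
    }
    // The mask of a ConditionalSelect is reused by the wrapped intrinsic,
    // so the wrapped intrinsic's restriction has to be honored as well.
    if (node.isConditionalSelect && (wrapped != nullptr) && wrapped->lowMaskedOperation)
    {
        return kLowPredicateRegs;
    }
    return kAllPredicateRegs;
}
```

Under these assumptions, a ConditionalSelect wrapping an operation that carries the flag (for example Max after this change) would be limited to `p0`-`p7`, while a ConditionalSelect whose op2 is not a hardware intrinsic keeps all sixteen candidates.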

There are still some functional failures with JitStress/JitStressRegs, but I wanted to send this out.

@ghost added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) label May 22, 2024
@kunalspathak
Contributor Author

@dotnet/arm64-contrib

@kunalspathak requested a review from TIHan May 22, 2024 05:55
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

If falseReg == embMaskOp2Reg, we simply generate:

```
            sel     z16.s, p7, z9.s, z10.s
            mla     z16.s, p7/m, z10.s, z11.s
```

Here `z10` holds `falseReg` and `embMaskOp2Reg`.
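
As a lane-by-lane illustration of why this sequence is safe when the two operands share a register, here is a small C++ model of the two instructions. It is only a sketch: the 4-lane width, the type aliases, and the function names are assumptions made for this example and are not part of the JIT.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4; // assumption: a 128-bit vector of 32-bit lanes

using Vec  = std::array<std::int32_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// Models `sel zd.s, pg, zn.s, zm.s`: active lanes take zn, inactive lanes take zm.
Vec sel(const Pred& pg, const Vec& zn, const Vec& zm)
{
    Vec zd{};
    for (std::size_t i = 0; i < kLanes; ++i)
    {
        zd[i] = pg[i] ? zn[i] : zm[i];
    }
    return zd;
}

// Models `mla zda.s, pg/m, zn.s, zm.s`: active lanes accumulate zn * zm,
// inactive lanes keep their previous value (merging predication).
void mlaMerging(Vec& zda, const Pred& pg, const Vec& zn, const Vec& zm)
{
    for (std::size_t i = 0; i < kLanes; ++i)
    {
        if (pg[i])
        {
            zda[i] += zn[i] * zm[i];
        }
    }
}

// The two-instruction sequence from the PR, with z10 serving as both
// falseReg and embMaskOp2Reg.
Vec selectThenMla(const Pred& p7, const Vec& z9, const Vec& z10, const Vec& z11)
{
    Vec z16 = sel(p7, z9, z10);     // sel z16.s, p7, z9.s, z10.s
    mlaMerging(z16, p7, z10, z11);  // mla z16.s, p7/m, z10.s, z11.s
    return z16;
}
```

In this model, active lanes end up with `z9 + z10 * z11` and inactive lanes with `z10`. The `sel` writes only to `z16`, so `z10` is still intact when `mla` reads it as a multiplicand.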
@kunalspathak added the arm-sve (Work related to arm64 SVE/SVE2 support) label May 22, 2024
Contributor

@a74nh a74nh left a comment

LGTM now

Contributor

@TIHan TIHan left a comment

LGTM. Glad we are catching these.

steveharter pushed a commit to steveharter/runtime that referenced this pull request May 28, 2024
* handle case for FMA where falseReg == embMaskOp1Reg

* workaround because predicateRegister/vectorRegister are same

* When intrinsic is wrapped in ConditionalSelect, make sure to check its LOW_PREDICATE flag

* Mark AddAcross with HW_Flag_LowMaskedOperation

* double check if ConditionalSelect's op2 is hwintrinsic

* Mark Max with HW_Flag_LowMaskedOperation

* Mark MaxAcross with HW_Flag_LowMaskedOperation

* Mark MinNumber/MaxNumber/MinNumberAcross/MaxNumberAcross with HW_Flag_LowMaskedOperation

* Mark Min/MinAcross with HW_Flag_LowMaskedOperation

* fix the workaround for predicate vs. vector register

* fix the predicate mask preference

* Introduce INS_SCALABLE_OPTS_PREDICATE_MERGE_MOV

* jit format

* revert change to csproj

* remove assert that can not happen for FMA

if falseReg == embMaskOp2Reg, we simply generate:

```
            sel     z16.s, p7, z9.s, z10.s
            mla     z16.s, p7/m, z10.s, z11.s
```

Here `z10` holds `falseReg` and `embMaskOp2Reg`.

* revert a condition added for workaround of predicate == vector register

* remove the extra comment
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jun 22, 2024