Improve BMI2 MultiplyNoFlags APIs #21362
| /// The above native signature does not directly correspond to the managed signature. | ||
| /// This intrinsic is only available on 64-bit processes | ||
| /// </summary> | ||
| public static ulong MultiplyNoFlags(ulong left, ulong right) { throw new PlatformNotSupportedException(); } |
nit: Might be nice to order this above the one that takes 3 parameters.
| /// The above native signature does not directly correspond to the managed signature. | ||
| /// </summary> | ||
| public static unsafe uint MultiplyNoFlags(uint left, uint right, uint* high) { throw new PlatformNotSupportedException(); } | ||
| public static unsafe uint MultiplyNoFlags(uint left, uint right, uint* low) { throw new PlatformNotSupportedException(); } |
Erm, implementing this intrinsic will be fun.
I think it's the same concern raised in the other thread, in that (like with Div/Rem) we don't necessarily have an easy way to return multiple results from a single instruction.
I think the "inefficient" codegen might be easier, but it will require an explicit store to memory as part of the codegen (whereas the efficient one would let us keep the value in a register, etc.).
> think the "inefficient" codegen might be easier, but it will require an explicit store to
The perf of the "explicit store" would be fine; uint/ulong stores can be handled by store forwarding on Intel CPUs. We have also provided a "no store" version, which avoids the additional store when users do not need "low".
This intrinsic both stores to memory and returns a value. That's a bit unusual, I think the only similar opers are the atomic ones. We'll see how it goes.
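For readers following the "store" versus "no store" discussion, here is a minimal software model of the two 32-bit overloads. It is only a sketch of the arithmetic, not the JIT codegen: `MulxModel` is a hypothetical class name, and `out uint low` is an assumption standing in for the `uint* low` parameter in the diff so that the sample compiles without unsafe code.

```csharp
using System;

// Hypothetical software model; this is NOT the real Bmi2 class.
public static class MulxModel
{
    // "No store" shape: only the high 32 bits of the 64-bit product are returned.
    public static uint MultiplyNoFlags(uint left, uint right)
    {
        ulong product = (ulong)left * right;
        return (uint)(product >> 32);
    }

    // "Store" shape: the high half is returned and the low half is written out.
    // Assumption: `out uint low` replaces the `uint* low` parameter from the diff.
    public static uint MultiplyNoFlags(uint left, uint right, out uint low)
    {
        ulong product = (ulong)left * right;
        low = (uint)product;          // low 32 bits go to the caller's location
        return (uint)(product >> 32); // high 32 bits are the return value
    }
}

public static class Program
{
    public static void Main()
    {
        uint high = MulxModel.MultiplyNoFlags(0xFFFFFFFFu, 0xFFFFFFFFu, out uint low);
        Console.WriteLine($"{high:X8} {low:X8}"); // prints "FFFFFFFE 00000001"
    }
}
```

Callers that only need the high half can use the two-parameter form and skip the memory write entirely.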
Well, it's still kind of useless, or better said inconvenient. This particular functionality exists on any x86/x64 CPU, but it is exposed only as part of BMI2.
The BMI2 specific instruction, which doesn't set any CPU flags will be part of BMI2 exclusively.
There is a separate discussion about exposing additional Base intrinsics (for things like mul, div, etc.) that needs a proposal, review, etc.
- We probably want "general-purpose" versions exposed in System.Math (and for existing methods to be special-cases, where applicable). However, there are still special semantics to the x86 instructions that may not be generally applicable and for which exposing intrinsics would still be desirable.
Also, to be clear, the Base intrinsics would not apply to just any instruction, only to ones which expose special semantics that are desirable for high-perf scenarios (much as is done for the C/C++ intrinsics that fit this category).
> The BMI2 specific instruction, which doesn't set any CPU flags will be part of BMI2 exclusively.
Apparently you are having trouble understanding that nobody needs or asked for the "no flags" functionality while everyone asked for the broadly available 128 bit functionality. And instead of that they've got the somewhat less available BMI2 form.
> while everyone asked for the broadly available 128 bit functionality
No, I completely understood that users have wanted the more broadly available functionality, which is why I mentioned the existing discussions that have been had for the other requested functionality.
> And instead of that they've got the somewhat less available BMI2 form.
Yes, they will get this initially, because it is part of the already reviewed/approved API set.
APIs for things like __addcarry, _bittest, _mul128, _mulh, etc. were not part of the initially reviewed/approved API set and have to go through the process independently (they are tracked by issues such as https://github.com/dotnet/corefx/issues/32075, and others which are scattered about).
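To make the "broadly available 128-bit functionality" concrete: the full 64x64 to 128-bit unsigned product can be computed without BMI2 by splitting each operand into 32-bit halves. The sketch below is one portable way to do that and is not taken from this PR or from any proposed API; `WideMultiplySketch` and `MultiplyHigh64` are hypothetical names.

```csharp
using System;

// Hypothetical helper; illustrates a portable fallback, not a proposed API.
public static class WideMultiplySketch
{
    // Returns the high 64 bits of a * b and writes the low 64 bits to `low`.
    public static ulong MultiplyHigh64(ulong a, ulong b, out ulong low)
    {
        ulong aLo = (uint)a, aHi = a >> 32;
        ulong bLo = (uint)b, bHi = b >> 32;

        // Four 32x32 -> 64-bit partial products.
        ulong ll = aLo * bLo;
        ulong lh = aLo * bHi;
        ulong hl = aHi * bLo;
        ulong hh = aHi * bHi;

        // Sum of the pieces that land in bits 32..63; it stays below 2^34,
        // so it cannot overflow a ulong.
        ulong mid = (ll >> 32) + (uint)lh + (uint)hl;

        low = (mid << 32) | (uint)ll;
        return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    }
}

public static class Program
{
    public static void Main()
    {
        ulong high = WideMultiplySketch.MultiplyHigh64(ulong.MaxValue, ulong.MaxValue, out ulong low);
        Console.WriteLine($"{high:X16} {low:X16}"); // prints "FFFFFFFFFFFFFFFE 0000000000000001"
    }
}
```

A hardware-backed version would replace the body with a single widening multiply; the point here is only that the result shape (high returned, low written out) does not depend on BMI2.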
> Yes, they will get this initially, because it is part of the already reviewed/approved API set.
Then something went wrong somewhere with the review/approve/prioritize process.
Pushed the CoreFX change in dotnet/corefx#33805 because the change relies on it. @tannergooding Is it okay?
That's fine with me. The pump still appears to be blocked so I think everything should line up.
| /// <summary> | ||
| /// unsigned int _mulx_u32 (unsigned int a, unsigned int b, unsigned int* hi) | ||
| /// MULX r32a, r32b, reg/m32 | ||
| /// The above native signature does not directly correspond to the managed signature. |
Is it just me, or is the comment above wrong? It shows an unsigned int* hi parameter.
This PR changes the APIs to return “high”, which is different from C++.
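The difference from the native signature can be sketched as follows. The native `_mulx_u32` returns the low half and writes the high half through its pointer, while the managed overload in this PR does the reverse. `MulxShapes`, `NativeShape` and `ManagedShape` are hypothetical names used only for illustration, and `out` parameters are an assumption standing in for the pointers.

```csharp
using System;

// Hypothetical illustration of the two result orderings; not real intrinsics.
public static class MulxShapes
{
    // Native ordering (as in the quoted _mulx_u32 signature):
    // the low half is returned, the high half is written to `hi`.
    public static uint NativeShape(uint a, uint b, out uint hi)
    {
        ulong product = (ulong)a * b;
        hi = (uint)(product >> 32);
        return (uint)product;
    }

    // Managed ordering after this PR: the high half is returned,
    // the low half is written to `low`. Built here by swapping the native results.
    public static uint ManagedShape(uint left, uint right, out uint low)
    {
        low = NativeShape(left, right, out uint high);
        return high;
    }
}

public static class Program
{
    public static void Main()
    {
        // 0x10000 * 0x10000 = 0x1_0000_0000, so high = 1 and low = 0.
        uint high = MulxShapes.ManagedShape(0x10000u, 0x10000u, out uint low);
        Console.WriteLine($"high={high} low={low}"); // prints "high=1 low=0"
    }
}
```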
@tannergooding Can we merge this PR?
Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Improve BMI2 MultiplyNoFlags APIs (dotnet/coreclr#21362)
  Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Add AsyncIteratorStateMachineAttribute
  Exactly follows the design of AsyncStateMachineAttribute and IteratorStateMachineAttribute; the only thing different is the type name, "AsyncIterator" instead of "Async" and "Iterator".
  Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Address PR feedback
  Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
| @@ -33,9 +33,18 @@ internal X64() { } | |||
| /// <summary> | |||
| /// unsigned __int64 _mulx_u64 (unsigned __int64 a, unsigned __int64 b, unsigned __int64* hi) | |||
| /// MULX r64a, r64b, reg/m64 | |||
This could be better expressed as MULX r64a, r64a, reg/m64 (both destination operands being the same register), as that is the actual instruction form that returns only the high part (which should be called out here in the docs).
* Improve BMI2 MultiplyNoFlags APIs (dotnet/coreclr#21362)
  Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Add AsyncIteratorStateMachineAttribute
  Exactly follows the design of AsyncStateMachineAttribute and IteratorStateMachineAttribute; the only thing different is the type name, "AsyncIterator" instead of "Async" and "Iterator".
  Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Address PR feedback
  Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
(cherry picked from commit 1c47747)
Commit migrated from dotnet/coreclr@2217719
Close https://github.com/dotnet/corefx/issues/33615
@CarolEidt @tannergooding @eerhardt