Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Improve BMI2 MultiplyNoFlags APIs#21362

Merged
tannergooding merged 1 commit into
dotnet:masterfrom
fiigii:bmiapi
Dec 5, 2018
Merged

Improve BMI2 MultiplyNoFlags APIs#21362
tannergooding merged 1 commit into
dotnet:masterfrom
fiigii:bmiapi

Conversation

@fiigii
Copy link
Copy Markdown

@fiigii fiigii commented Dec 4, 2018

Copy link
Copy Markdown
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

/// The above native signature does not directly correspond to the managed signature.
/// This intrinisc is only available on 64-bit processes
/// </summary>
public static ulong MultiplyNoFlags(ulong left, ulong right) { throw new PlatformNotSupportedException(); }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might be nice to order this above the one that takes 3 parameters.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

/// The above native signature does not directly correspond to the managed signature.
/// </summary>
public static unsafe uint MultiplyNoFlags(uint left, uint right, uint* high) { throw new PlatformNotSupportedException(); }
public static unsafe uint MultiplyNoFlags(uint left, uint right, uint* low) { throw new PlatformNotSupportedException(); }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erm, implementing this intrinsic will be fun.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikedn Concerns?

Copy link
Copy Markdown
Member

@tannergooding tannergooding Dec 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the same concern raised in the other thread, in that (like with Div/Rem) we don't necessarily have an easy way to return multiple results from a single instruction.

I think the "inefficient" codegen might be easier, but it will require an explicit store to memory as part of the codegen (where-as the efficient one would let us keep it in register/etc).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think the "inefficient" codegen might be easier, but it will require an explicit store to

The perf of "explicit store" would be fine, uint/ulong can be handled by store forward on Intel CPUs. And we have provided "no store" version, which avoids additional stores when users do not need "low".

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This intrinsic both stores to memory and returns a value. That's a bit unusual, I think the only similar opers are the atomic ones. We'll see how it goes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's still kind of useless, or better said inconvenient. This particular functionality exists on any x86/x64 CPU, but it is exposed only as part of BMI2

The BMI2 specific instruction, which doesn't set any CPU flags will be part of BMI2 exclusively.

There is a separate discussion about exposing additional Base intrinsics (for things like mul, div, etc) that needs a proposal, review, etc

  • We probably want "general-purpose" versions exposed in System.Math (and for existing methods to be special-cases, where applicable). However, there are still special semantics to the x86 instructions that may not be generally applicable and for which exposing intrinsics would still be desirable

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, to be clear, the Base intrinsics would not apply to just any instruction; but for ones which expose special semantics that are desirable for high-perf scenarios (much as is done for the C/C++ intrinsics that fit this category)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BMI2 specific instruction, which doesn't set any CPU flags will be part of BMI2 exclusively.

Apparently you are having trouble understanding that nobody needs or asked for the "no flags" functionality while everyone asked for the broadly available 128 bit functionality. And instead of that they've got the somewhat less available BMI2 form.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while everyone asked for the broadly available 128 bit functionality

No, I completely understood that users have wanted the more broadly available functionality, which is why I mentioned the existing discussions that have been had for the other requested functionality.

And instead of that they've got the somewhat less available BMI2 form.

Yes, they will get this initially, because it is part of the already reviewed/approved API set.

Adding APIs for things like __addcarry, _bittest, _mul128, _mulh, etc... were not part of the initially reviewed/approved API set and have to go through the process independently (and are tracked by issues such as https://github.com/dotnet/corefx/issues/32075, and others which are scattered about).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they will get this initially, because it is part of the already reviewed/approved API set.

Then something went wrong somewhere with the review/approve/prioritize process.

@fiigii
Copy link
Copy Markdown
Author

fiigii commented Dec 4, 2018

Pushed the CoreFX change in dotnet/corefx#33805 because the change relies on it. @tannergooding Is it okay?

@tannergooding
Copy link
Copy Markdown
Member

That's fine with me. The pump still appears to be blocked so I think everything should line up.

/// <summary>
/// unsigned int _mulx_u32 (unsigned int a, unsigned int b, unsigned int* hi)
/// MULX r32a, r32b, reg/m32
/// The above native signature does not directly correspond to the managed signature.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it me or the comment above is wrong? It shows a unsigned int* hi parameter.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR changes the APIs to return “high”, which is different from C++.

@fiigii
Copy link
Copy Markdown
Author

fiigii commented Dec 5, 2018

@tannergooding Can we merge this PR?

@tannergooding tannergooding merged commit 2217719 into dotnet:master Dec 5, 2018
@fiigii fiigii deleted the bmiapi branch December 5, 2018 00:32
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Dec 5, 2018
Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Dec 5, 2018
Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corert that referenced this pull request Dec 5, 2018
Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
stephentoub pushed a commit to dotnet/corert that referenced this pull request Dec 5, 2018
Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
danmoseley pushed a commit to dotnet/corefx that referenced this pull request Dec 5, 2018
* Improve BMI2 MultiplyNoFlags APIs (dotnet/coreclr#21362)

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

* Add AsyncIteratorStateMachineAttribute

Exactly follows the design of AsyncStateMachineAttribute and IteratorStateMachineAttribute; the only thing different is the type name, "AsyncIterator" instead of "Async" and "Iterator".

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

* Address PR feedback

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@@ -33,9 +33,18 @@ internal X64() { }
/// <summary>
/// unsigned __int64 _mulx_u64 (unsigned __int64 a, unsigned __int64 b, unsigned __int64* hi)
/// MULX r64a, r64b, reg/m64
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be better expressed as MULX r64a, r64a, reg/m64 (both src1 and dest being the same register) as that is the actual instruction that will return only the high part (which should be called out here in docs).

jlennox pushed a commit to jlennox/corefx that referenced this pull request Dec 16, 2018
* Improve BMI2 MultiplyNoFlags APIs (dotnet/coreclr#21362)

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

* Add AsyncIteratorStateMachineAttribute

Exactly follows the design of AsyncStateMachineAttribute and IteratorStateMachineAttribute; the only thing different is the type name, "AsyncIterator" instead of "Async" and "Iterator".

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

* Address PR feedback

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
baulig pushed a commit to baulig/corefx that referenced this pull request Jan 28, 2019
* Improve BMI2 MultiplyNoFlags APIs (dotnet/coreclr#21362)

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

* Add AsyncIteratorStateMachineAttribute

Exactly follows the design of AsyncStateMachineAttribute and IteratorStateMachineAttribute; the only thing different is the type name, "AsyncIterator" instead of "Async" and "Iterator".

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

* Address PR feedback

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
(cherry picked from commit 1c47747)
baulig pushed a commit to mono/corefx that referenced this pull request Jan 29, 2019
* Improve BMI2 MultiplyNoFlags APIs (dotnet/coreclr#21362)

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

* Add AsyncIteratorStateMachineAttribute

Exactly follows the design of AsyncStateMachineAttribute and IteratorStateMachineAttribute; the only thing different is the type name, "AsyncIterator" instead of "Async" and "Iterator".

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

* Address PR feedback

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
(cherry picked from commit 1c47747)
baulig pushed a commit to mono/corefx that referenced this pull request Feb 4, 2019
* Improve BMI2 MultiplyNoFlags APIs (dotnet/coreclr#21362)

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

* Add AsyncIteratorStateMachineAttribute

Exactly follows the design of AsyncStateMachineAttribute and IteratorStateMachineAttribute; the only thing different is the type name, "AsyncIterator" instead of "Async" and "Iterator".

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

* Address PR feedback

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
(cherry picked from commit 1c47747)
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants