-
-
Notifications
You must be signed in to change notification settings - Fork 14.7k
stabilize s390x vector registers #154184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
folkertdev
wants to merge
1
commit into
rust-lang:main
Choose a base branch
from
folkertdev:stabilize-s390x-vector-registers
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+127
−674
Open
stabilize s390x vector registers #154184
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been looking at the z/Arch vector docs and there only seem to be instructions that load a full 128-bit into a vector register. I don't think it makes sense to expose smaller sizes here.
Specifically you only have the options of:
This is in contrast to AArch64 for example which does have instructions that load a single element and zero-extend it to the full size of the vector register.
As such I'm inclined to remove i32/i64/f16/f32/f64 from the set of allowed types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, that would deviate from what
clang/llvmaccept. I'll leave it up to the target maintainer(s) to argue one way or the other.Strictly speaking the behavior as implemented is correct according to the reference:
https://doc.rust-lang.org/nightly/reference/inline-assembly.html?highlight=assemb#r-asm.operand-type.supported-operands.in
So that should be clarified if we remove support for the smaller scalar types.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not make much sense as inputs, but since s390x has shuffle instructions, I think it could be meaningful as outputs.
e.g., by combining shuffle with normal vector addition, we can place the sum of all lanes in the lower lane.
(Some of
_mm512_reduce_*in x86_64 are actually implemented in this way: https://godbolt.org/z/b5G6qKzns)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We absolutely need floating-point types in vector registers. VRs overlap the FPRs, and there's a whole set of scalar floating-point instructions operating on vector registers, using it basically as a larger FPR set. There's no need to extend anything, the (shorter) floating-point value sits at a defined location inside the VR, with the remaining lanes being ignored by the relevant instructions. The existing load/store operations work just fine for those. (There actually are VECTOR LOAD LOGICAL ELEMENT AND ZERO (VLLEZ) instructions that clear the other lanes as well, but it's generally not necessary to use them.)
The shorter integer types are less important, but would still be good to have, in particular for the (scalar) integer-to-float / float-to-integer conversion instructions that operate on VRs. (Again, those integer values would sit at a defined lane - normally the same as a floating-point value of the same size.)
All this is the same as GCC and LLVM handle inline asm with those types in VRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I withdraw my objection. Would it make sense to add i8/i16 as well then for consistency? These are special cased on x86 because x86 doesn't have instructions that move an i8/i16 into a vector register.