stabilize s390x vector registers#154184
Conversation
This comment has been minimized.
This comment has been minimized.
2986c81 to
ba604f5
Compare
This comment has been minimized.
This comment has been minimized.
ba604f5 to
e182fef
Compare
| } | ||
| } | ||
| Self::vreg => types! { | ||
| vector: I32, F16, F32, I64, F64, I128, F128, |
There was a problem hiding this comment.
I've been looking at the z/Arch vector docs and there only seem to be instructions that load a full 128-bit into a vector register. I don't think it makes sense to expose smaller sizes here.
Specifically you only have the options of:
- loading a full 128 bits from memory
- loading a single element and replicating it over the 128 bits of the vector register
- loading a single element and inserting it inside an existing vector
This is in contrast to AArch64 for example which does have instructions that load a single element and zero-extend it to the full size of the vector register.
As such I'm inclined to remove i32/i64/f16/f32/f64 from the set of allowed types.
There was a problem hiding this comment.
To be clear, that would deviate from what clang/llvm accept. I'll leave it up to the target maintainer(s) to argue one way or the other.
Strictly speaking the behavior as implemented is correct according to the reference:
The allocated register will contain the value of
<expr>at the start of the assembly code.
So that should be clarified if we remove support for the smaller scalar types.
There was a problem hiding this comment.
It might not make much sense as inputs, but since s390x has shuffle instructions, I think it could be meaningful as outputs.
e.g., by combining shuffle with normal vector addition, we can place the sum of all lanes in the lower lane.
(Some of _mm512_reduce_* in x86_64 are actually implemented in this way: https://godbolt.org/z/b5G6qKzns)
There was a problem hiding this comment.
We absolutely need floating-point types in vector registers. VRs overlap the FPRs, and there's a whole set of scalar floating-point instructions operating on vector registers, using it basically as a larger FPR set. There's no need to extend anything, the (shorter) floating-point value sits at a defined location inside the VR, with the remaining lanes being ignored by the relevant instructions. The existing load/store operations work just fine for those. (There actually are VECTOR LOAD LOGICAL ELEMENT AND ZERO (VLLEZ) instructions that clear the other lanes as well, but it's generally not necessary to use them.)
The shorter integer types are less important, but would still be good to have, in particular for the (scalar) integer-to-float / float-to-integer conversion instructions that operate on VRs. (Again, those integer values would sit at a defined lane - normally the same as a floating-point value of the same size.)
All this is the same as GCC and LLVM handle inline asm with those types in VRs.
There was a problem hiding this comment.
In that case I withdraw my objection. Would it make sense to add i8/i16 as well then for consistency? These are special cased on x86 because x86 doesn't have instructions that move an i8/i16 into a vector register.
tracking issue: #133416
reference PR: rust-lang/reference#2215
Stabilizes s390x vector registers, e.g.
The types that are accepted for vreg registers are
f16,f32,f64,f128i32,i64andi128and their unsigned counterpartsi8x16,i16x8,i32x4,i64x2and their unsigned counterpartsf16x8,f32x4andf64x2Support for all of these is tested in https://github.com/rust-lang/rust/blob/main/tests/assembly-llvm/asm/s390x-types.rs, and the types correspond with the LLVM definition in https://github.com/llvm/llvm-project/blob/df9eb79970c012990e829d174d181d575d414efe/llvm/lib/Target/SystemZ/SystemZRegisterInfo.td#L312-L339
The
f16,f16x8andf128types are unstable, and so can't be used on stable in practice. They do show up in some error messages though.vregwas previously only accepted as a clobber.Currently the vector types in
core::arch::s390xare still unstable. Separately stabilizingvregis still useful because scalar types can also be put intovregs.Implementation history
f16inline ASM support for s390x #150826cc @uweigand @taiki-e
r? @Amanieu