Skip to content

stabilize s390x vector registers#154184

Open
folkertdev wants to merge 1 commit intorust-lang:mainfrom
folkertdev:stabilize-s390x-vector-registers
Open

stabilize s390x vector registers#154184
folkertdev wants to merge 1 commit intorust-lang:mainfrom
folkertdev:stabilize-s390x-vector-registers

Conversation

@folkertdev
Copy link
Contributor

@folkertdev folkertdev commented Mar 21, 2026

tracking issue: #133416
reference PR: rust-lang/reference#2215

Stabilizes s390x vector registers, e.g.

unsafe fn vreg_128(x: i128) -> i128 {
    let y;
    asm!("vlr {}, {}", out(vreg) y, in(vreg) x);
    y
}

The types that are accepted for vreg registers are

  • all float types f16, f32, f64, f128
  • integer types i32, i64 and i128 and their unsigned counterparts
  • integer vector types i8x16, i16x8, i32x4, i64x2 and their unsigned counterparts
  • float vector types f16x8, f32x4 and f64x2

Support for all of these is tested in https://github.com/rust-lang/rust/blob/main/tests/assembly-llvm/asm/s390x-types.rs, and the types correspond with the LLVM definition in https://github.com/llvm/llvm-project/blob/df9eb79970c012990e829d174d181d575d414efe/llvm/lib/Target/SystemZ/SystemZRegisterInfo.td#L312-L339

The f16, f16x8 and f128 types are unstable, and so can't be used on stable in practice. They do show up in some error messages though.

vreg was previously only accepted as a clobber.


Currently the vector types in core::arch::s390x are still unstable. Separately stabilizing vreg is still useful because scalar types can also be put into vregs.

Implementation history

cc @uweigand @taiki-e
r? @Amanieu

@folkertdev folkertdev added the I-lang-nominated Nominated for discussion during a lang team meeting. label Mar 21, 2026
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 21, 2026
@rust-log-analyzer

This comment has been minimized.

@folkertdev folkertdev force-pushed the stabilize-s390x-vector-registers branch from 2986c81 to ba604f5 Compare March 21, 2026 18:30
@rust-log-analyzer

This comment has been minimized.

@folkertdev folkertdev force-pushed the stabilize-s390x-vector-registers branch from ba604f5 to e182fef Compare March 21, 2026 19:58
}
}
Self::vreg => types! {
vector: I32, F16, F32, I64, F64, I128, F128,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been looking at the z/Arch vector docs and there only seem to be instructions that load a full 128-bit into a vector register. I don't think it makes sense to expose smaller sizes here.

Specifically you only have the options of:

  • loading a full 128 bits from memory
  • loading a single element and replicating it over the 128 bits of the vector register
  • loading a single element and inserting it inside an existing vector

This is in contrast to AArch64 for example which does have instructions that load a single element and zero-extend it to the full size of the vector register.

As such I'm inclined to remove i32/i64/f16/f32/f64 from the set of allowed types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, that would deviate from what clang/llvm accept. I'll leave it up to the target maintainer(s) to argue one way or the other.

Strictly speaking the behavior as implemented is correct according to the reference:

https://doc.rust-lang.org/nightly/reference/inline-assembly.html?highlight=assemb#r-asm.operand-type.supported-operands.in

The allocated register will contain the value of <expr> at the start of the assembly code.

So that should be clarified if we remove support for the smaller scalar types.

Copy link
Member

@taiki-e taiki-e Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not make much sense as inputs, but since s390x has shuffle instructions, I think it could be meaningful as outputs.

e.g., by combining shuffle with normal vector addition, we can place the sum of all lanes in the lower lane.
(Some of _mm512_reduce_* in x86_64 are actually implemented in this way: https://godbolt.org/z/b5G6qKzns)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We absolutely need floating-point types in vector registers. VRs overlap the FPRs, and there's a whole set of scalar floating-point instructions operating on vector registers, using it basically as a larger FPR set. There's no need to extend anything, the (shorter) floating-point value sits at a defined location inside the VR, with the remaining lanes being ignored by the relevant instructions. The existing load/store operations work just fine for those. (There actually are VECTOR LOAD LOGICAL ELEMENT AND ZERO (VLLEZ) instructions that clear the other lanes as well, but it's generally not necessary to use them.)

The shorter integer types are less important, but would still be good to have, in particular for the (scalar) integer-to-float / float-to-integer conversion instructions that operate on VRs. (Again, those integer values would sit at a defined lane - normally the same as a floating-point value of the same size.)

All this is the same as GCC and LLVM handle inline asm with those types in VRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I withdraw my objection. Would it make sense to add i8/i16 as well then for consistency? These are special cased on x86 because x86 doesn't have instructions that move an i8/i16 into a vector register.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

I-lang-nominated Nominated for discussion during a lang team meeting. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants