asm: SSE ALU Instructions by rahulchaphalkar · Pull Request #10316 · bytecodealliance/wasmtime

rahulchaphalkar · 2025-03-01T19:46:08Z

Implement the following SSE instructions:

xorpd
xorps
andpd
andps
addpd
addps
subpd
subps
orps

The opcode for *ps variants of these instructions is NP 0F xx. NP has the following explanation in manual:

NP — Indicates the use of 66/F2/F3 prefixes (beyond those already part of the instructions opcode) are not
allowed with the instruction. Such use will either cause an invalid-opcode exception (#UD) or result in the
encoding for a different instruction.

There is an existing NoPrefix prefix, but I have not reused that to serve above NP as there might be some difference in the intended meaning. Also, NP doesn't classify as a prefix technically. I am thinking to address this in a later patch which would deal with prefix parsing. Any thoughts or feedback on this is welcome.
My thinking is NP doesn't functionally do anything, so it is fine to ignore it. We just need a way for validate function to check this.

implement sse xorpd, addpd, subpd, andpd implement addps, andps, subps

abrown

Had a couple of code movement comments but overall this all makes sense to me. Thanks!

cranelift/codegen/src/isa/x64/inst.isle

cranelift/assembler-x64/meta/src/instructions/add.rs

In [bytecodealliance#10408], the new assembler re-opened an old issue related to unaligned loads with SSE instructions. SSE instructions expect 128-bit aligned loads when using the `m128` operand and fault if that is not the case. This had been fixed previously by disallowing load-sinking for `XmmMem` ([bytecodealliance#4891]) but more recently we had adopted the use of `XmmMemAligned` in `cranelift-codegen`. Since [bytecodealliance#10316] had no knowledge of `XmmMemAligned` (only `XmmMem`), it caused the same kind fault--an OOB trap that was in fact an unaligned load. Why didn't CI catch this? Since all the CI machines have AVX and we do not explicitly test the SSE-only case, these unaligned, sunk loads would use the AVX lowering in CI. AVX loads handle unaligned accesses without a fault. This was only discovered during fuzzing when AVX was disabled (i.e., `--target x86_64-unknown-linux-gnu`). To fix this, this change adopts the `XmmMemAligned` type in the generated assembler code. This is temporary, though: a more lasting fix should pass along an "alignment required" bit from the assembler AST. In the meantime, this closes bytecodealliance#10408. [bytecodealliance#10408]: bytecodealliance#10408 [bytecodealliance#4891]: bytecodealliance#4891 [bytecodealliance#10316]: bytecodealliance#10316

…10417) In [#10408], the new assembler re-opened an old issue related to unaligned loads with SSE instructions. SSE instructions expect 128-bit aligned loads when using the `m128` operand and fault if that is not the case. This had been fixed previously by disallowing load-sinking for `XmmMem` ([#4891]) but more recently we had adopted the use of `XmmMemAligned` in `cranelift-codegen`. Since [#10316] had no knowledge of `XmmMemAligned` (only `XmmMem`), it caused the same kind fault--an OOB trap that was in fact an unaligned load. Why didn't CI catch this? Since all the CI machines have AVX and we do not explicitly test the SSE-only case, these unaligned, sunk loads would use the AVX lowering in CI. AVX loads handle unaligned accesses without a fault. This was only discovered during fuzzing when AVX was disabled (i.e., `--target x86_64-unknown-linux-gnu`). To fix this, this change adopts the `XmmMemAligned` type in the generated assembler code. This is temporary, though: a more lasting fix should pass along an "alignment required" bit from the assembler AST. In the meantime, this closes #10408. [#10408]: #10408 [#4891]: #4891 [#10316]: #10316

rahulchaphalkar added 3 commits March 1, 2025 11:20

implement andpd sse instruction

e8831ec

implement sse xorpd, addpd, subpd, andpd implement addps, andps, subps

impl xorps

9990896

orps instr

5c94431

rahulchaphalkar requested a review from a team as a code owner March 1, 2025 19:46

rahulchaphalkar requested review from cfallin and removed request for a team March 1, 2025 19:46

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Mar 1, 2025

bless clif tests

37fae1b

abrown requested review from abrown and removed request for cfallin March 3, 2025 19:28

abrown reviewed Mar 5, 2025

View reviewed changes

cranelift/codegen/src/isa/x64/inst.isle Outdated Show resolved Hide resolved

cranelift/codegen/src/isa/x64/inst.isle Outdated Show resolved Hide resolved

cranelift/assembler-x64/meta/src/instructions/add.rs Outdated Show resolved Hide resolved

rahulchaphalkar added 2 commits March 6, 2025 11:34

remove old isle emission

1ca401d

reorder instructions in ascending order of size, added comment

ea496dc

abrown approved these changes Mar 6, 2025

View reviewed changes

abrown added this pull request to the merge queue Mar 6, 2025

Merged via the queue into bytecodealliance:main with commit ddaaed6 Mar 6, 2025
51 checks passed

alexcrichton mentioned this pull request Mar 17, 2025

x64: Misaligned load fault with sunk float operation when AVX disabled #10408

Closed

abrown mentioned this pull request Mar 17, 2025

x64: fix misaligned load fault with sunk load when AVX is disabled #10417

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

asm: SSE ALU Instructions#10316

asm: SSE ALU Instructions#10316
abrown merged 6 commits intobytecodealliance:mainfrom
rahulchaphalkar:sse-alu-instructions

rahulchaphalkar commented Mar 1, 2025

Uh oh!

abrown left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rahulchaphalkar commented Mar 1, 2025

Uh oh!

abrown left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants