Skip to content

asm: SSE ALU Instructions#10316

Merged
abrown merged 6 commits intobytecodealliance:mainfrom
rahulchaphalkar:sse-alu-instructions
Mar 6, 2025
Merged

asm: SSE ALU Instructions#10316
abrown merged 6 commits intobytecodealliance:mainfrom
rahulchaphalkar:sse-alu-instructions

Conversation

@rahulchaphalkar
Copy link
Contributor

Implement the following SSE instructions:

  • xorpd
  • xorps
  • andpd
  • andps
  • addpd
  • addps
  • subpd
  • subps
  • orps

The opcode for *ps variants of these instructions is NP 0F xx. NP has the following explanation in manual:

NP — Indicates the use of 66/F2/F3 prefixes (beyond those already part of the instructions opcode) are not
allowed with the instruction. Such use will either cause an invalid-opcode exception (#UD) or result in the
encoding for a different instruction.

There is an existing NoPrefix prefix, but I have not reused that to serve above NP as there might be some difference in the intended meaning. Also, NP doesn't classify as a prefix technically. I am thinking to address this in a later patch which would deal with prefix parsing. Any thoughts or feedback on this is welcome.
My thinking is NP doesn't functionally do anything, so it is fine to ignore it. We just need a way for validate function to check this.

implement sse xorpd, addpd, subpd, andpd

implement addps, andps, subps
@rahulchaphalkar rahulchaphalkar requested a review from a team as a code owner March 1, 2025 19:46
@rahulchaphalkar rahulchaphalkar requested review from cfallin and removed request for a team March 1, 2025 19:46
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Mar 1, 2025
@abrown abrown requested review from abrown and removed request for cfallin March 3, 2025 19:28
Copy link
Member

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a couple of code movement comments but overall this all makes sense to me. Thanks!

@abrown abrown added this pull request to the merge queue Mar 6, 2025
Merged via the queue into bytecodealliance:main with commit ddaaed6 Mar 6, 2025
51 checks passed
abrown added a commit to abrown/wasmtime that referenced this pull request Mar 17, 2025
In [bytecodealliance#10408], the new assembler re-opened an old issue related to
unaligned loads with SSE instructions. SSE instructions expect 128-bit
aligned loads when using the `m128` operand and fault if that is not the
case. This had been fixed previously by disallowing load-sinking for
`XmmMem` ([bytecodealliance#4891]) but more recently we had adopted the use of
`XmmMemAligned` in `cranelift-codegen`. Since [bytecodealliance#10316] had no knowledge
of `XmmMemAligned` (only `XmmMem`), it caused the same kind fault--an
OOB trap that was in fact an unaligned load.

Why didn't CI catch this? Since all the CI machines have AVX and we do
not explicitly test the SSE-only case, these unaligned, sunk loads would
use the AVX lowering in CI. AVX loads handle unaligned accesses without
a fault. This was only discovered during fuzzing when AVX was disabled
(i.e., `--target x86_64-unknown-linux-gnu`).

To fix this, this change adopts the `XmmMemAligned` type in the
generated assembler code. This is temporary, though: a more lasting fix
should pass along an "alignment required" bit from the assembler AST.
In the meantime, this closes bytecodealliance#10408.

[bytecodealliance#10408]: bytecodealliance#10408
[bytecodealliance#4891]: bytecodealliance#4891
[bytecodealliance#10316]: bytecodealliance#10316
github-merge-queue bot pushed a commit that referenced this pull request Mar 18, 2025
…10417)

In [#10408], the new assembler re-opened an old issue related to
unaligned loads with SSE instructions. SSE instructions expect 128-bit
aligned loads when using the `m128` operand and fault if that is not the
case. This had been fixed previously by disallowing load-sinking for
`XmmMem` ([#4891]) but more recently we had adopted the use of
`XmmMemAligned` in `cranelift-codegen`. Since [#10316] had no knowledge
of `XmmMemAligned` (only `XmmMem`), it caused the same kind fault--an
OOB trap that was in fact an unaligned load.

Why didn't CI catch this? Since all the CI machines have AVX and we do
not explicitly test the SSE-only case, these unaligned, sunk loads would
use the AVX lowering in CI. AVX loads handle unaligned accesses without
a fault. This was only discovered during fuzzing when AVX was disabled
(i.e., `--target x86_64-unknown-linux-gnu`).

To fix this, this change adopts the `XmmMemAligned` type in the
generated assembler code. This is temporary, though: a more lasting fix
should pass along an "alignment required" bit from the assembler AST.
In the meantime, this closes #10408.

[#10408]: #10408
[#4891]: #4891
[#10316]: #10316
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants