Skip to content

winch: Use aarch64 backend for code emission.#5652

Merged
cfallin merged 1 commit intobytecodealliance:mainfrom
saulecabrera:winch-codegen-use-aarch64-backend
Feb 2, 2023
Merged

winch: Use aarch64 backend for code emission.#5652
cfallin merged 1 commit intobytecodealliance:mainfrom
saulecabrera:winch-codegen-use-aarch64-backend

Conversation

@saulecabrera
Copy link
Member

This patch introduces basic Aarch64 code generation by using cranelift-codegen's backend.

This change does not:

  • Change the semantics of the code generation
  • Adds support for other Wasm instructions

The most notable change in this patch is how addressing modes are handled at the MacroAssembler layer: instead of having a canonical address representation, this patch introduces the addressing mode as an associated type in the MacroAssembler trait. This approach has the advantage that gives each ISA enough flexiblity to describe the addressing modes and their constraints in isolation without having to worry on how a particular addressing mode is going to affect other ISAs. In the case of Aarch64 this becomes useful to describe indexed addressing modes (particularly from the stack pointer).

This patch uses the concept of a shadow stack pointer (x28) as a workaround to Aarch64's stack pointer 16-byte alignment. This constraint is enforced by:

  • Introducing specialized addressing modes when using the real stack pointer; this enables auditing when the real stack pointer is used. As of this change, the real stack pointer is only used in the function's prologue and epilogue.

  • Asserting that the real stack pointer is not used as a base for addressing modes.

  • Ensuring that at any point during the code generation process where the stack pointer changes (e.g. when stack space is allocated / deallocated) the value of the real stack pointer is copied into the shadow stack pointer.

@saulecabrera saulecabrera requested a review from cfallin January 30, 2023 15:51
@saulecabrera saulecabrera force-pushed the winch-codegen-use-aarch64-backend branch from 94eb787 to 4bb0d45 Compare January 30, 2023 15:57
@github-actions github-actions bot added the winch Winch issues or pull requests label Jan 30, 2023
@github-actions
Copy link

Subscribe to Label Action

cc @saulecabrera

Details This issue or pull request has been labeled: "winch"

Thus the following users have been cc'd because of the following labels:

  • saulecabrera: winch

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@saulecabrera saulecabrera force-pushed the winch-codegen-use-aarch64-backend branch from 4bb0d45 to 07498f3 Compare January 31, 2023 01:42
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks great, thanks!

I talked to @saulecabrera and @KevinRizzoTO just now about the shadow-stack approach; it's got some very interesting and subtle tradeoffs. The one high-level concern right now is signal safety: leaving sp unaligned-to-16 is problematic if it means that a signal handled without a sigaltstack would push an unaligned frame (and quickly cause a bus-error crash). @saulecabrera is going to verify that this is actually an issue, then we discussed some alternate approaches (e.g.: keep x28, the shadow stack pointer, as the canonical allocation point and sync sp from it in a "lossy" way that rounds down to 16-aligned whenever we push).

@saulecabrera saulecabrera force-pushed the winch-codegen-use-aarch64-backend branch from 07498f3 to 3ef3f2f Compare February 1, 2023 16:54
This patch introduces basic aarch64 code generation by using
`cranelift-codegen`'s backend.

This commit *does not*:

* Change the semantics of the code generation
* Adds support for other Wasm instructions

The most notable change in this patch is how addressing modes are handled at the
MacroAssembler layer: instead of having a canonical address representation, this
patch introduces the addressing mode as an associated type in the
MacroAssembler trait. This approach has the advantage that gives each ISA enough
flexiblity to describe the addressing modes and their constraints in isolation
without having to worry on how a particular addressing mode is going to affect
other ISAs. In the case of Aarch64 this becomes useful to describe indexed
addressing modes (particularly from the stack pointer).

This patch uses the concept of a shadow stack pointer (x28) as a workaround to
Aarch64's stack pointer 16-byte alignment. This constraint is enforced by:

* Introducing specialized addressing modes when using the real stack pointer; this
enables auditing when the real stack pointer is used. As of this change, the
real stack pointer is only used in the function's prologue and epilogue.

* Asserting that the real stack pointer is not used as a base for addressing
modes.

* Ensuring that at any point during the code generation process where the stack
pointer changes (e.g. when stack space is allocated / deallocated) the value of
the real stack pointer is copied into the shadow stack pointer.
@saulecabrera saulecabrera force-pushed the winch-codegen-use-aarch64-backend branch from 3ef3f2f to 61e9a43 Compare February 1, 2023 17:01
@saulecabrera
Copy link
Member Author

saulecabrera commented Feb 1, 2023

@cfallin regarding signal safety -- I did a verification with the program below, using bl_signal which as far as I can tell doesn't set up an alternative stack.

Signal handler
.global _start
.align 2

_start:
  stp x29, x30, [sp, #-16]!

  ;; Signal number, sigint in this case
  mov x0, #2 

  ;; Load address of the handler
  adr x1, _signal_handler

  ;; Call bsd_signal to register the signal handler
  bl _bsd_signal

  ;; Unalign the sp;
  ;; simulate making space for word-wise pushes
  sub sp, sp, #8
  sub sp, sp, #8
  sub sp, sp, #8

;; Infinite loop;
;; simulate waiting
   loop: mov x0, #1  ;; Descriptor
   adr x1, _looping   ;; Message address
   mov x2, #5           ;; Length
   mov x16, #4         ;; Write
   svc #0                  ;; Call service
   b loop

;; Commented out since it will never make it here
;; ldp x29, x30, [sp], #16
;; ret

_signal_handler:
  stp x29, x30, [sp, #-16]!
  mov x0, #1             ;; Descriptor
  adr x1, _message  ;; Message address
  mov x2, #15           ;; Length
  mov x16, #4           ;; Write
  svc #0                    ;;  Call service

  ldp x29, x30, [sp], #16

  ;; Terminate the program (for demonstration purposes)
  ;; and to quit the infinite loop

  mov x0, #0      ;; Return code
  mov x16, #1     ;; Service code
  svc #0              ;; Call to terminate 

_message: .ascii "Signal handled\n"
_looping: .ascii "Loop\n"

and the results confirm the theory that SP is correctly aligned when entering the signal handler's frame. The program above runs successfully even though the stack pointer is not aligned to 16 before entering the loop. If the signal handler's code is changed to use an unaligned SP to address memory (e.g. stp x29, x30, [sp, #-24]!) we get a bus error crash as expected.

@saulecabrera saulecabrera requested a review from cfallin February 2, 2023 12:24
@cfallin
Copy link
Member

cfallin commented Feb 2, 2023

Ah, that's a really interesting outcome, thanks! Was this on macOS/aarch64 or Linux/aarch64? (I guess we'd want to make sure it's properly handled on both?)

@saulecabrera
Copy link
Member Author

Ah, that's a really interesting outcome, thanks! Was this on macOS/aarch64 or Linux/aarch64? (I guess we'd want to make sure it's properly handled on both?)

Yeah, tested in both.

  • MacOS Ventura 13.2: the linked example in my previous comment is the one I used in Mac.
  • For linux: I used Ubuntu 20.04.5. I tweaked the program above to use the right params for syscalls.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM then -- we can always improve the stack strategy later but it sounds like the unaligned-sp approach should actually work here (as long as we realign at callsites). Thanks!

@cfallin cfallin merged commit 426c49b into bytecodealliance:main Feb 2, 2023
@saulecabrera saulecabrera deleted the winch-codegen-use-aarch64-backend branch February 2, 2023 22:25
saulecabrera added a commit to saulecabrera/wasmtime that referenced this pull request Jan 29, 2025
This commit marks another step toward finalizing AArch64 support in
Winch.

While enabling spec tests, I experienced some unexpected failures
related to Wasm loads/stores and traps. The observed
symptoms are as follows:

* Under normal conditions, Wasm loads/stores work as expected.
* In out-of-bounds scenarios, loads/stores result in a segmentation
  fault, whereas the expected behavior is to trigger an out-of-bounds trap.
* When out-of-bounds access can be determined statically, the program
  still results in a segmentation fault instead of the anticipated
  out-of-bounds trap.

Debugging revealed the following issues:

* The stack pointer was not correctly aligned to 16 bytes when entering
  signal handlers, which caused the segmentation fault.
* Wasm loads and stores were not flagged as untrusted, leading to
  segmentation faults even when the stack pointer was properly aligned.

This commit fixes the previous issues by:

* Correctly flagging wasm loads and stores as untrusted.
* Reworking the shadow stack pointer approach such that it allows
  aligning the stack pointer at arbitrary points in the program,
  particularly where signal handling might be needed. This rework
  involves changing some principles introduced in
  bytecodealliance#5652; namely:
  changing the primary stack pointer register to be the shadow stack
  pointer. See the updates comments in the code for more details.

Note that this change doesn't enable spectests. To try this change, run:

  cargo run -- wast -Ccompiler=winch tests/spec_testsuite/address.wast
saulecabrera added a commit to saulecabrera/wasmtime that referenced this pull request Jan 29, 2025
This commit marks another step toward finalizing AArch64 support in
Winch.

While enabling spec tests, I experienced some unexpected failures
related to Wasm loads/stores and traps. The observed
symptoms are as follows:

* Under normal conditions, Wasm loads/stores work as expected.
* In out-of-bounds scenarios, loads/stores result in a segmentation
  fault, whereas the expected behavior is to trigger an out-of-bounds trap.
* When out-of-bounds access can be determined statically, the program
  still results in a segmentation fault instead of the anticipated
  out-of-bounds trap.

Debugging revealed the following issues:

* The stack pointer was not correctly aligned to 16 bytes when entering
  signal handlers, which caused the segmentation fault.
* Wasm loads and stores were not flagged as untrusted, leading to
  segmentation faults even when the stack pointer was properly aligned.

This commit fixes the previous issues by:

* Correctly flagging wasm loads and stores as untrusted.
* Reworking the shadow stack pointer approach such that it allows
  aligning the stack pointer at arbitrary points in the program,
  particularly where signal handling might be needed. This rework
  involves changing some principles introduced in
  bytecodealliance#5652; namely:
  changing the primary stack pointer register to be the shadow stack
  pointer. See the updates comments in the code for more details.

Note that this change doesn't enable spectests. To try this change, run:

  cargo run -- wast -Ccompiler=winch tests/spec_testsuite/address.wast
github-merge-queue bot pushed a commit that referenced this pull request Jan 30, 2025
This commit marks another step toward finalizing AArch64 support in
Winch.

While enabling spec tests, I experienced some unexpected failures
related to Wasm loads/stores and traps. The observed
symptoms are as follows:

* Under normal conditions, Wasm loads/stores work as expected.
* In out-of-bounds scenarios, loads/stores result in a segmentation
  fault, whereas the expected behavior is to trigger an out-of-bounds trap.
* When out-of-bounds access can be determined statically, the program
  still results in a segmentation fault instead of the anticipated
  out-of-bounds trap.

Debugging revealed the following issues:

* The stack pointer was not correctly aligned to 16 bytes when entering
  signal handlers, which caused the segmentation fault.
* Wasm loads and stores were not flagged as untrusted, leading to
  segmentation faults even when the stack pointer was properly aligned.

This commit fixes the previous issues by:

* Correctly flagging wasm loads and stores as untrusted.
* Reworking the shadow stack pointer approach such that it allows
  aligning the stack pointer at arbitrary points in the program,
  particularly where signal handling might be needed. This rework
  involves changing some principles introduced in
  #5652; namely:
  changing the primary stack pointer register to be the shadow stack
  pointer. See the updates comments in the code for more details.

Note that this change doesn't enable spectests. To try this change, run:

  cargo run -- wast -Ccompiler=winch tests/spec_testsuite/address.wast
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

winch Winch issues or pull requests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants