winch: Use aarch64 backend for code emission.#5652
winch: Use aarch64 backend for code emission.#5652cfallin merged 1 commit intobytecodealliance:mainfrom
Conversation
94eb787 to
4bb0d45
Compare
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "winch"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
4bb0d45 to
07498f3
Compare
cfallin
left a comment
There was a problem hiding this comment.
This generally looks great, thanks!
I talked to @saulecabrera and @KevinRizzoTO just now about the shadow-stack approach; it's got some very interesting and subtle tradeoffs. The one high-level concern right now is signal safety: leaving sp unaligned-to-16 is problematic if it means that a signal handled without a sigaltstack would push an unaligned frame (and quickly cause a bus-error crash). @saulecabrera is going to verify that this is actually an issue, then we discussed some alternate approaches (e.g.: keep x28, the shadow stack pointer, as the canonical allocation point and sync sp from it in a "lossy" way that rounds down to 16-aligned whenever we push).
07498f3 to
3ef3f2f
Compare
This patch introduces basic aarch64 code generation by using `cranelift-codegen`'s backend. This commit *does not*: * Change the semantics of the code generation * Adds support for other Wasm instructions The most notable change in this patch is how addressing modes are handled at the MacroAssembler layer: instead of having a canonical address representation, this patch introduces the addressing mode as an associated type in the MacroAssembler trait. This approach has the advantage that gives each ISA enough flexiblity to describe the addressing modes and their constraints in isolation without having to worry on how a particular addressing mode is going to affect other ISAs. In the case of Aarch64 this becomes useful to describe indexed addressing modes (particularly from the stack pointer). This patch uses the concept of a shadow stack pointer (x28) as a workaround to Aarch64's stack pointer 16-byte alignment. This constraint is enforced by: * Introducing specialized addressing modes when using the real stack pointer; this enables auditing when the real stack pointer is used. As of this change, the real stack pointer is only used in the function's prologue and epilogue. * Asserting that the real stack pointer is not used as a base for addressing modes. * Ensuring that at any point during the code generation process where the stack pointer changes (e.g. when stack space is allocated / deallocated) the value of the real stack pointer is copied into the shadow stack pointer.
3ef3f2f to
61e9a43
Compare
|
@cfallin regarding signal safety -- I did a verification with the program below, using Signal handler.global _start
.align 2
_start:
stp x29, x30, [sp, #-16]!
;; Signal number, sigint in this case
mov x0, #2
;; Load address of the handler
adr x1, _signal_handler
;; Call bsd_signal to register the signal handler
bl _bsd_signal
;; Unalign the sp;
;; simulate making space for word-wise pushes
sub sp, sp, #8
sub sp, sp, #8
sub sp, sp, #8
;; Infinite loop;
;; simulate waiting
loop: mov x0, #1 ;; Descriptor
adr x1, _looping ;; Message address
mov x2, #5 ;; Length
mov x16, #4 ;; Write
svc #0 ;; Call service
b loop
;; Commented out since it will never make it here
;; ldp x29, x30, [sp], #16
;; ret
_signal_handler:
stp x29, x30, [sp, #-16]!
mov x0, #1 ;; Descriptor
adr x1, _message ;; Message address
mov x2, #15 ;; Length
mov x16, #4 ;; Write
svc #0 ;; Call service
ldp x29, x30, [sp], #16
;; Terminate the program (for demonstration purposes)
;; and to quit the infinite loop
mov x0, #0 ;; Return code
mov x16, #1 ;; Service code
svc #0 ;; Call to terminate
_message: .ascii "Signal handled\n"
_looping: .ascii "Loop\n"and the results confirm the theory that SP is correctly aligned when entering the signal handler's frame. The program above runs successfully even though the stack pointer is not aligned to 16 before entering the loop. If the signal handler's code is changed to use an unaligned SP to address memory (e.g. |
|
Ah, that's a really interesting outcome, thanks! Was this on macOS/aarch64 or Linux/aarch64? (I guess we'd want to make sure it's properly handled on both?) |
Yeah, tested in both.
|
cfallin
left a comment
There was a problem hiding this comment.
LGTM then -- we can always improve the stack strategy later but it sounds like the unaligned-sp approach should actually work here (as long as we realign at callsites). Thanks!
This commit marks another step toward finalizing AArch64 support in Winch. While enabling spec tests, I experienced some unexpected failures related to Wasm loads/stores and traps. The observed symptoms are as follows: * Under normal conditions, Wasm loads/stores work as expected. * In out-of-bounds scenarios, loads/stores result in a segmentation fault, whereas the expected behavior is to trigger an out-of-bounds trap. * When out-of-bounds access can be determined statically, the program still results in a segmentation fault instead of the anticipated out-of-bounds trap. Debugging revealed the following issues: * The stack pointer was not correctly aligned to 16 bytes when entering signal handlers, which caused the segmentation fault. * Wasm loads and stores were not flagged as untrusted, leading to segmentation faults even when the stack pointer was properly aligned. This commit fixes the previous issues by: * Correctly flagging wasm loads and stores as untrusted. * Reworking the shadow stack pointer approach such that it allows aligning the stack pointer at arbitrary points in the program, particularly where signal handling might be needed. This rework involves changing some principles introduced in bytecodealliance#5652; namely: changing the primary stack pointer register to be the shadow stack pointer. See the updates comments in the code for more details. Note that this change doesn't enable spectests. To try this change, run: cargo run -- wast -Ccompiler=winch tests/spec_testsuite/address.wast
This commit marks another step toward finalizing AArch64 support in Winch. While enabling spec tests, I experienced some unexpected failures related to Wasm loads/stores and traps. The observed symptoms are as follows: * Under normal conditions, Wasm loads/stores work as expected. * In out-of-bounds scenarios, loads/stores result in a segmentation fault, whereas the expected behavior is to trigger an out-of-bounds trap. * When out-of-bounds access can be determined statically, the program still results in a segmentation fault instead of the anticipated out-of-bounds trap. Debugging revealed the following issues: * The stack pointer was not correctly aligned to 16 bytes when entering signal handlers, which caused the segmentation fault. * Wasm loads and stores were not flagged as untrusted, leading to segmentation faults even when the stack pointer was properly aligned. This commit fixes the previous issues by: * Correctly flagging wasm loads and stores as untrusted. * Reworking the shadow stack pointer approach such that it allows aligning the stack pointer at arbitrary points in the program, particularly where signal handling might be needed. This rework involves changing some principles introduced in bytecodealliance#5652; namely: changing the primary stack pointer register to be the shadow stack pointer. See the updates comments in the code for more details. Note that this change doesn't enable spectests. To try this change, run: cargo run -- wast -Ccompiler=winch tests/spec_testsuite/address.wast
This commit marks another step toward finalizing AArch64 support in Winch. While enabling spec tests, I experienced some unexpected failures related to Wasm loads/stores and traps. The observed symptoms are as follows: * Under normal conditions, Wasm loads/stores work as expected. * In out-of-bounds scenarios, loads/stores result in a segmentation fault, whereas the expected behavior is to trigger an out-of-bounds trap. * When out-of-bounds access can be determined statically, the program still results in a segmentation fault instead of the anticipated out-of-bounds trap. Debugging revealed the following issues: * The stack pointer was not correctly aligned to 16 bytes when entering signal handlers, which caused the segmentation fault. * Wasm loads and stores were not flagged as untrusted, leading to segmentation faults even when the stack pointer was properly aligned. This commit fixes the previous issues by: * Correctly flagging wasm loads and stores as untrusted. * Reworking the shadow stack pointer approach such that it allows aligning the stack pointer at arbitrary points in the program, particularly where signal handling might be needed. This rework involves changing some principles introduced in #5652; namely: changing the primary stack pointer register to be the shadow stack pointer. See the updates comments in the code for more details. Note that this change doesn't enable spectests. To try this change, run: cargo run -- wast -Ccompiler=winch tests/spec_testsuite/address.wast
This patch introduces basic Aarch64 code generation by using
cranelift-codegen's backend.This change does not:
The most notable change in this patch is how addressing modes are handled at the MacroAssembler layer: instead of having a canonical address representation, this patch introduces the addressing mode as an associated type in the MacroAssembler trait. This approach has the advantage that gives each ISA enough flexiblity to describe the addressing modes and their constraints in isolation without having to worry on how a particular addressing mode is going to affect other ISAs. In the case of Aarch64 this becomes useful to describe indexed addressing modes (particularly from the stack pointer).
This patch uses the concept of a shadow stack pointer (x28) as a workaround to Aarch64's stack pointer 16-byte alignment. This constraint is enforced by:
Introducing specialized addressing modes when using the real stack pointer; this enables auditing when the real stack pointer is used. As of this change, the real stack pointer is only used in the function's prologue and epilogue.
Asserting that the real stack pointer is not used as a base for addressing modes.
Ensuring that at any point during the code generation process where the stack pointer changes (e.g. when stack space is allocated / deallocated) the value of the real stack pointer is copied into the shadow stack pointer.