Eagerly reserve tail call stack arg space by elliottt · Pull Request #8327 · bytecodealliance/wasmtime

elliottt · 2024-04-11T00:08:25Z

return_call instructions reuse the incoming argument area of the caller's frame. As such if the caller's incoming argument area is not exactly the right size for the callee, some resizing will need to take place to ensure that the return address, frame pointer, clobbers, and stack slots don't get overwritten. The current solution on the main branch for the x64 backend is to explicitly resize the frame via ShrinkArgumentArea or GrowArgumentArea right before the return_call arguments are written to the stack, ensuring that there is sufficient space. While this does work, it does make a return_call more expensive when the resizing is necessary.

To simplify this, we instead resize the incoming argument area in the function prologue to accommodate the largest possible argument area of any return_call instruction in the function. We then shrink back down when necessary before an individual return_call. This simplifies the implementation of tail calls on x86_64, as we no longer need to move the entire frame, just the return address before we jump to the tail-callee.

Co-authored-by: Jamey Sharp jsharp@fastly.com

cranelift/codegen/src/isa/x64/inst/args.rs

elliottt · 2024-04-11T22:58:02Z

cranelift/codegen/src/isa/x64/abi.rs


+        // When a return_call within this function required more stack arguments than we have
+        // present, resize the incoming argument area of the frame to accommodate those arguments.
+        let incoming_args_diff = frame_layout.tail_args_size - frame_layout.incoming_args_size;


We may need to add this to offset_upward_to_caller_sp in the unwind info below.

fitzgen

Nice!

fitzgen · 2024-04-11T22:51:51Z

cranelift/codegen/src/isa/x64/inst.isle

-       ;; GrowArgumentArea does a memmove of everything in the frame except for
-       ;; the argument area, to make room for more arguments. That includes all
-       ;; the stack slots, the callee-saved registers, and the saved FP and
-       ;; return address. To keep the stack pointers in sync with that change,
-       ;; it also subtracts the given amount from both the FP and SP registers.
-       (GrowArgumentArea (amount u32)
-                         (tmp WritableGpr))
-
-       ;; ShrinkArgumentArea does a memmove of everything in the frame except
-       ;; for the argument area, to trim space for fewer arguments. That
-       ;; includes all the stack slots, the callee-saved registers, and the
-       ;; saved FP and return address. To keep the stack pointers in sync with
-       ;; that change, it also adds the given amount to both the FP and SP
-       ;; registers.
-       (ShrinkArgumentArea (amount u32)
-                           (tmp WritableGpr))


fitzgen · 2024-04-11T23:02:28Z

cranelift/filetests/filetests/isa/x64/return-call-indirect.clif

 ;   subq    %rsp, $160, %rsp
+;   movq    %rsp, %rbp
+;   movq    160(%rsp), %r11
+;   movq    %r11, 0(%rsp)
+;   movq    168(%rsp), %r11
+;   movq    %r11, 8(%rsp)
+;   subq    %rsp, $160, %rsp


Definitely not a blocker for landing this PR, but it would be nice if we could get this to be a single sub $rsp, M+N followed by mov temp, [$rsp + OLD_RA_OFFSET]; mov [$rsp + NEW_RA_OFFSET] and similar for the saved FP, instead of an initial sub to establish the frame followed by a second sub for tail call arguments space.

We're experimenting with that on a different branch based on this PR, as a follow-up. Moving the stack check before the frame setup means that we can cut out even more instructions, as there's no need to move the frame pointer as well.

…logue `return_call` instructions reuse the incoming argument area of the caller's frame. As such if the caller's incoming argument area is not exactly the right size for the callee, some resizing will need to take place to ensure that the return address, frame pointer, clobbers, and stack slots don't get overwritten. The current solution on the main branch for the x64 backend is to explicitly resize the frame via `ShrinkArgumentArea` or `GrowArgumentArea` right before the `return_call` arguments are written to the stack, ensuring that there is sufficient space. While this does work, it does make a `return_call` more expensive when the resizing is necessary. To simplify this, we instead resize the incoming argument area in the function prologue to accommodate the largest possible argument area of any `return_call` instruction in the function. We then shrink back down when necessary before an individual `return_call`. This simplifies the implementation of tail calls on x86_64, as we no longer need to move the entire frame, just the return address before we jump to the tail-callee. Co-authored-by: Jamey Sharp <jsharp@fastly.com>

This reverts the key parts of e3a08d4 (bytecodealliance#8151), because it turns out that we didn't need that abstraction. Several changes in the last month have enabled this: - bytecodealliance#8292 and then bytecodealliance#8316 allow us to refer to either incoming or outgoing argument areas in a (mostly) consistent way - bytecodealliance#8327, bytecodealliance#8377, and bytecodealliance#8383 demonstrate that we never need to delay writing stack arguments directly to their final location

This reverts the key parts of e3a08d4 (bytecodealliance#8151), because it turns out that we didn't need that abstraction. Several changes in the last month have enabled this: - bytecodealliance#8292 and then bytecodealliance#8316 allow us to refer to either incoming or outgoing argument areas in a (mostly) consistent way - bytecodealliance#8327, bytecodealliance#8377, and bytecodealliance#8383 demonstrate that we never need to delay writing stack arguments directly to their final location prtest:full

This reverts the key parts of e3a08d4 (#8151), because it turns out that we didn't need that abstraction. Several changes in the last month have enabled this: - #8292 and then #8316 allow us to refer to either incoming or outgoing argument areas in a (mostly) consistent way - #8327, #8377, and #8383 demonstrate that we never need to delay writing stack arguments directly to their final location prtest:full

jameysharp mentioned this pull request Apr 11, 2024

Always reserve space for outgoing args #8319

Merged

jameysharp reviewed Apr 11, 2024

View reviewed changes

cranelift/codegen/src/isa/x64/inst/args.rs Outdated Show resolved Hide resolved

elliottt force-pushed the trevor/eagerly-reserve-tail-call-arg-space branch 4 times, most recently from dc65e3f to 3f2be4d Compare April 11, 2024 19:59

jameysharp marked this pull request as ready for review April 11, 2024 20:50

jameysharp requested a review from a team as a code owner April 11, 2024 20:50

jameysharp requested review from abrown and removed request for a team April 11, 2024 20:50

elliottt requested review from fitzgen and removed request for abrown April 11, 2024 21:32

elliottt force-pushed the trevor/eagerly-reserve-tail-call-arg-space branch 2 times, most recently from 0ae41a6 to 3e65778 Compare April 11, 2024 21:55

elliottt commented Apr 11, 2024

View reviewed changes

fitzgen approved these changes Apr 11, 2024

View reviewed changes

elliottt force-pushed the trevor/eagerly-reserve-tail-call-arg-space branch from bd53c0a to f82767b Compare April 15, 2024 18:47

elliottt force-pushed the trevor/eagerly-reserve-tail-call-arg-space branch from f82767b to 6b39b19 Compare April 15, 2024 18:49

elliottt added this pull request to the merge queue Apr 15, 2024

Merged via the queue into bytecodealliance:main with commit bbd3c54 Apr 15, 2024

elliottt deleted the trevor/eagerly-reserve-tail-call-arg-space branch April 15, 2024 20:25

jameysharp mentioned this pull request Apr 18, 2024

cranelift: Emit argument location uses eagerly in gen_arg #8398

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Eagerly reserve tail call stack arg space#8327

Eagerly reserve tail call stack arg space#8327
elliottt merged 1 commit intobytecodealliance:mainfrom
elliottt:trevor/eagerly-reserve-tail-call-arg-space

elliottt commented Apr 11, 2024 •

edited

Loading

Uh oh!

Uh oh!

elliottt Apr 11, 2024

Uh oh!

fitzgen left a comment

Uh oh!

fitzgen Apr 11, 2024

Uh oh!

fitzgen Apr 11, 2024

Uh oh!

elliottt Apr 11, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

elliottt commented Apr 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

elliottt Apr 11, 2024

Choose a reason for hiding this comment

Uh oh!

fitzgen left a comment

Choose a reason for hiding this comment

Uh oh!

fitzgen Apr 11, 2024

Choose a reason for hiding this comment

Uh oh!

fitzgen Apr 11, 2024

Choose a reason for hiding this comment

Uh oh!

elliottt Apr 11, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

elliottt commented Apr 11, 2024 •

edited

Loading