Skip to content

Eagerly reserve tail call stack arg space#8327

Merged
elliottt merged 1 commit intobytecodealliance:mainfrom
elliottt:trevor/eagerly-reserve-tail-call-arg-space
Apr 15, 2024
Merged

Eagerly reserve tail call stack arg space#8327
elliottt merged 1 commit intobytecodealliance:mainfrom
elliottt:trevor/eagerly-reserve-tail-call-arg-space

Conversation

@elliottt
Copy link
Member

@elliottt elliottt commented Apr 11, 2024

return_call instructions reuse the incoming argument area of the caller's frame. As such if the caller's incoming argument area is not exactly the right size for the callee, some resizing will need to take place to ensure that the return address, frame pointer, clobbers, and stack slots don't get overwritten. The current solution on the main branch for the x64 backend is to explicitly resize the frame via ShrinkArgumentArea or GrowArgumentArea right before the return_call arguments are written to the stack, ensuring that there is sufficient space. While this does work, it does make a return_call more expensive when the resizing is necessary.

To simplify this, we instead resize the incoming argument area in the function prologue to accommodate the largest possible argument area of any return_call instruction in the function. We then shrink back down when necessary before an individual return_call. This simplifies the implementation of tail calls on x86_64, as we no longer need to move the entire frame, just the return address before we jump to the tail-callee.

Co-authored-by: Jamey Sharp jsharp@fastly.com

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Apr 11, 2024
@elliottt elliottt force-pushed the trevor/eagerly-reserve-tail-call-arg-space branch 4 times, most recently from dc65e3f to 3f2be4d Compare April 11, 2024 19:59
@jameysharp jameysharp marked this pull request as ready for review April 11, 2024 20:50
@jameysharp jameysharp requested a review from a team as a code owner April 11, 2024 20:50
@jameysharp jameysharp requested review from abrown and removed request for a team April 11, 2024 20:50
@elliottt elliottt requested review from fitzgen and removed request for abrown April 11, 2024 21:32
@elliottt elliottt force-pushed the trevor/eagerly-reserve-tail-call-arg-space branch 2 times, most recently from 0ae41a6 to 3e65778 Compare April 11, 2024 21:55

// When a return_call within this function required more stack arguments than we have
// present, resize the incoming argument area of the frame to accommodate those arguments.
let incoming_args_diff = frame_layout.tail_args_size - frame_layout.incoming_args_size;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to add this to offset_upward_to_caller_sp in the unwind info below.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines -544 to -559
;; GrowArgumentArea does a memmove of everything in the frame except for
;; the argument area, to make room for more arguments. That includes all
;; the stack slots, the callee-saved registers, and the saved FP and
;; return address. To keep the stack pointers in sync with that change,
;; it also subtracts the given amount from both the FP and SP registers.
(GrowArgumentArea (amount u32)
(tmp WritableGpr))

;; ShrinkArgumentArea does a memmove of everything in the frame except
;; for the argument area, to trim space for fewer arguments. That
;; includes all the stack slots, the callee-saved registers, and the
;; saved FP and return address. To keep the stack pointers in sync with
;; that change, it also adds the given amount to both the FP and SP
;; registers.
(ShrinkArgumentArea (amount u32)
(tmp WritableGpr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Comment on lines 250 to +256
; subq %rsp, $160, %rsp
; movq %rsp, %rbp
; movq 160(%rsp), %r11
; movq %r11, 0(%rsp)
; movq 168(%rsp), %r11
; movq %r11, 8(%rsp)
; subq %rsp, $160, %rsp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not a blocker for landing this PR, but it would be nice if we could get this to be a single sub $rsp, M+N followed by mov temp, [$rsp + OLD_RA_OFFSET]; mov [$rsp + NEW_RA_OFFSET] and similar for the saved FP, instead of an initial sub to establish the frame followed by a second sub for tail call arguments space.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're experimenting with that on a different branch based on this PR, as a follow-up. Moving the stack check before the frame setup means that we can cut out even more instructions, as there's no need to move the frame pointer as well.

@elliottt elliottt force-pushed the trevor/eagerly-reserve-tail-call-arg-space branch from bd53c0a to f82767b Compare April 15, 2024 18:47
…logue

`return_call` instructions reuse the incoming argument area of the caller's
frame. As such if the caller's incoming argument area is  not exactly the right
size for the callee, some resizing will need to take place to ensure that the
return address, frame pointer, clobbers, and stack slots don't get overwritten.
The current solution on the main branch for the x64 backend is to explicitly
resize the frame via `ShrinkArgumentArea` or `GrowArgumentArea` right before
the `return_call` arguments are written to the stack, ensuring that there is
sufficient space. While this does work, it does make a `return_call` more
expensive when the resizing is necessary.

To simplify this, we instead resize the incoming argument area in the function
prologue to accommodate the largest possible argument area of any `return_call`
instruction in the function. We then shrink back down when necessary before an
individual `return_call`. This simplifies the implementation of tail calls on
x86_64, as we no longer need to move the entire frame, just the return address
before we jump to the tail-callee.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
@elliottt elliottt force-pushed the trevor/eagerly-reserve-tail-call-arg-space branch from f82767b to 6b39b19 Compare April 15, 2024 18:49
@elliottt elliottt added this pull request to the merge queue Apr 15, 2024
Merged via the queue into bytecodealliance:main with commit bbd3c54 Apr 15, 2024
@elliottt elliottt deleted the trevor/eagerly-reserve-tail-call-arg-space branch April 15, 2024 20:25
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Apr 18, 2024
This reverts the key parts of e3a08d4
(bytecodealliance#8151), because it turns out that we didn't need that abstraction.

Several changes in the last month have enabled this:

- bytecodealliance#8292 and then bytecodealliance#8316 allow us to refer to either incoming or outgoing
  argument areas in a (mostly) consistent way

- bytecodealliance#8327, bytecodealliance#8377, and bytecodealliance#8383 demonstrate that we never need to delay
  writing stack arguments directly to their final location
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Apr 18, 2024
This reverts the key parts of e3a08d4
(bytecodealliance#8151), because it turns out that we didn't need that abstraction.

Several changes in the last month have enabled this:

- bytecodealliance#8292 and then bytecodealliance#8316 allow us to refer to either incoming or outgoing
  argument areas in a (mostly) consistent way

- bytecodealliance#8327, bytecodealliance#8377, and bytecodealliance#8383 demonstrate that we never need to delay
  writing stack arguments directly to their final location

prtest:full
github-merge-queue bot pushed a commit that referenced this pull request Apr 18, 2024
This reverts the key parts of e3a08d4
(#8151), because it turns out that we didn't need that abstraction.

Several changes in the last month have enabled this:

- #8292 and then #8316 allow us to refer to either incoming or outgoing
  argument areas in a (mostly) consistent way

- #8327, #8377, and #8383 demonstrate that we never need to delay
  writing stack arguments directly to their final location

prtest:full
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants