Eagerly reserve tail call stack arg space #8327
elliottt merged 1 commit into bytecodealliance:main
// When a return_call within this function required more stack arguments than we have
// present, resize the incoming argument area of the frame to accommodate those arguments.
let incoming_args_diff = frame_layout.tail_args_size - frame_layout.incoming_args_size;
We may need to add this to offset_upward_to_caller_sp in the unwind info below.
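As a rough illustration of the quantity under discussion, the sketch below computes the amount by which the incoming argument area has to grow in the prologue. `FrameLayoutSketch` and `incoming_args_diff` are hypothetical names for this example only; the two fields mirror the names in the snippet above, and the sketch assumes `tail_args_size` is never smaller than `incoming_args_size`, falling back to zero if it is.

```rust
/// Hypothetical, trimmed-down stand-in for a frame layout. Only the two
/// fields that appear in the diff excerpt are modelled here.
#[derive(Debug, Clone, Copy)]
pub struct FrameLayoutSketch {
    /// Bytes of stack arguments this function received from its caller.
    pub incoming_args_size: u32,
    /// Bytes of stack arguments needed by the largest `return_call`
    /// (assumed to be at least `incoming_args_size`).
    pub tail_args_size: u32,
}

/// Number of extra bytes the prologue has to reserve for tail-call
/// arguments. Returns 0 when the incoming area is already large enough.
pub fn incoming_args_diff(layout: &FrameLayoutSketch) -> u32 {
    // `saturating_sub` avoids an underflow panic if the assumption that
    // `tail_args_size >= incoming_args_size` does not hold.
    layout.tail_args_size.saturating_sub(layout.incoming_args_size)
}
```

Any offset that is measured from the frame up to the caller's stack pointer would have to account for this difference as well, which is the concern raised in the comment above.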
;; GrowArgumentArea does a memmove of everything in the frame except for
;; the argument area, to make room for more arguments. That includes all
;; the stack slots, the callee-saved registers, and the saved FP and
;; return address. To keep the stack pointers in sync with that change,
;; it also subtracts the given amount from both the FP and SP registers.
(GrowArgumentArea (amount u32)
                  (tmp WritableGpr))

;; ShrinkArgumentArea does a memmove of everything in the frame except
;; for the argument area, to trim space for fewer arguments. That
;; includes all the stack slots, the callee-saved registers, and the
;; saved FP and return address. To keep the stack pointers in sync with
;; that change, it also adds the given amount to both the FP and SP
;; registers.
(ShrinkArgumentArea (amount u32)
                    (tmp WritableGpr))
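To make the behaviour described in those two comments concrete, here is a small model of the stack in Rust. It is a sketch under stated assumptions, not the backend's lowering: `StackModel` and its fields are hypothetical, the stack is modelled as a vector of 8-byte words rather than bytes, and `copy_within` stands in for the memmove (it handles overlapping ranges).

```rust
/// Hypothetical word-granular model of a stack. A lower index means a
/// lower address, so the stack grows toward index 0.
pub struct StackModel {
    pub mem: Vec<u64>,
    /// Index of the lowest word of the current frame.
    pub sp: usize,
    /// Index of the saved frame pointer slot inside the frame.
    pub fp: usize,
    /// Index of the first word of the incoming argument area; the frame
    /// (stack slots, clobbers, saved FP, return address) is `sp..args_start`.
    pub args_start: usize,
}

impl StackModel {
    /// Move the whole frame down by `amount` words, leaving `amount`
    /// extra words of argument space above it.
    pub fn grow_argument_area(&mut self, amount: usize) {
        assert!(amount <= self.sp, "not enough stack left to grow into");
        self.mem.copy_within(self.sp..self.args_start, self.sp - amount);
        // Keep the pointers in sync with the moved frame.
        self.sp -= amount;
        self.fp -= amount;
        self.args_start -= amount;
    }

    /// Move the whole frame up by `amount` words, trimming `amount`
    /// words from the argument area.
    pub fn shrink_argument_area(&mut self, amount: usize) {
        assert!(self.args_start + amount <= self.mem.len());
        self.mem.copy_within(self.sp..self.args_start, self.sp + amount);
        self.sp += amount;
        self.fp += amount;
        self.args_start += amount;
    }
}
```

In this model, growing and then shrinking by the same amount restores the original pointers, and the word at `fp` follows the frame in both directions.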
; subq %rsp, $160, %rsp
; movq %rsp, %rbp
; movq 160(%rsp), %r11
; movq %r11, 0(%rsp)
; movq 168(%rsp), %r11
; movq %r11, 8(%rsp)
; subq %rsp, $160, %rsp
Definitely not a blocker for landing this PR, but it would be nice to emit a single `sub $rsp, M+N` followed by `mov temp, [$rsp + OLD_RA_OFFSET]; mov [$rsp + NEW_RA_OFFSET], temp` (and similarly for the saved FP), instead of an initial `sub` to establish the frame followed by a second `sub` for the tail-call argument space.
We're experimenting with that on a different branch based on this PR, as a follow-up. Moving the stack check before the frame setup means that we can cut out even more instructions, as there's no need to move the frame pointer as well.
…logue

`return_call` instructions reuse the incoming argument area of the caller's frame. As such, if the caller's incoming argument area is not exactly the right size for the callee, some resizing needs to take place to ensure that the return address, frame pointer, clobbers, and stack slots don't get overwritten.

The current solution on the main branch for the x64 backend is to explicitly resize the frame via `ShrinkArgumentArea` or `GrowArgumentArea` right before the `return_call` arguments are written to the stack, ensuring that there is sufficient space. While this works, it makes a `return_call` more expensive when resizing is necessary.

To simplify this, we instead resize the incoming argument area in the function prologue to accommodate the largest possible argument area of any `return_call` instruction in the function. We then shrink back down when necessary before an individual `return_call`. This simplifies the implementation of tail calls on x86_64, as we no longer need to move the entire frame, just the return address, before we jump to the tail-callee.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
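The sizing policy described in this commit message can be sketched as two small calculations. This is a simplified illustration rather than the code in the PR: `tail_args_size` and `shrink_before_return_call` are hypothetical helper names, and all sizes are assumed to be byte counts that are already suitably aligned.

```rust
/// Size of the argument area to reserve in the prologue: the largest
/// stack-argument size of any `return_call` in the function, but never
/// less than what the function itself received.
pub fn tail_args_size(incoming_args_size: u32, return_call_arg_sizes: &[u32]) -> u32 {
    return_call_arg_sizes
        .iter()
        .copied()
        .fold(incoming_args_size, u32::max)
}

/// Bytes to give back right before one particular `return_call`, so that
/// the argument area matches what that callee expects. Zero means the
/// reserved area is already exactly the right size.
pub fn shrink_before_return_call(tail_args_size: u32, callee_args_size: u32) -> u32 {
    tail_args_size.saturating_sub(callee_args_size)
}
```

With this split, the expensive decision is made once per function, and each `return_call` site only needs a cheap adjustment when its callee takes fewer stack arguments than the reserved maximum.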
This reverts the key parts of e3a08d4 (bytecodealliance#8151), because it turns out that we didn't need that abstraction. Several changes in the last month have enabled this:
- bytecodealliance#8292 and then bytecodealliance#8316 allow us to refer to either incoming or outgoing argument areas in a (mostly) consistent way
- bytecodealliance#8327, bytecodealliance#8377, and bytecodealliance#8383 demonstrate that we never need to delay writing stack arguments directly to their final location