Skip to content

cranelift: Specialize StackAMode::FPOffset#8292

Merged
jameysharp merged 2 commits intobytecodealliance:mainfrom
jameysharp:specialize-fp-offset
Apr 3, 2024
Merged

cranelift: Specialize StackAMode::FPOffset#8292
jameysharp merged 2 commits intobytecodealliance:mainfrom
jameysharp:specialize-fp-offset

Conversation

@jameysharp
Copy link
Contributor

The StackAMode::FPOffset address mode was always used together with fp_to_arg_offset, to compute addresses within the current stack frame's argument area.

Instead, introduce a new StackAMode::ArgOffset variant specifically for stack addresses within the current frame's argument area. The details of how to find the argument area are folded into the conversion from the target-independent StackAMode into target-dependent address-mode types.

Currently, fp_to_arg_offset returns a target-specific constant, so I've preserved that constant in each backend's address-mode conversion.

However, in general the location of the argument area may depend on calling convention, flags, or other concerns. Also, it may not always be desirable to use a frame pointer register as the base to find the argument area. I expect some backends will eventually need to introduce new synthetic addressing modes to resolve argument-area offsets after register allocation, when the full frame layout is known.

I also cleaned up a couple minor things while I was in the area:

  • Determining argument extension type was written in a confusing way and also had a typo in the comment describing it.
  • riscv64's AMode::offset was only used in one place and is clearer when inlined.

The StackAMode::FPOffset address mode was always used together with
fp_to_arg_offset, to compute addresses within the current stack frame's
argument area.

Instead, introduce a new StackAMode::ArgOffset variant specifically for
stack addresses within the current frame's argument area. The details of
how to find the argument area are folded into the conversion from the
target-independent StackAMode into target-dependent address-mode types.

Currently, fp_to_arg_offset returns a target-specific constant, so I've
preserved that constant in each backend's address-mode conversion.

However, in general the location of the argument area may depend on
calling convention, flags, or other concerns. Also, it may not always be
desirable to use a frame pointer register as the base to find the
argument area. I expect some backends will eventually need to introduce
new synthetic addressing modes to resolve argument-area offsets after
register allocation, when the full frame layout is known.

I also cleaned up a couple minor things while I was in the area:
- Determining argument extension type was written in a confusing way and
  also had a typo in the comment describing it.
- riscv64's AMode::offset was only used in one place and is clearer
  when inlined.
@jameysharp jameysharp requested a review from a team as a code owner April 2, 2024 21:44
@jameysharp jameysharp requested review from abrown and removed request for a team April 2, 2024 21:44
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Apr 2, 2024
Copy link
Member

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice clean up!

StackAMode::FPOffset(off, _ty) => {
StackAMode::ArgOffset(off, _ty) => {
let off = i32::try_from(off)
.expect("Offset in FPOffset is greater than 2GB; should hit impl limit first");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.expect("Offset in FPOffset is greater than 2GB; should hit impl limit first");
.expect("Offset in ArgOffset is greater than 2GB; should hit implementation limit first");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The + 16 to compute the final frame pointer offset can now overflow too, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catches! With regard to overflow, I'm moving this +16 inside the i32::try_from so the add happens at i64 instead. That makes its overflow behavior the same as current, which already had unchecked i64 addition. Similarly, the aarch64 and riscv64 targets are still doing the addition at i64 after this patch, and s390x doesn't add anything so can't overflow.

@bjorn3 correctly pointed out that I had changed the overflow behavior
of this address computation.

The existing code always added the result of `fp_to_arg_offset` using
`i64` addition. It used Rust's default overflow behavior for addition,
which panics in debug builds and wraps in release builds.

In this commit I'm preserving that behavior:

- s390x doesn't add anything, so can't overflow.

- aarch64 and riscv64 use `i64` offsets in `FPOffset` address modes, so
  the addition is still using `i64` addition.

- x64 does a checked narrowing to `i32`, so it's important to do the
  addition before that, on the wider `i64` offset.
@jameysharp jameysharp enabled auto-merge April 3, 2024 17:15
@jameysharp jameysharp added this pull request to the merge queue Apr 3, 2024
Merged via the queue into bytecodealliance:main with commit 9f94462 Apr 3, 2024
@jameysharp jameysharp deleted the specialize-fp-offset branch April 3, 2024 18:04
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Apr 18, 2024
This reverts the key parts of e3a08d4
(bytecodealliance#8151), because it turns out that we didn't need that abstraction.

Several changes in the last month have enabled this:

- bytecodealliance#8292 and then bytecodealliance#8316 allow us to refer to either incoming or outgoing
  argument areas in a (mostly) consistent way

- bytecodealliance#8327, bytecodealliance#8377, and bytecodealliance#8383 demonstrate that we never need to delay
  writing stack arguments directly to their final location
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Apr 18, 2024
This reverts the key parts of e3a08d4
(bytecodealliance#8151), because it turns out that we didn't need that abstraction.

Several changes in the last month have enabled this:

- bytecodealliance#8292 and then bytecodealliance#8316 allow us to refer to either incoming or outgoing
  argument areas in a (mostly) consistent way

- bytecodealliance#8327, bytecodealliance#8377, and bytecodealliance#8383 demonstrate that we never need to delay
  writing stack arguments directly to their final location

prtest:full
github-merge-queue bot pushed a commit that referenced this pull request Apr 18, 2024
This reverts the key parts of e3a08d4
(#8151), because it turns out that we didn't need that abstraction.

Several changes in the last month have enabled this:

- #8292 and then #8316 allow us to refer to either incoming or outgoing
  argument areas in a (mostly) consistent way

- #8327, #8377, and #8383 demonstrate that we never need to delay
  writing stack arguments directly to their final location

prtest:full
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants