cranelift: Specialize StackAMode::FPOffset by jameysharp · Pull Request #8292 · bytecodealliance/wasmtime

jameysharp · 2024-04-02T21:44:07Z

The StackAMode::FPOffset address mode was always used together with fp_to_arg_offset, to compute addresses within the current stack frame's argument area.

Instead, introduce a new StackAMode::ArgOffset variant specifically for stack addresses within the current frame's argument area. The details of how to find the argument area are folded into the conversion from the target-independent StackAMode into target-dependent address-mode types.

Currently, fp_to_arg_offset returns a target-specific constant, so I've preserved that constant in each backend's address-mode conversion.

However, in general the location of the argument area may depend on calling convention, flags, or other concerns. Also, it may not always be desirable to use a frame pointer register as the base to find the argument area. I expect some backends will eventually need to introduce new synthetic addressing modes to resolve argument-area offsets after register allocation, when the full frame layout is known.
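For readers unfamiliar with the split between target-independent and target-dependent address modes, here is a minimal sketch of the idea. It is not Cranelift's actual code: `TargetAMode`, the `SpOffset` variants, and `ARG_AREA_FROM_FP` are hypothetical names, the real `StackAMode` variants also carry a type, and the value 16 simply mirrors the `+ 16` discussed for x64 in the review thread.

```rust
// Illustrative sketch only: simplified stand-ins, not Cranelift's definitions.

/// Target-independent stack address mode (simplified; type payload omitted).
#[derive(Clone, Copy, Debug, PartialEq)]
enum StackAMode {
    /// Offset into the current frame's argument area.
    ArgOffset(i64),
    /// Hypothetical second variant, included only for contrast.
    SpOffset(i64),
}

/// Hypothetical target-dependent address mode for a single backend.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TargetAMode {
    /// Offset from the frame pointer register.
    FpOffset(i64),
    /// Offset from the stack pointer register.
    SpOffset(i64),
}

/// Assumed per-target constant: distance from the frame pointer to the
/// start of the argument area (the role played by `fp_to_arg_offset`).
const ARG_AREA_FROM_FP: i64 = 16;

impl From<StackAMode> for TargetAMode {
    fn from(mode: StackAMode) -> Self {
        match mode {
            // The knowledge of where the argument area lives is confined to
            // this conversion; callers only name an argument-area offset.
            StackAMode::ArgOffset(off) => TargetAMode::FpOffset(off + ARG_AREA_FROM_FP),
            StackAMode::SpOffset(off) => TargetAMode::SpOffset(off),
        }
    }
}
```

Under these assumptions, `TargetAMode::from(StackAMode::ArgOffset(8))` yields `TargetAMode::FpOffset(24)`. A backend that later wanted a different base register or a layout-dependent offset would only need to change this one conversion.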

I also cleaned up a couple minor things while I was in the area:

- Determining argument extension type was written in a confusing way and also had a typo in the comment describing it.
- riscv64's AMode::offset was only used in one place and is clearer when inlined.


abrown

Nice clean up!

abrown · 2024-04-03T10:53:49Z

cranelift/codegen/src/isa/x64/abi.rs

-            StackAMode::FPOffset(off, _ty) => {
+            StackAMode::ArgOffset(off, _ty) => {
                let off = i32::try_from(off)
                    .expect("Offset in FPOffset is greater than 2GB; should hit impl limit first");


Suggested change

-                    .expect("Offset in FPOffset is greater than 2GB; should hit impl limit first");
+                    .expect("Offset in ArgOffset is greater than 2GB; should hit implementation limit first");

The + 16 to compute the final frame pointer offset can now overflow too, right?

Good catches! With regard to overflow, I'm moving this +16 inside the i32::try_from so the add happens at i64 instead. That makes its overflow behavior the same as the current code, which already had unchecked i64 addition. Similarly, the aarch64 and riscv64 targets are still doing the addition at i64 after this patch, and s390x doesn't add anything, so it can't overflow.


@bjorn3 correctly pointed out that I had changed the overflow behavior of this address computation. The existing code always added the result of `fp_to_arg_offset` using `i64` addition. It used Rust's default overflow behavior for addition, which panics in debug builds and wraps in release builds.

In this commit I'm preserving that behavior:

- s390x doesn't add anything, so can't overflow.
- aarch64 and riscv64 use `i64` offsets in `FPOffset` address modes, so the addition is still using `i64` addition.
- x64 does a checked narrowing to `i32`, so it's important to do the addition before that, on the wider `i64` offset.
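The x64 ordering concern can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: the function names and `ARG_AREA_FROM_FP` are hypothetical, the functions return `Result` instead of calling `.expect`, and `wrapping_add` is used in the second function to model what a plain `i32` addition does in a release build.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Assumed constant standing in for the `+ 16` applied on x64.
const ARG_AREA_FROM_FP: i64 = 16;

/// Add on the wide `i64` value first, then do the checked narrowing.
/// A sum that no longer fits in `i32` is reported by `try_from`.
/// (The `i64` addition itself is still unchecked, as in the existing code.)
fn add_then_narrow(off: i64) -> Result<i32, TryFromIntError> {
    i32::try_from(off + ARG_AREA_FROM_FP)
}

/// Narrow first, then add at `i32`. The narrowing succeeds for values near
/// `i32::MAX`, but the later addition can overflow without being noticed.
fn narrow_then_add_wrapping(off: i64) -> Result<i32, TryFromIntError> {
    let narrowed = i32::try_from(off)?;
    Ok(narrowed.wrapping_add(ARG_AREA_FROM_FP as i32))
}
```

For an offset of `i32::MAX as i64`, `add_then_narrow` returns an error, while `narrow_then_add_wrapping` returns a negative offset. Doing the addition before the narrowing keeps that case inside the existing checked conversion.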

This reverts the key parts of e3a08d4 (bytecodealliance#8151), because it turns out that we didn't need that abstraction. Several changes in the last month have enabled this:

- bytecodealliance#8292 and then bytecodealliance#8316 allow us to refer to either incoming or outgoing argument areas in a (mostly) consistent way
- bytecodealliance#8327, bytecodealliance#8377, and bytecodealliance#8383 demonstrate that we never need to delay writing stack arguments directly to their final location

jameysharp requested a review from a team as a code owner April 2, 2024 21:44

jameysharp requested review from abrown and removed request for a team April 2, 2024 21:44

abrown approved these changes Apr 3, 2024

jameysharp enabled auto-merge April 3, 2024 17:15

jameysharp added this pull request to the merge queue Apr 3, 2024

Merged via the queue into bytecodealliance:main with commit 9f94462 Apr 3, 2024

jameysharp deleted the specialize-fp-offset branch April 3, 2024 18:04

jameysharp mentioned this pull request Apr 18, 2024

cranelift: Emit argument location uses eagerly in gen_arg #8398

Merged

jameysharp merged 2 commits into bytecodealliance:main from jameysharp:specialize-fp-offset