
Implement stack limit checks for AArch64 #1573

Merged
alexcrichton merged 1 commit into bytecodealliance:master from alexcrichton:aarch64-stack-limit on Apr 24, 2020

Conversation

@alexcrichton (Member) commented Apr 21, 2020

This commit implements the stack limit checks in cranelift for the
AArch64 backend. This gets the `stack_limit` argument purpose as well as
a function's global `stack_limit` directive working for the AArch64
backend. I've tested this locally on some hardware and in an emulator
and it looks to be working for basic tests, but I've never really done
AArch64 before so some scrutiny on the instructions would be most
welcome!

I also suspect that these stack limit checks aren't quite optimal, but they relatively closely match the x86 backend (which also isn't optimal), so for now I'm largely hoping for correctness as opposed to speed.

Closes #1569

@github-actions bot added the labels cranelift (Issues related to the Cranelift code generator) and cranelift:area:aarch64 (Issues related to AArch64 backend) on Apr 22, 2020

@bnjbvr bnjbvr requested a review from cfallin April 22, 2020 10:12
@cfallin (Member) left a comment

Thanks for this -- you jumped into the new backend guts remarkably quickly!

The main concern is the choice of register to hold the stack limit in the function body, as noted below; spilltmp comes with a bunch of baggage as it's used in other ways (notably register-allocator-inserted spills/reloads, which can happen in lots of places). I think the fix is relatively straightforward though!

let reg = generate_gv(f, abi, gv, &mut insts);
return (reg, insts);

fn generate_gv(
@cfallin (Member):

Hmm. I'm not wild about having this GlobalValue expansion implementation local to the stack-limit code, but upon looking at aarch64/lower.rs, it seems that in all other cases the legalization pass transforms actual GlobalValue ops into core ops as here.

Perhaps we could at least move generate_gv to aarch64/lower.rs (passing in the vmctx reg as an arg)? This could eventually be used if we shift from the current encodings-based legalization framework to something simpler/more purpose-built as well.

@alexcrichton (Member Author):

So FWIW this is definitely an "unfortunate" special context. This is "duplicated" with the x86 backend as well. The reason for this is that prologue/epilogue happens so late (way after legalization and even after register allocation) that it's difficult to share code. The hope is that the number of global values we'd need to process wouldn't be that large and/or it'd be easy to extend them over time.

In that sense I suspect that the global value legalization pass is unlikely to go away (it's architecture-independent too!). That being said, I'm happy to move this code around. Given that, do you think it'd still have a better home in lower.rs?

(I can definitely do a better job about documenting all this too)

@cfallin (Member):

Hmm, OK. I still think that we should move this to lower.rs, but other refactoring and cleanup can come later -- don't want to hold this up for it!

stackslots.push(off);
}

let stack_limit = match f
@cfallin (Member):

Factor this pattern out (also occurs above) -- take f and sig, and return reg for vmctx plus Vec<Inst>?

To allocate a new temp as an into_reg to hold stack_limit, we'll need to finagle the plumbing a bit. Vreg temps are allocated by the LowerCtx, which we've kept separate from the ABI traits, and using this will be tricky because we can't have generic methods on the ABIBody trait. The ordering is slightly tricky too: the ABIBody is created before lowering starts, so that's too early to alloc a temp, but gen_prologue happens late (we lower instructions in reverse order) so we can't do it there either.

Suggestion: perhaps add an init method on the ABIBody trait that takes a "just in case" temp? (vregs are basically free, aside from creating sparsity in the ID space, so this is fine.) Then this could set stack_limit for use by gen_prologue and any calls.

@cfallin (Member):

Actually, one more thought: after reading through the rest of the patch, it seems that stack_limit is not used beyond the prologue. Given this, I think it would be fine to use spilltmp within gen_prologue itself, and then carefully validate that we don't keep it live across anything else that could implicitly use it. This includes any virtual registers at all: the register allocator is liable to pop in and do a surprise spill with a big offset somewhere, wiping out spilltmp.

readonly: _,
} => {
let base = generate_gv(f, abi, base, insts);
let into_reg = writable_spilltmp_reg();
@cfallin (Member):

Eek, I'm not sure if it's safe to use spilltmp here -- or at least it's not clear it will interact well with anything we might write in the future. The basic invariant is that spilltmp should only be used within a sequence of instructions generated as one unit (the original use was to compute large stackframe offsets for regalloc spills/reloads, hence the name). We should probably take an into_reg as an argument of generate_gv here, and then by implication, to gen_stack_limit too.

(I should've added more documentation about this, sorry -- if you wouldn't mind adding a comment on spilltmp_reg() to this effect, that'd be very helpful, thanks!)

@alexcrichton (Member Author):

Thanks for taking a look @cfallin! I definitely have some documentation to shore up no matter what :).

In any case I wanted to ask some questions about your comments. It sounds like you're thinking we should use vregs (which I presume is a virtual not-yet-allocated register, right?) for this stuff. I think that might be difficult though because prologue generation happens after register allocation. The saving grace, though, is that the registers used here in the prologue are only used during the prologue and never afterwards. Additionally I think that all caller-saved registers are available, not just the spilltmp one (that one just had a somewhat convenient name to pull from).

Does that make sense? I could also very well be missing something about how to plumb vregs in here.

I also don't mind writing more documentation about the stack limit support in cranelift. I didn't add documentation in my previous PR (boo) and would be happy to add some here which has notes about how this is implemented for each architecture's backend and how it's running at a unique time where register allocation/legalization aren't available.

@cfallin (Member) commented Apr 22, 2020

So, one clarification: in the new backend, the prologue generation runs before register allocation. So we are free to allocate temp virtual registers (vregs) and use them if we need to. (I guess the old backend design didn't do it this way; I didn't study the reasoning for that in detail, but the current approach made things a lot simpler for us.) Given that, I think it would be best if we added just a little plumbing: alloc a vreg when calling gen_prologue() and pass it in (as a Writable<Reg>). Sorry for the confusion on this point!

@cfallin (Member) commented Apr 22, 2020

(And, as a followup clarification to that, we made sure that the register allocator understands live ranges of both real regs and virtual regs, so it's totally OK to use a vreg and then later in the prologue, refer to a realreg like an argument register; the regalloc considers the real-reg live range when choosing where to allocate the vreg.)

@alexcrichton (Member Author):

Oh, that's news to me! I'm a bit confused, though, because the place where `gen_prologue` is called takes a `RegAllocResult`, which sounds like register allocation has already happened?

@bjorn3 (Contributor) commented Apr 24, 2020

I believe gen_prologue is indeed run after regalloc.

@alexcrichton (Member Author):

@cfallin hm so given that register allocation runs before prologue/epilogue generation, do you think it'd be ok to shore up the documentation around these and say how they can use any caller-saved register and how we're doing manual register allocation because we generate the prologue so late?

@cfallin (Member) commented Apr 24, 2020

Argh, OK, I was deeply confused yesterday; my apologies. (Many balls in the air now that we're tying up loose ends...) You're correct: prologue generation happens after regalloc. I think I was thinking of the other bits of ABI-generated instruction sequences (argument/retval copies), which use vregs freely.

Indeed, the prologue can use only real regs. So I think the best way forward is to use the spilltmp reg as you have, and then carefully document why it's safe -- between the def (computing the GV) and use (the check itself) no spills could have happened.

@alexcrichton (Member Author):

Ok, I've tried to add some comments and such, as well as rebasing and uncommenting a few tests which now pass. Mind taking another look, @cfallin?

@cfallin (Member) left a comment

Looks good, thanks! Sorry about the earlier confusion and thanks for your patience!

/// means that we need to do manual register allocation here and also be
/// careful to not clobber any callee-saved or argument registers. For now
/// this routine makes do with the `writable_spilltmp_reg` as one temporary
/// register, and a second register of `x16` which is caler-saved. This
@cfallin (Member):

nit: s/caler/caller/

@alexcrichton alexcrichton merged commit 74eda80 into bytecodealliance:master Apr 24, 2020
@alexcrichton
Copy link
Member Author

No worries at all! Thanks again for the review :)

@alexcrichton alexcrichton deleted the aarch64-stack-limit branch April 24, 2020 20:02
cfallin added a commit to cfallin/wasmtime that referenced this pull request Apr 24, 2020

Labels: cranelift (Issues related to the Cranelift code generator), cranelift:area:aarch64 (Issues related to AArch64 backend)

Linked issue: Stack limit protections not implemented for AArch64 (#1569)

3 participants