Skip to content

Cranelift AArch64: Migrate Splat to ISLE#4521

Merged
cfallin merged 1 commit intobytecodealliance:mainfrom
akirilov-arm:aarch64_isle_migration
Jul 26, 2022
Merged

Cranelift AArch64: Migrate Splat to ISLE#4521
cfallin merged 1 commit intobytecodealliance:mainfrom
akirilov-arm:aarch64_isle_migration

Conversation

@akirilov-arm
Copy link
Contributor

No description provided.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. labels Jul 25, 2022
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and thanks for the fixes to preferred/non-preferred vec regs as well -- seems to have helped in a few places!

@cfallin
Copy link
Member

cfallin commented Jul 25, 2022

Happy to merge once merge conflicts are resolved.

@akirilov-arm
Copy link
Contributor Author

@cfallin I have also noticed that x16 and x17 are reserved as spill temporaries, hence not allocatable - is that still relevant for regalloc2?

@cfallin
Copy link
Member

cfallin commented Jul 26, 2022

@akirilov-arm yes, unfortunately... regalloc2 actually doesn't need any temporaries anymore, but aarch64 itself does. The reason is that a spillslot may be at a greater offset from sp or fp than we can reach with an imm12, so we need a sequence of instructions to synthesize the address of a spillslot before spilling or reloading. That sequence itself can't require spilling another register if all registers are full (as they are likely to be if we're spilling in the first place), so we need to set aside x16 for that.

If I recall correctly, x17 is used in stack-limit check sequences, but my memory is less clear; I'd have to go look at the use-cases again.

@cfallin
Copy link
Member

cfallin commented Jul 26, 2022

To follow up on that, I guess an alternative approach would be to reserve a small-offset slot to spill another victim to if we need to compute a spillslot address at a large distance away -- so we can bootstrap our way there with no registers initially free. That's a little more complexity but I wouldn't be averse to reviewing a PR if someone wants to attempt that.

Copyright (c) 2022, Arm Limited.
@akirilov-arm akirilov-arm force-pushed the aarch64_isle_migration branch from 02c8044 to 2983e08 Compare July 26, 2022 16:50
@akirilov-arm
Copy link
Contributor Author

Another alternative is to reserve a vector register, which would give us the same space as 2 GPRs.

One further idea - if the preserve_frame_pointers flag introduced by PR #4469 is true, then we should be able to use lr/x30 as a temporary register instead of making it unallocatable as it is right now.

@cfallin
Copy link
Member

cfallin commented Jul 26, 2022

One further idea - if the preserve_frame_pointers flag introduced by PR #4469 is true, then we should be able to use lr/x30 as a temporary register instead of making it unallocatable as it is right now.

Ah, that's a really interesting idea actually. I guess it's not needed for unwind (that starts from fp), and in the middle of this sequence we won't be doing any calls, and it's otherwise usable as a regular GPR... I like it.

@akirilov-arm
Copy link
Contributor Author

akirilov-arm commented Jul 26, 2022

The optimal solution would be to adjust the set of allocatable registers on a per-function basis, so that we could do the same for non-leaf functions (or leaf functions that use the stack) when preserve_frame_pointers is false, but I don't think the backend plumbing is set to enable this.

P.S. Actually I didn't mean to use x30 as a spill temporary, but as a generic temporary register, i.e. put it in the set of preferred registers for allocation; calls would be set up to clobber it, so that regalloc would do the right thing. Restricting it to a spill temporary is also a viable idea because we would be able to reclaim either x16 or x17; I suppose I am just ambituous and want all of them 😄.

@jameysharp
Copy link
Contributor

I'm curious about the small-offset spillslot idea. I haven't looked at aarch64 details and don't know enough about Cranelift internals yet, but my assumptions are:

  • The imm12 is a signed 12-bit immediate, so this challenge only happens if the stack frame is bigger than 2kB.
  • sp and fp point to opposite ends of the stack frame so if you pick whichever is closer as the base then you could avoid spilling for any frame smaller than 4kB.
  • So I imagine there's an opportunity for a lot of functions to gain an extra register.
  • But we don't know how big the stack frame is until after register allocation because it depends on how many spills are inserted.
  • Still, we could decide after regalloc whether to add a small-offset spillslot; if we need it, making the stack frame bigger doesn't change the fact that we need it.

Something along those lines?

@cfallin
Copy link
Member

cfallin commented Jul 26, 2022

@jameysharp yep, that's more or less it.

@akirilov-arm re:

I don't think the backend plumbing is set to enable this

we could definitely change that! The only thing I want to hold as a hard requirement is that we don't build it dynamically per-function (because there are lots of tiny functions and that would be a nontrivial cost); right now we build it once when the compiler backend is constructed. We could perhaps build a few versions of it though, and return the right one in the regalloc2::Function trait -- one for leaf functions and one without; and variations based on compiler flags.

Anyway we're getting on quite a tangent here but if you're interested, please feel free to file an issue to "reclaim spilltmp registers on aarch64" as a future enhancement to track this!

@cfallin cfallin enabled auto-merge (squash) July 26, 2022 17:41
@cfallin cfallin merged commit ead6edb into bytecodealliance:main Jul 26, 2022
@akirilov-arm akirilov-arm deleted the aarch64_isle_migration branch July 26, 2022 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants