Allow pinning a certain register in Cranelift, and use it as the heap base #960
bnjbvr merged 3 commits into bytecodealliance:master
🎉🎉🎉 i was hoping to make time next week to do just this with the Spidermonkey changes you mentioned recently! instead i'll run Lucet's tests + wasm spec tests with these changes :) are you aware of use cases for pinning n>1 registers? there are a few we've talked about with Lucet, but we haven't measured what improvement we'd get yet.
Note that you still need to use the new instruction (set_pinned_reg) to set the heap base before entering code compiled with Cranelift, and at certain precise points: when translating a call to an import or an indirect call, after wasm's mem.grow, and maybe others I'm not thinking of right now.
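To make the embedder's obligation concrete, here is a small, self-contained Rust simulation. It is not Cranelift API: `Instance`, `sync_pinned_reg`, `mem_grow`, and the fake addresses are all hypothetical, with the `pinned_reg` field standing in for r15 and `sync_pinned_reg` modeling an emitted set_pinned_reg. It assumes the worst case, where mem.grow always relocates the heap.

```rust
// Hypothetical simulation of the embedder's side of the pinned-register
// contract; none of these names are Cranelift APIs.
const PAGE_SIZE: usize = 65536; // wasm page size

struct Instance {
    heap_base: usize,  // current (fake) start address of linear memory
    pages: usize,      // current size in wasm pages
    next_free: usize,  // fake address-space cursor used to "relocate" memory
    pinned_reg: usize, // stands in for r15
}

impl Instance {
    fn new(pages: usize) -> Self {
        let heap_base = 0x1000;
        Instance { heap_base, pages, next_free: heap_base + pages * PAGE_SIZE, pinned_reg: 0 }
    }

    /// Models emitting `set_pinned_reg` with the current heap base.
    fn sync_pinned_reg(&mut self) {
        self.pinned_reg = self.heap_base;
    }

    /// Models wasm `mem.grow`. This toy version always moves the heap,
    /// the worst case an embedder must handle. Returns the old size in pages.
    fn mem_grow(&mut self, delta: usize) -> usize {
        let old = self.pages;
        self.pages += delta;
        self.heap_base = self.next_free;
        self.next_free += self.pages * PAGE_SIZE;
        old
    }

    /// Models compiled code that uses the pinned register as the heap base:
    /// no reload of the base from memory.
    fn heap_addr(&self, offset: usize) -> usize {
        self.pinned_reg + offset
    }
}

fn main() {
    let mut inst = Instance::new(1);
    inst.sync_pinned_reg(); // before entering compiled code
    assert_eq!(inst.heap_addr(8), inst.heap_base + 8);

    inst.mem_grow(1);
    // Without re-syncing, compiled code would still address the old heap.
    assert_ne!(inst.heap_addr(8), inst.heap_base + 8);

    inst.sync_pinned_reg(); // after mem.grow
    assert_eq!(inst.heap_addr(8), inst.heap_base + 8);
    println!("pinned heap base = {:#x}", inst.pinned_reg); // prints 0x11000
}
```

The middle assertion is the point: once the heap moves, code still using the old pinned value addresses stale memory, so every place that can change the base has to be followed by a resync.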
Well, ideally this would be its own mechanism in register allocation, maybe a hint that some value is very important and should be kept in its own register for the entire function's lifetime. Then you could extend this mechanism to multiple heaps (once wasm has them), keep some other very important value in its own register (an emulated stack pointer, e.g.), or, on some RISC platforms, hold an interesting constant value (a zero register, as on MIPS). Spidermonkey also pins the instance's local storage (called thread local storage); that's the "VmCtx" parameter passed in the Baldrdash signatures. Emulating what's done for VmCtx was a bit more complicated, especially telling Cranelift how to recompute the heap base (this is done in the wasm->clif translation on Spidermonkey's side). Did you have other uses in mind?
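One way to picture the generalized mechanism is as a list of per-function pin hints that the register allocator must honor. The sketch below is hypothetical: the `Reg` type, `apply_pin_hints`, the register numbers, and the value names are made up for illustration. It only checks that hints don't conflict and computes which registers remain for ordinary allocation.

```rust
use std::collections::HashMap;

// Hypothetical sketch of "keep this value in its own register" hints,
// generalized to several values. Not a Cranelift API.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Reg(u8);

/// Validates the hints (no unknown registers, no register pinned twice)
/// and returns the registers still available to the allocator.
fn apply_pin_hints(all_regs: &[Reg], hints: &[(&str, Reg)]) -> Result<Vec<Reg>, String> {
    let mut owner: HashMap<Reg, &str> = HashMap::new();
    for &(value, reg) in hints {
        if !all_regs.contains(&reg) {
            return Err(format!("{value}: {reg:?} is not an allocatable register"));
        }
        if let Some(prev) = owner.insert(reg, value) {
            return Err(format!("{value} and {prev} are both pinned to {reg:?}"));
        }
    }
    // Everything not pinned stays available for ordinary allocation.
    Ok(all_regs.iter().copied().filter(|r| !owner.contains_key(r)).collect())
}

fn main() {
    let regs: Vec<Reg> = (0..16).map(Reg).collect();
    // e.g. two heap bases (multiple memories) plus the instance pointer
    let hints = [("heap0_base", Reg(15)), ("heap1_base", Reg(14)), ("vmctx", Reg(13))];
    let free = apply_pin_hints(&regs, &hints).unwrap();
    println!("{} registers left for allocation", free.len()); // prints 13
}
```

Each extra pinned value shrinks the allocatable set for the whole function, which is one reason pinning n>1 registers would need measuring before being worth it.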
This is in better shape, with even more restrictions compared to previous iterations:
Any other case will trigger an assertion. Thanks to Julian's recent patches, the pinned register (r15) was much more likely to be taken as a fill's target, so I have more confidence in this new version, which passes all of the Cranelift, Embenchen and Spidermonkey tests. This is ready for review.
poppin' back in to say that Lucet's tests all pass, wasm spectests pass (well, as much as they did before), and most benchmarks look ~10-20% better! We have a few that measure slower, and I'm going to investigate why. I suspect it's unrelated to this change.
This is about what I expected, too, yeah. I've been wondering whether it makes sense to think of a register-pinned value as a particular flavor of GlobalValue, but that's not quite right, since a global value would have a (much) wider lifetime than what pinning a register really is: guidance to "keep this value in this register".
We have a bit of code in Lucet to optionally count the number of wasm operations executed, which currently works by adding to a counter tacked on to our VmCtx via wasm instrumented as we hand it to …
lars-t-hansen left a comment:
This looks fine to me, as far as my knowledge goes -- I know little about the solver.
Based on #959. So this is very experimental, and probably a bit sketchy. This allows pinning one register (r15 on x86-64) so that it's entirely under the embedder's control. In particular, it tries to exclude that register from register allocation. The assertions in this code, plus all my tests passing, make me confident it isn't horribly wrong, but that doesn't prove it's incredibly good either. (I'm a bit worried about local diversions in particular.) Also, this is not tested enough at the moment; I'll try to add more test cases and fuzz it a bit (?).
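A minimal sketch of what excluding the pinned register from register allocation means, assuming a toy allocator over 16 general-purpose registers numbered 0 to 15, with 15 playing r15. This is not Cranelift's allocator; `Allocator` and its methods are hypothetical, and the assertion mirrors the kind of check described above, firing if the pinned register is ever handed out.

```rust
// Hypothetical toy allocator; register 15 plays the role of r15.
struct Allocator {
    free: u16,          // bit i set => register i is free
    pinned: Option<u8>, // register owned by the embedder, if any
}

impl Allocator {
    fn new(pinned: Option<u8>) -> Self {
        let mut free = u16::MAX;
        if let Some(r) = pinned {
            assert!(r < 16, "only 16 registers in this toy model");
            free &= !(1u16 << r); // never put the pinned register in the pool
        }
        Allocator { free, pinned }
    }

    /// Hands out the lowest-numbered free register, or None if the caller
    /// would have to spill.
    fn alloc(&mut self) -> Option<u8> {
        if self.free == 0 {
            return None;
        }
        let r = self.free.trailing_zeros() as u8;
        self.free &= !(1u16 << r);
        assert_ne!(Some(r), self.pinned, "allocator handed out the pinned register");
        Some(r)
    }

    fn release(&mut self, r: u8) {
        assert_ne!(Some(r), self.pinned, "the pinned register is never allocated");
        self.free |= 1u16 << r;
    }
}

fn main() {
    let mut ra = Allocator::new(Some(15));
    let mut got = Vec::new();
    while let Some(r) = ra.alloc() {
        got.push(r);
    }
    println!("allocated {} registers, r15 excluded", got.len()); // prints 15
}
```

In a real allocator, every code path that picks a register (including fills and local diversions, which the comments here single out) would need the same exclusion, which is why those paths deserve extra testing.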
The second commit adds another setting that enables using the pinned register as the heap base when legalizing heap_addr instructions. If the embedder does the Right Thing and correctly sets the heap base with set_pinned_reg, this is enough to avoid reloading the heap base before every single memory access, and it's a nice win in Spidermonkey (up to 10% in some of our benchmarks).
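The effect of that second setting can be illustrated with a toy legalization that counts heap-base loads. The instruction names, the naive one-reload-per-access lowering, and the absence of bounds checks are simplifying assumptions; `Inst`, `legalize_heap_addrs`, and `count_loads` are hypothetical, and real Cranelift output differs.

```rust
// Hypothetical model of heap_addr legalization with and without a pinned
// heap base. Bounds checks are omitted.
#[derive(Debug, PartialEq)]
enum Inst {
    /// base = load [vmctx + heap_base_offset]
    LoadHeapBase,
    /// addr = base + index, with base taken from the pinned register
    AddPinnedBase,
    /// addr = base + index, with base taken from the preceding load
    AddLoadedBase,
}

fn legalize_heap_addrs(accesses: usize, use_pinned_base: bool) -> Vec<Inst> {
    let mut out = Vec::new();
    for _ in 0..accesses {
        if use_pinned_base {
            // The base already lives in the pinned register.
            out.push(Inst::AddPinnedBase);
        } else {
            // Naive lowering: reload the base before every access.
            out.push(Inst::LoadHeapBase);
            out.push(Inst::AddLoadedBase);
        }
    }
    out
}

fn count_loads(code: &[Inst]) -> usize {
    code.iter().filter(|i| matches!(i, Inst::LoadHeapBase)).count()
}

fn main() {
    let plain = legalize_heap_addrs(3, false);
    let pinned = legalize_heap_addrs(3, true);
    println!("loads without pinned base: {}", count_loads(&plain)); // prints 3
    println!("loads with pinned base: {}", count_loads(&pinned)); // prints 0
}
```

The saving only holds if the pinned register is correct at every access, which is exactly the embedder's responsibility to keep it synced.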