x64: reduce work in non-stack using leaf functions#5352
x64: reduce work in non-stack using leaf functions#5352abrown wants to merge 5 commits intobytecodealliance:mainfrom
Conversation
|
Here is the raw output of the benchmarking I did on my current system. It is hard to interpret the results (is it too noisy? not enough runs?) but here they are! with the default
|
cfallin
left a comment
There was a problem hiding this comment.
The changes to enable this are pleasantly simple (I guess because we already have the generic infra for it) -- thanks! Seeing 1.8k lines of prologues/epilogues disappear in the tests is nice.
Happy to r+ once the test failures are resolved (unfortunately the failure on CI does not seem to be very informative; just a segfault in filetests)...
cf67ec5 to
1cc2b42
Compare
Cranelift has had the ability for some time to identify leaf functions; by Cranelift's definition, a leaf function is one that knows of no other call signatures. bytecodealliance#1148 noted how it would be a good idea to avoid extra frame setup work in leaf functions and bytecodealliance#2960 implemented this for aarch64 and s390x. This improvement was not made for x64 due to some test failures. This change avoids any frame setup for non-stack-using leaf functions in x64.
1cc2b42 to
6e15742
Compare
6e15742 to
d9945d4
Compare
| || num_clobbered_callee_saves > 0 | ||
| || frame_storage_size > 0) | ||
| // TODO | ||
| && !cfg!(target_os = "macos") |
There was a problem hiding this comment.
Maybe my suggestion at #4469 (comment) would work? "non-leaf" could be used by default and "always" could be used on macOS.
Cranelift has had the ability for some time to identify leaf functions;
by Cranelift's definition, a leaf function is one that knows of no other
call signatures. #1148 noted how it would be a good idea to avoid extra
frame setup work in leaf functions and #2960 implemented this for
aarch64 and s390x. This improvement was not made for x64 due to some
test failures. This change avoids any frame setup for non-stack-using
leaf functions in x64.
Because this updates the generated code, there are multiple sets of tests
to update -- each of these is separated here into its own commit. I will follow
up in the comments with benchmark results.