introduce trampolines for methods #2705
32f27ac to
8a9b25a
It is not really related, but I had a look while reviewing: Doesn't |
adamreichold
left a comment
Great code bloat reduction! A nit and an unrelated question, but this looks good to me.
Ahh, yes, working on PanicTrap was one of the moments when I was thinking a refactor like this could be in order... Then I forgot to install a PanicTrap here. Let me push 😆
#python_name,
_pyo3::impl_::pymethods::PyCFunction(#wrapper),
_pyo3::impl_::pymethods::PyCFunction({
    unsafe extern "C" fn trampoline(
Would be cool not to have to repeat the (almost) identical signature here and in the trampoline fn. But the required macro trickery isn't worth it, likely?
I agree with this, however I'd like to defer it to a follow-up PR sometime (there's a fair bit of refactoring which could improve the macro code, and I'd like it not to distract readers of this PR from the behavioural change).
9032f75 to
a3f59c8
3029: use dynamic trampoline for all getters and setters r=adamreichold a=davidhewitt

This is an extension to the "trampoline" changes made in #2705 to re-use a single trampoline for all `#[getter]`s (and similarly for all `#[setter]`s). It works by setting the currently-unused `closure` member of the `ffi::PyGetSetDef` structure to point at a new `struct GetSetDefClosure` which contains function pointers to the `getter` / `setter` implementations. A universal trampoline for all `getter`s, for example, then works by reading the actual getter implementation out of the `GetSetDefClosure`.

Advantages of doing this:
- Very minimal simplification to the macro code / generated code size. It made a 4.4% reduction to `test_getter_setter` generated size, which is an exaggerated result as most code will probably have lots of bulk that isn't just the macro code.

Disadvantages:
- Additional level of complexity in the `getter` and `setter` trampolines and accompanying code.
- To keep the `GetSetDefClosure` objects alive, I've added them to the static `LazyTypeObject` inner.
- Very slight performance overhead at runtime (shouldn't be more than an additional pointer read). It's so slight I couldn't measure it.

Overall I'm happy to either merge or close this based on what reviewers think!

Co-authored-by: David Hewitt <1939362+davidhewitt@users.noreply.github.com>
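The closure-based dispatch described in this commit message can be illustrated with a minimal, standalone sketch. It does not use PyO3 or the CPython headers: `Point`, `Getter`, `get_x`, `get_y`, `X_CLOSURE`, `Y_CLOSURE` and `read_via_trampoline` are hypothetical names invented for the example, the return type is simplified to `i64`, and only the `GetSetDefClosure` name is borrowed from the text above (its real layout may differ).

```rust
use std::os::raw::c_void;

/// Hypothetical stand-in for a Python object with two attributes.
pub struct Point {
    pub x: i64,
    pub y: i64,
}

/// Simplified getter implementation signature (an assumption; the real
/// C-API getter returns an object pointer rather than an integer).
type Getter = fn(*mut c_void) -> i64;

/// Per-attribute data stored behind the `closure` pointer.
pub struct GetSetDefClosure {
    getter: Getter,
}

/// One trampoline shared by every getter: it reads the actual
/// implementation out of the closure pointer and calls it.
unsafe extern "C" fn getter_trampoline(slf: *mut c_void, closure: *mut c_void) -> i64 {
    // The caller guarantees `closure` points at a live GetSetDefClosure.
    let closure = unsafe { &*(closure as *const GetSetDefClosure) };
    (closure.getter)(slf)
}

fn get_x(slf: *mut c_void) -> i64 {
    // The caller guarantees `slf` points at a live Point.
    unsafe { (*(slf as *const Point)).x }
}

fn get_y(slf: *mut c_void) -> i64 {
    unsafe { (*(slf as *const Point)).y }
}

/// Statics keep the closure data alive for the whole program, playing the
/// role that the static type object plays in the description above.
pub static X_CLOSURE: GetSetDefClosure = GetSetDefClosure { getter: get_x };
pub static Y_CLOSURE: GetSetDefClosure = GetSetDefClosure { getter: get_y };

/// Safe helper that mimics what the interpreter would do: pass the object
/// and the registered closure pointer to the shared trampoline.
pub fn read_via_trampoline(point: &Point, closure: &'static GetSetDefClosure) -> i64 {
    unsafe {
        getter_trampoline(
            point as *const Point as *mut c_void,
            closure as *const GetSetDefClosure as *mut c_void,
        )
    }
}
```

In this sketch the only per-attribute cost is one small static and one plain function, while the `extern "C"` entry point exists exactly once; the extra pointer read inside `getter_trampoline` corresponds to the runtime overhead mentioned in the disadvantages.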
This is a refactoring of the code generated for `#[pyfunction]` and `#[pymethods]`, with the intention of reducing compile time and binary size.

The motivation for the change is that `cargo llvm-lines` currently reports many monomorphizations of `std::panicking::try::do_call` and friends being instantiated. In fact, there is one per `#[pyfunction]` and one for every method in `#[pymethods]`, which is quite a lot of instantiations.

The change follows a similar idea to #2478, which changed `extract_argument` to make use of "dynamic dispatch". Here, we create "trampoline" wrappers for all possible C-API function pointers we expose to Python. These wrappers take a function pointer to the real implementation of the function in question, and call it after starting the `std::panic::catch_unwind` machinery and creating a `GILPool`.

Much like in #2704, I suspect LLVM can do a pretty good job at optimizing out the dynamic dispatch. I saw no change at all in benchmarks for the `pytests` crate. So overall I believe this should have a compile-time win with little-to-no impact on runtime performance.

For our `pytests` crate I get a 10% reduction in `cargo llvm-lines` count, and for real-world use cases potentially with many small function wrappers (e.g. getters / setters) I suspect the reduction could be even greater.

Detail-wise, for some method `MyClass::foo`, the generated code changes from a single C-ABI symbol to a C-ABI symbol which just calls the trampoline wrapper with a pointer to the actual implementation.
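As a rough illustration of the shape described above (not the actual generated code), the following standalone sketch shows a single non-generic trampoline that takes a plain function pointer and runs it inside `std::panic::catch_unwind`. Everything here is hypothetical: the `Implementation` alias, the `trampoline` signature, the integer error codes, and the `double_*` / `panicking_*` functions are assumptions made for the example, and the `GILPool` step is omitted because it needs a Python interpreter.

```rust
use std::os::raw::c_int;
use std::panic;

/// Hypothetical signature shared by every "real" implementation.
type Implementation = fn(c_int) -> Result<c_int, String>;

/// A single, non-generic trampoline. Because `body` is a function pointer
/// rather than a generic closure, the `catch_unwind` machinery is
/// instantiated once here instead of once per wrapped function.
fn trampoline(arg: c_int, body: Implementation) -> c_int {
    match panic::catch_unwind(move || body(arg)) {
        Ok(Ok(value)) => value,   // normal return
        Ok(Err(_message)) => -1,  // implementation reported an error
        Err(_payload) => -2,      // implementation panicked
    }
}

fn double_impl(arg: c_int) -> Result<c_int, String> {
    Ok(arg * 2)
}

fn panicking_impl(_arg: c_int) -> Result<c_int, String> {
    panic!("boom")
}

/// The C-ABI symbols are now thin: each one only forwards to the shared
/// trampoline with a pointer to its implementation.
pub extern "C" fn double_wrapper(arg: c_int) -> c_int {
    trampoline(arg, double_impl)
}

pub extern "C" fn panicking_wrapper(arg: c_int) -> c_int {
    trampoline(arg, panicking_impl)
}
```

The panic in `panicking_impl` is caught inside `trampoline`, so it never unwinds across the `extern "C"` boundary; the wrapper simply returns the sketch's error code.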