Implement METH_FASTCALL for pyfunctions. #1619
Conversation
Note for reviewers: This PR only works if the
Very cool, thank you for working on this! Yes I'm going to work on the aforementioned config crate tonight :) I see you've marked as draft; do you want me to review and comment on the code already?
Hmm, that's interesting, but I guess not super surprising as we haven't done any optimization of the brand-new code paths. It'd be really interesting to make separate benchmarks for each of the
That's a really interesting insight. Definitely could be worth doing; we could skip pretty much the whole
Yeah, definitely fine to review. I marked as draft because it cannot be merged yet, but that is also clear from the test results :)
Well, there's not too much which is really new, but I hope something can be squeezed out.
It's commonly used for "forwarding" or "wrapping" functions where any arguments are passed verbatim to another function/method; that way the wrapper's signature doesn't need to track the wrapped one. Not sure how often those occur in C/Rust, but since it's less code instead of more I think it's worth it. You'll see I commented out a test, which is one more question, since now
(force-pushed from 088bc57 to 9940cdc)
Good question. @sebpuetz added it, with the intention that it would give users finer control over the creation of the Python function object. Personally I'm not aware it's widely used, and if it's preventing us from writing decent optimizations then I think it's fine for us to remove it.
davidhewitt
left a comment
Thanks very much for working on this, this is great! I've made a few code suggestions and also pointed out a few spots where we can try to make some optimizations to close the performance gap.
I'm going to try to finish up the docs for #1622 on Sunday evening, so hopefully this PR will be easier to test and work with soon...
```rust
} else {
    impl_wrap_cfunction_with_keywords(cls, &spec, self_ty)?
};
Ok(quote! {
```
Nice job finding this one for #[pymethods]!
There's also #[staticmethod] and #[classmethod], which I think can also support METH_FASTCALL. With a bit of refactoring it might be possible to support them in this PR.
And then there's #[call], which can also support the vectorcall protocol, but that's more complicated so probably worth punting for now and remembering in #684.
```rust
// Collect varargs into tuple
let varargs = if self.accept_varargs {
    Some(PyTuple::new(py, args))
```
Yeah so this is a spot where I definitely think we can optimize PyTuple::new for the unlimited api (or even make an internal PyTuple::new_fast which does a lot less work).
Hm, what would you optimize there?
I see two possible inefficiencies in `PyTuple::new` which we could improve:

- It uses `PyTuple_SetItem` but could use `PyTuple_SET_ITEM`, which does less error checking. I think we would also have to define `PyTuple_SET_ITEM` in Rust (it's a C macro), so this would inline really well.
- It uses `.to_object().into_ptr()` for generic `T`; if we wrote an implementation of `PyTuple::new` which took a `&PyAny` iterator we could just use `into_ptr()`, which might make a teeeny tiny performance improvement (if the optimizer didn't already achieve the same effect).
I definitely think the first bullet could be worth trying. Could make a nice standalone PR rather than being part of this one.
Updated to latest main. However, I could not get around the fact that the raw function's type is the concrete function
davidhewitt
left a comment
This is looking great to me, thanks for working on this!
I'm in favour of merging this as-is; the slowdown for the *args / **kwargs case doesn't seem so bad, especially after I rebased locally on #1653. We can always add extra optimisations for those cases, like the just-args-and-kwargs-passthrough you suggest in the OP.
Couple of suggestions for tidy-up below, and I think it could also benefit from two CHANGELOG entries:

- [Changed] use `METH_FASTCALL` to improve `#[pyfunction]` performance
- [Removed] `raw_pycfunction!` macro
```rust
// _nargs is the number of positional arguments in the _args array,
// the number of KW args is given by the length of _kwnames
let _kwnames: Option<&pyo3::types::PyTuple> = #py.from_borrowed_ptr_or_opt(_kwnames);
// Safety: &PyAny has the same memory layout as `*mut ffi::PyObject`
let _args = _args as *const &pyo3::PyAny;
let _kwargs = if let Some(kwnames) = _kwnames {
    std::slice::from_raw_parts(_args.offset(_nargs), kwnames.len())
} else {
    &[]
};
let _args = std::slice::from_raw_parts(_args, _nargs as usize);
```
This boilerplate appears twice, perhaps worth refactoring into a fn fastcall_args_kwargs_boilerplate() -> TokenStream ?
Yeah, I wanted to do this together with adding fastcall support to static and class methods. But this is a larger refactoring which I might not have time for soon.
No worries; I find some weeks I can do lots of contributions and other weeks I can barely respond to issues. The codebase is gradually marching in the right direction :)
This is now done for all non-NOARGS functions, with the suggested change to make `extract_arguments` take iterators.

Unfortunately, on my machine the benchmarks show that this slows down the cases where `*args` or `**kwargs` are present. This is even without the fact that with the old code they could be optimized further: for `**kwargs` a new dict is created, while it is possible to keep the dict with all arguments given by name and remove those args which match named parameters. And the "only *args and **kwds" case can be heavily optimized by just passing through the tuple and dict already obtained from METH_VARARGS.