From ce660f0a417e9eaa24d90b3a64a26692dcaaedff Mon Sep 17 00:00:00 2001
From: Jeroen Demeyer
Date: Mon, 6 May 2019 17:38:05 +0200
Subject: [PATCH 01/11] PEP 590: update

---
 pep-0590.rst | 269 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 165 insertions(+), 104 deletions(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index d2f7a043a84..2fa2aed4e0c 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -1,6 +1,6 @@
 PEP: 590
-Title: Vectorcall: A new calling convention for CPython
-Author: Mark Shannon
+Title: Vectorcall: a fast calling protocol for CPython
+Author: Mark Shannon , Jeroen Demeyer
 Status: Draft
 Type: Standards Track
 Content-Type: text/x-rst
@@ -11,11 +11,17 @@ Post-History:
 Abstract
 ========
 
-This PEP introduces a new calling convention [1]_ for use by CPython and other software and tools in the CPython ecosystem.
-The new calling convention is a formalisation and extension of "fastcall", a calling convention already used internally by CPython.
+This PEP introduces a new C API to optimize calls of objects.
+It introduces a new "vectorcall" protocol and calling convention.
+This is based on the "fastcall" convention, which is already used internally by CPython.
+The new features can be used by any user-defined extension class.
+
+**NOTE**: This PEP deals only with the Python/C API,
+it does not affect the Python language or standard library.
 
-Rationale
-=========
+
+Motivation
+==========
 
 The choice of a calling convention impacts the performance and flexibility of code on either side of the call.
 Often there is tension between performance and flexibility.
@@ -23,22 +29,21 @@ Often there is tension between performance and flexibility.
 
 The current ``tp_call`` [2]_ calling convention is sufficiently flexible to cover all cases, but its performance is poor.
 The poor performance is largely a result of having to create intermediate tuples, and possibly intermediate dicts, during the call.
 This is mitigated in CPython by including special-case code to speed up calls to Python and builtin functions.
-Unfortunately this means that other callables such as classes and third party extension objects are called using the
+Unfortunately, this means that other callables such as classes and third party extension objects are called using the
 slower, more general ``tp_call`` calling convention.
 
 This PEP proposes that the calling convention used internally for Python and builtin functions is generalized and published
 so that all calls can benefit from better performance.
-
-Improved Performance
---------------------
-
-The current ``tp_call`` calling convention requires creation of a tuple and, if there are any named arguments, a dictionary for every call.
-This is expensive. The proposed calling convention removes the need to create almost all of these temporary objects.
-Another source of inefficiency in the ``tp_call`` convention is that it has one function pointer per-class, rather than per-object. This is inefficient for calls to classes as several intermediate objects need to be created. For a user defined class, ``UserClass``, at least one intermediate object is created for each call in the sequence ``type.__call__``, ``object.__new__``, ``UserClass.__init__``.
-
 The new proposed calling convention is not fully general, but covers the large majority of calls.
 It is designed to remove the overhead of temporary object creation and multiple indirections.
 
+Another source of inefficiency in the ``tp_call`` convention is that it has one function pointer per class,
+rather than per object.
+This is inefficient for calls to classes as several intermediate objects need to be created.
+For a class ``cls``, at least one intermediate object is created for each call in the sequence
+``type.__call__``, ``cls.__new__``, ``cls.__init__``.
+
+
 Specification
 =============
 
@@ -48,145 +53,206 @@ The function pointer type
 Calls are made through a function pointer taking the following parameters:
 
 * ``PyObject *callable``: The called object
-* ``Py_ssize_t n``: The number of arguments plus an optional offset flag for the first argument in vector.
+* ``Py_ssize_t n``: The number of arguments plus the optional flag ``PY_VECTORCALL_PREPEND`` (see below)
 * ``PyObject **args``: A vector of arguments
-* ``PyTupleObject *kwnames``: A tuple of the names of the named arguments.
+* ``PyObject *kwnames``: Either ``NULL`` or a non-empty tuple of the names of the keyword arguments
 
 This is implemented by the function pointer type:
-``typedef PyObject *(*vectorcall)(PyObject *callable, Py_ssize_t n, PyObject** args, PyTupleObject *kwnames);``
+``typedef PyObject *(*vectorcallfunc)(PyObject *callable, Py_ssize_t n, PyObject **args, PyObject *kwnames);``
 
-Changes to the ``PyTypeObject``
--------------------------------
+Changes to the ``PyTypeObject`` struct
+--------------------------------------
 
-The unused slot ``printfunc tp_print`` is replaced with ``tp_vectorcall_offset``. It has the type ``uintptr_t``.
+The unused slot ``printfunc tp_print`` is replaced with ``tp_vectorcall_offset``. It has the type ``Py_ssize_t``.
+A new ``tp_flags`` flag is added, ``Py_TPFLAGS_HAVE_VECTORCALL``,
+which must be set for any class that uses the vectorcall protocol.
 
-A new flag is added, ``Py_TPFLAGS_HAVE_VECTORCALL``, which is set for any new PyTypeObjects that use the
-``tp_vectorcall_offset`` member.
-
-If ``Py_TPFLAGS_HAVE_VECTORCALL`` is set then ``tp_vectorcall_offset`` is the offset
-into the object of the ``vectorcall`` function-pointer.
-A new slot ``tp_vectorcall`` is added so that classes can support the vectorcall calling convention.
-It has the type ``vectorcall``.
+If ``Py_TPFLAGS_HAVE_VECTORCALL`` is set, then ``tp_vectorcall_offset`` must be a positive integer.
+It is the offset into the object of the vectorcall function pointer of type ``vectorcallfunc``.
+This pointer must not be ``NULL``.
 
 The ``tp_print`` slot is reused as the ``tp_vectorcall_offset`` slot to make it easier for for external projects to backport the vectorcall protocol to earlier Python versions.
 In particular, the Cython project has shown interest in doing that (see https://mail.python.org/pipermail/python-dev/2018-June/153927.html).
 
+Descriptor behavior
+-------------------
+
+One additional type flag is specified: ``Py_TPFLAGS_METHOD_DESCRIPTOR``.
+
+``Py_TPFLAGS_METHOD_DESCRIPTOR`` should be set if the the callable uses the descriptor protocol to create a bound method-like object.
+This is used by the interpreter to avoid creating temporary objects when calling methods
+(see ``_PyObject_GetMethod`` and the ``LOAD_METHOD``/``CALL_METHOD`` opcodes).
 
-Additional flags
-----------------
+Concretely, if ``Py_TPFLAGS_METHOD_DESCRIPTOR`` is set for ``type(func)``, then:
 
-One additional flag is specified: ``Py_TPFLAGS_METHOD_DESCRIPTOR``.
+- ``func.__get__(obj, cls)(*args, **kwds)`` (with ``obj`` not None)
+  must be equivalent to ``func(obj, *args, **kwds)``.
 
-``Py_TPFLAGS_METHOD_DESCRIPTOR`` should be set if the the callable uses the descriptor protocol to create a method or method-like object.
-This is used by the interpreter to avoid creating temporary objects when calling methods.
+- ``func.__get__(None, cls)(*args, **kwds)`` must be equivalent to ``func(*args, **kwds)``.
 
-If this flag is set for a class ``F``, then instances of that class are expected to behave the same as a Python function when used as a class attribute.
-Specifically, this means that the value of ``c.m`` where ``C.m`` is an instance of the class ``F`` (and ``c`` is an instance of ``C``)
-must be an object that acts like a bound method binding ``C.m`` and ``c``.
-This flag is necessary if custom callables are to be able to behave like Python functions *and* be called as efficiently as Python or built-in functions.
+There are no restrictions on the object ``func.__get__(obj, cls)``.
+The latter is not required to implement the vectorcall protocol.
 
 The call
 --------
 
-The call takes the form ``((vectorcall)(((char *)o)+offset))(o, n, args, kwnames)`` where
+The call takes the form ``((vectorcallfunc)(((char *)o)+offset))(o, n, args, kwnames)`` where
 ``offset`` is ``Py_TYPE(o)->tp_vectorcall_offset``.
 The caller is responsible for creating the ``kwnames`` tuple and ensuring that there are no duplicates in it.
-``n`` is the number of postional arguments plus ``PY_VECTORCALL_ARGUMENTS_OFFSET`` if the argument vector pointer points to argument 1 in the
-allocated vector and the callee is allowed to mutate the contents of the ``args`` vector.
-``n = number_postional_args | (offset ? PY_VECTORCALL_ARGUMENTS_OFFSET: 0))``.
+For efficiently dealing with the common case of no keywords,
+``kwnames`` must be ``NULL`` if there are no keyword arguments.
 
-PY_VECTORCALL_ARGUMENTS_OFFSET
-------------------------------
+``n`` is the number of postional arguments plus possibly the ``PY_VECTORCALL_PREPEND`` flag.
 
-When a caller sets the ``PY_VECTORCALL_ARGUMENTS_OFFSET`` it is indicating that the ``args`` pointer points to item 1 (counting from 0) of the allocated array
-and that the contents of the allocated array can be safely mutated by the callee. The callee still needs to make sure that the reference counts of any objects
-in the array remain correct.
+PY_VECTORCALL_PREPEND
+---------------------
 
-Example of how ``PY_VECTORCALL_ARGUMENTS_OFFSET`` is used by a callee is safely used to avoid allocation [3]_
+The flag ``PY_VECTORCALL_PREPEND`` should be added to ``n``
+if the callee is allowed to temporarily change ``args[-1]``.
+In other words, this can be used if ``args`` points to argument 1 in the allocated vector.
+The callee must restore the value of ``args[-1]`` before returning.
 
-Whenever they can do so cheaply (without allocation) callers are encouraged to offset the arguments.
+Whenever they can do so cheaply (without allocation), callers are encouraged to use ``PY_VECTORCALL_PREPEND``.
 Doing so will allow callables such as bound methods to make their onward calls cheaply.
-The interpreter already allocates space on the stack for the callable, so it can offset its arguments for no additional cost.
+The bytecode interpreter already allocates space on the stack for the callable,
+so it can use this trick at no additional cost.
 
-Continued prohibition of callable classes as base classes
----------------------------------------------------------
+See [3]_ for an example of how ``PY_VECTORCALL_PREPEND`` is used by a callee to avoid allocation.
+
+For getting the actual number of arguments from the parameter ``n``,
+the macro ``PyVectorcall_NARGS(n)`` must be used.
+This allows for future changes or extensions.
 
-Currently any attempt to use ``function``, ``method`` or ``method_descriptor`` as a base class for a new class will fail with a ``TypeError``.
-This behaviour is desirable as it prevents errors when a subclass overrides the ``__call__`` method.
-If callables could be sub-classed then any call to a ``function`` or a ``method_descriptor`` would need an additional check that the ``__call__`` method had not been overridden. By exposing an additional call mechanism, the potential for errors becomes greater. As a consequence, any third-party class implementing the new call interface will not be usable as a base class.
 
 New C API and changes to CPython
 ================================
 
-``PyObject *PyObject_VectorCall(PyObject *obj, PyObject **args, Py_ssize_t nargs, PyTupleObject *kwnames)``
+The following functions or macros are added to the C API:
 
-Calls ``obj`` with the given arguments. Note that ``nargs`` may include the flag ``PY_VECTORCALL_ARGUMENTS_OFFSET``.
-``nargs & ~PY_VECTORCALL_ARGUMENTS_OFFSET`` is the number of positional arguments.
+- ``PyObject *PyObject_Vectorcall(PyObject *obj, PyObject **args, Py_ssize_t nargs, PyObject *kwargs)``:
+  Calls ``obj`` with the given arguments.
+  Note that ``nargs`` may include the flag ``PY_VECTORCALL_PREPEND``.
+  The actual number of positional arguments is given by ``PyVectorcall_NARGS(nargs)``.
+  The argument ``kwargs`` is either a dict, a tuple of keyword names or ``NULL``.
+  An empty dict or empty tuple has the same effect as passing ``NULL``.
+  This uses either the vectorcall protocol or ``tp_call`` internally;
+  if neither is supported, an exception is raised.
 
-``PyObject_VectorCall`` raises an exception if ``obj`` is not callable.
+- ``PyObject *PyCall_Vectorcall(PyObject *obj, PyObject *tuple, PyObject **dict)``:
+  Call the object (which must support vectorcall) with the old
+  ``*args`` and ``**kwargs`` calling convention.
+  This is mostly meant to put in the ``tp_call`` slot.
 
-Two utility functions are provided to call the new calling convention from the old one, or vice-versa.
-These functions are ``PyObject *PyCall_MakeVectorCall(PyObject *obj, PyObject *tuple, PyObject **dict)`` and
-``PyObject *PyCall_MakeTpCall(PyObject *obj, PyObject **args, Py_ssize_t nargs, PyTupleObject *kwnames)``, respectively.
+- ``Py_ssize_t PyVectorcall_NARGS(Py_ssize nargs)``: Given a vectorcall ``nargs`` argument,
+  return the actual number of arguments.
+  Currently equivalent to ``nargs & ~PY_VECTORCALL_PREPEND``.
 
-Both functions raise an exception if ``obj`` does not support the relevant protocol.
+New ``METH_VECTORCALL`` flag
+----------------------------
 
-``METH_FASTCALL`` and ``METH_VECTORCALL`` flags
------------------------------------------------
+A new constant ``METH_VECTORCALL`` is added for specifying ``PyMethodDef`` structs.
+It means that the C function has the signature ``vectorcallfunc``.
+This should be the preferred flag for new functions, as this avoids a wrapper function.
 
-A new ``METH_VECTORCALL`` flag is added for specifying ``PyMethodDef`` structs. It is equivalent to the currently undocumented ``METH_FASTCALL | METH_KEYWORD`` flags.
-The new flag specifies that the function has the type ``PyObject *(*call) (PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwname)``
+**NOTE**: the numerical value of ``METH_VECTORCALL`` is unspecified
+and it may have more than 1 bit set.
+It must not combined with any of the existing flags
+``METH_VARARGS``, ``METH_FASTCALL``, ``METH_NOARGS``, ``METH_O`` or ``METH_KEYWORDS``.
 
-Internal CPython changes
-========================
+Subclassing
+-----------
 
-In order to conform to the specification, the only changes required are:
+Extension types inherit the type flag ``Py_TPFLAGS_HAVE_VECTORCALL``
+and the value ``tp_vectorcall_offset`` from the base class,
+provided that they implement ``tp_call`` the same way as the base class.
+Additionally, the flag ``Py_TPFLAGS_METHOD_DESCRIPTOR``
+is inherited if ``tp_descr_get`` and ``tp_descr_set`` are implemented the
+same way as the base class.
 
-* To use the new calling convention in the interpreter.
-* An implementation of the ``PyObject_VectorCall`` function.
-* An implementation of the ``PyCall_MakeVectorCall`` and ``PyCall_MakeTpCall`` convenience functions.
+Heap types never inherit the vectorcall protocol because
+that would not be safe (heap types can be changed dynamically).
+This restriction may be lifted in the future, but that would require
+special-casing ``__call__`` in ``type.__setattribute__``.
 
-To gain the promised performance advantage, the following classes will need to implement the new calling convention:
 
-* Python functions
-* Builtin functions and methods
-* Bound methods
-* Method descriptors
-* A few of the most commonly used classes, probably ``range``, ``list``, ``str``, and ``type``.
-Changes to existing C structs
------------------------------
+Internal CPython changes
+========================
+
+Changes to existing classes
+---------------------------
 
-The ``function``, ``builtin_function_or_method``, ``method_descriptor`` and ``method`` classes will have their corresponding structs changed to
-include a ``vectorcall`` pointer.
+The ``function``, ``builtin_function_or_method``, ``method_descriptor``, ``method``, ``wrapper_descriptor``, ``method-wrapper``
+classes will use the vectorcall protocol
+(not all of these will be changed in the initial implementation).
 
-Third-party built-in classes using the new extended call interface
-------------------------------------------------------------------
+For ``builtin_function_or_method`` and ``method_descriptor``
+(which use the ``PyMethodDef`` data structure),
+one could implement a specific vectorcall wrapper for every existing calling convention.
+Whether or not it is worth doing that remains to be seen.
 
-To enable call performance on a par with Python functions and built-in functions, third-party callables should include a ``vectorcall`` function pointer
-and set ``tp_vectorcall_offset`` to the correct value.
-Any class that sets ``tp_vectorcall_offset`` to non-zero should also implement the ``tp_call`` function and make sure its behaviour is consistent with the ``vectorcall`` function.
-Setting ``tp_call`` to ``PyCall_MakeVectorCall`` will suffice.
+Using the vectorcall protocol for classes
+-----------------------------------------
+
+For a class ``cls``, creating a new instance using ``cls(xxx)``
+requires multiple calls.
+At least one intermediate object is created for each call in the sequence
+``type.__call__``, ``cls.__new__``, ``cls.__init__``.
+So it makes a lot of sense to use vectorcall for calling classes.
+This really means implementing the vectorcall protocol for ``type``.
+Some of the most commonly used classes will use this protocol,
+probably ``range``, ``list``, ``str``, and ``type``.
 
 The ``PyMethodDef`` protocol and Argument Clinic
-================================================
+------------------------------------------------
 
 Argument Clinic [4]_ automatically generates wrapper functions around lower-level callables, providing safe unboxing of primitive types and other safety checks.
-Argument Clinic could be extended to generate wrapper objects conforming to the new ``vectorcall`` protocol.
+Argument Clinic could be extended to generate wrapper objects with the ``METH_VECTORCALL`` signature.
 This will allow execution to flow from the caller to the Argument Clinic generated wrapper and thence to the hand-written code with only a single indirection.
 
+
+Third-party extension classes using vectorcall
+==============================================
+
+To enable call performance on a par with Python functions and built-in functions,
+third-party callables should include a ``vectorcallfunc`` function pointer,
+set ``tp_vectorcall_offset`` to the correct value and add the ``Py_TPFLAGS_HAVE_VECTORCALL`` flag.
+Any class that does this must implement the ``tp_call`` function and make sure its behaviour is consistent with the ``vectorcallfunc`` function.
+Setting ``tp_call`` to ``PyCall_Vectorcall`` is sufficient.
+
+
 Performance implications of these changes
 =========================================
 
-Initial experiments, implementing the new calling convention for Python functions, builtin functions and method-descriptors showed a
-speedup of around 2%. A full implementation involving other callables and adding support for the new calling convention to argument
-clinic would, in the author's estimation, yield a speedup of between 3% and 4% for the standard benchmark suite.
+This PEP should not have much impact on the performance of existing code
+(neither in the positive nor the negative sense).
+It is mainly meant to allow efficient new code to be written,
+not to make existing code faster.
+
+Nevertheless, this PEP optimizes for ``METH_FASTCALL`` functions.
+Performance of functions using ``METH_VARARGS`` will become slightly worse.
+
+
+Stable ABI
+==========
+
+Nothing from this PEP is added to the stable ABI (PEP 384).
 
 
 Alternative Suggestions
 =======================
 
+bpo-29259
+---------
+
+PEP 590 is close to what was proposed in bpo-29259 [#bpo29259]_.
+The main difference is that this PEP stores the function pointer
+in the instance rather than in the class.
+This makes more sense for implementing functions in C,
+where every instance corresponds to a different C function.
+It also allows optimizing ``type.__call__``, which is not possible with bpo-29259.
+
 PEP 576 and PEP 580
 -------------------
 
@@ -213,30 +279,25 @@ Removing any special cases and making all calls use the ``tp_call`` form was als
 However, unless a much more efficient way was found to create and destroy tuples, and to a lesser extent dictionaries, then it would be too slow.
 
+
 Acknowledgements
 ================
 
-Victor Stinner for developing the original "vector call" calling convention internally to CPython (where is it is called "fast call")
-this PEP codifies and extends his work.
+Victor Stinner for developing the original "fastcall" calling convention internally to CPython.
+This PEP codifies and extends his work.
 
-Jeroen Demeyer for authoring PEP 575 and PEP 580 which helped motivate this PEP.
 
 References
 ==========
 
-.. [1] Calling conventions
-    https://en.wikipedia.org/wiki/Calling_convention
+.. [#bpo29259] Add tp_fastcall to PyTypeObject: support FASTCALL calling convention for all callable objects,
+    https://bugs.python.org/issue29259
 .. [2] tp_call/PyObject_Call calling convention
     https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_call
-.. [3] Using PY_VECTORCALL_ARGUMENTS_OFFSET in callee
+.. [3] Using PY_VECTORCALL_PREPEND in callee
     https://github.com/markshannon/cpython/blob/vectorcall-minimal/Objects/classobject.c#L53
 .. [4] Argument Clinic
     https://docs.python.org/3/howto/clinic.html
-.. [5] PEP 576
-    https://www.python.org/dev/peps/pep-0576/
-.. [6] PEP 580
-    https://www.python.org/dev/peps/pep-0580/
-
 
 Reference implementation

From b184aa4d16b132aea76be241ba5ae350257f4123 Mon Sep 17 00:00:00 2001
From: Jeroen Demeyer
Date: Wed, 8 May 2019 11:10:39 +0200
Subject: [PATCH 02/11] New PyVectorcall_FUNC() macro, allow NULL vectorcallfunc

---
 pep-0590.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index 2fa2aed4e0c..58d19f4f7c8 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -69,7 +69,7 @@ which must be set for any class that uses the vectorcall protocol.
 
 If ``Py_TPFLAGS_HAVE_VECTORCALL`` is set, then ``tp_vectorcall_offset`` must be a positive integer.
 It is the offset into the object of the vectorcall function pointer of type ``vectorcallfunc``.
-This pointer must not be ``NULL``.
+This pointer may be ``NULL``, in which case the behavior is the same as if ``Py_TPFLAGS_HAVE_VECTORCALL`` was not set.
 
 The ``tp_print`` slot is reused as the ``tp_vectorcall_offset`` slot to make it easier for for external projects to backport the vectorcall protocol to earlier Python versions.
 In particular, the Cython project has shown interest in doing that (see https://mail.python.org/pipermail/python-dev/2018-June/153927.html).
@@ -143,6 +143,10 @@ The following functions or macros are added to the C API:
   ``*args`` and ``**kwargs`` calling convention.
   This is mostly meant to put in the ``tp_call`` slot.
 
+- ``vectorcallfunc PyVectorcall_FUNC(PyObject *obj)``: If ``obj`` does not support vectorcall,
+  return ``NULL``.
+  Otherwise, return the vectorcall pointer in the instance ``obj`` (which may be ``NULL``).
+
 - ``Py_ssize_t PyVectorcall_NARGS(Py_ssize nargs)``: Given a vectorcall ``nargs`` argument,
   return the actual number of arguments.
   Currently equivalent to ``nargs & ~PY_VECTORCALL_PREPEND``.

From c4657aad29940143f70dc116a568ea7dc045c1d5 Mon Sep 17 00:00:00 2001
From: Jeroen Demeyer
Date: Wed, 8 May 2019 11:46:26 +0200
Subject: [PATCH 03/11] Use PyObject *const *args for argument vector

---
 pep-0590.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index 58d19f4f7c8..70d0386a838 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -54,11 +54,11 @@ Calls are made through a function pointer taking the following parameters:
 
 * ``PyObject *callable``: The called object
 * ``Py_ssize_t n``: The number of arguments plus the optional flag ``PY_VECTORCALL_PREPEND`` (see below)
-* ``PyObject **args``: A vector of arguments
+* ``PyObject *const *args``: A vector of arguments
 * ``PyObject *kwnames``: Either ``NULL`` or a non-empty tuple of the names of the keyword arguments
 
 This is implemented by the function pointer type:
-``typedef PyObject *(*vectorcallfunc)(PyObject *callable, Py_ssize_t n, PyObject **args, PyObject *kwnames);``
+``typedef PyObject *(*vectorcallfunc)(PyObject *callable, Py_ssize_t n, PyObject *const *args, PyObject *kwnames);``
 
 Changes to the ``PyTypeObject`` struct
 --------------------------------------
@@ -129,7 +129,7 @@ New C API and changes to CPython
 
 The following functions or macros are added to the C API:
 
-- ``PyObject *PyObject_Vectorcall(PyObject *obj, PyObject **args, Py_ssize_t nargs, PyObject *kwargs)``:
+- ``PyObject *PyObject_Vectorcall(PyObject *obj, PyObject *const *args, Py_ssize_t nargs, PyObject *kwargs)``:
   Calls ``obj`` with the given arguments.
   Note that ``nargs`` may include the flag ``PY_VECTORCALL_PREPEND``.
   The actual number of positional arguments is given by ``PyVectorcall_NARGS(nargs)``.
@@ -138,7 +138,7 @@ The following functions or macros are added to the C API:
   This uses either the vectorcall protocol or ``tp_call`` internally;
   if neither is supported, an exception is raised.
 
-- ``PyObject *PyCall_Vectorcall(PyObject *obj, PyObject *tuple, PyObject **dict)``:
+- ``PyObject *PyCall_Vectorcall(PyObject *obj, PyObject *tuple, PyObject *dict)``:
   Call the object (which must support vectorcall) with the old
   ``*args`` and ``**kwargs`` calling convention.
   This is mostly meant to put in the ``tp_call`` slot.

From 829bf6a40599715e54f7d2b7ae16e6d8eff95bcd Mon Sep 17 00:00:00 2001
From: Jeroen Demeyer
Date: Wed, 8 May 2019 12:01:25 +0200
Subject: [PATCH 04/11] Use "keywords" instead of "kwargs" in PyObject_Vectorcall

---
 pep-0590.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index 70d0386a838..66092060a0a 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -129,11 +129,11 @@ New C API and changes to CPython
 
 The following functions or macros are added to the C API:
 
-- ``PyObject *PyObject_Vectorcall(PyObject *obj, PyObject *const *args, Py_ssize_t nargs, PyObject *kwargs)``:
+- ``PyObject *PyObject_Vectorcall(PyObject *obj, PyObject *const *args, Py_ssize_t nargs, PyObject *keywords)``:
   Calls ``obj`` with the given arguments.
   Note that ``nargs`` may include the flag ``PY_VECTORCALL_PREPEND``.
   The actual number of positional arguments is given by ``PyVectorcall_NARGS(nargs)``.
-  The argument ``kwargs`` is either a dict, a tuple of keyword names or ``NULL``.
+  The argument ``keywords`` is either a dict, a tuple of keyword names or ``NULL``.
   An empty dict or empty tuple has the same effect as passing ``NULL``.
   This uses either the vectorcall protocol or ``tp_call`` internally;
   if neither is supported, an exception is raised.
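
Editorial illustration (not part of the patch series): the patches up to this point describe how the ``nargs`` value carries an optional flag and how a callee may temporarily overwrite ``args[-1]`` to prepend an argument without allocating. The standalone C sketch below mimics that idea without ``Python.h``. Every ``demo_``/``DEMO_`` name is hypothetical, ``const char *`` stands in for ``PyObject *``, and the flag value (the top bit of ``size_t``) is an assumption, since the patches do not fix its numeric value.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flag value: the patches leave the actual number unspecified. */
#define DEMO_ARGUMENTS_OFFSET ((size_t)1 << (8 * sizeof(size_t) - 1))
/* Strip the flag to get the real positional-argument count. */
#define DEMO_NARGS(n) ((size_t)(n) & ~DEMO_ARGUMENTS_OFFSET)

typedef const char *demo_obj;        /* hypothetical stand-in for PyObject * */

static demo_obj demo_first_seen;     /* records the first argument the callee saw */

/* Plain callee: returns how many positional arguments it received. */
static size_t demo_function(demo_obj const *args, size_t n)
{
    size_t nargs = DEMO_NARGS(n);
    demo_first_seen = nargs > 0 ? args[0] : NULL;
    return nargs;
}

/* Bound-method-like callee: forwards the call with `self` prepended. */
static size_t demo_bound_method(demo_obj self, demo_obj const *args, size_t n)
{
    size_t nargs = DEMO_NARGS(n);
    if (n & DEMO_ARGUMENTS_OFFSET) {
        /* The caller allows a temporary change of args[-1]: reuse that slot. */
        demo_obj *slot = (demo_obj *)args - 1;
        demo_obj saved = *slot;
        *slot = self;
        size_t result = demo_function(slot, nargs + 1);
        *slot = saved;               /* restore args[-1] before returning */
        return result;
    }
    /* Without the flag, copy into a temporary buffer (fixed size in this sketch). */
    demo_obj tmp[8];
    assert(nargs < 8);
    tmp[0] = self;
    for (size_t i = 0; i < nargs; i++) {
        tmp[i + 1] = args[i];
    }
    return demo_function(tmp, nargs + 1);
}
```

A caller that keeps the callable in the slot just before its arguments (as the bytecode interpreter's stack is said to do) would pass ``stack + 1`` and ``2 | DEMO_ARGUMENTS_OFFSET``; the callee then sees three arguments, and the slot holds its original value again after the call.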
From 9190dd494dc140a1518464353af0a3296d398931 Mon Sep 17 00:00:00 2001
From: Jeroen Demeyer
Date: Wed, 8 May 2019 17:13:45 +0200
Subject: [PATCH 05/11] Rename PyCall_Vectorcall -> PyVectorcall_Call

---
 pep-0590.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index 66092060a0a..42245d253b4 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -138,7 +138,7 @@ The following functions or macros are added to the C API:
   This uses either the vectorcall protocol or ``tp_call`` internally;
   if neither is supported, an exception is raised.
 
-- ``PyObject *PyCall_Vectorcall(PyObject *obj, PyObject *tuple, PyObject *dict)``:
+- ``PyObject *PyVectorcall_Call(PyObject *obj, PyObject *tuple, PyObject *dict)``:
   Call the object (which must support vectorcall) with the old
   ``*args`` and ``**kwargs`` calling convention.
   This is mostly meant to put in the ``tp_call`` slot.
@@ -223,7 +223,7 @@ To enable call performance on a par with Python functions and built-in functions
 third-party callables should include a ``vectorcallfunc`` function pointer,
 set ``tp_vectorcall_offset`` to the correct value and add the ``Py_TPFLAGS_HAVE_VECTORCALL`` flag.
 Any class that does this must implement the ``tp_call`` function and make sure its behaviour is consistent with the ``vectorcallfunc`` function.
-Setting ``tp_call`` to ``PyCall_Vectorcall`` is sufficient.
+Setting ``tp_call`` to ``PyVectorcall_Call`` is sufficient.
 Performance implications of these changes

From 0e025fa1ca4fc0ec512eed23c96540c2c9c751f3 Mon Sep 17 00:00:00 2001
From: Jeroen Demeyer
Date: Wed, 8 May 2019 17:15:14 +0200
Subject: [PATCH 06/11] Rename PyVectorcall_FUNC -> PyVectorcall_Function

---
 pep-0590.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index 42245d253b4..b8048536639 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -143,7 +143,7 @@ The following functions or macros are added to the C API:
   ``*args`` and ``**kwargs`` calling convention.
   This is mostly meant to put in the ``tp_call`` slot.
 
-- ``vectorcallfunc PyVectorcall_FUNC(PyObject *obj)``: If ``obj`` does not support vectorcall,
+- ``vectorcallfunc PyVectorcall_Function(PyObject *obj)``: If ``obj`` does not support vectorcall,
   return ``NULL``.
   Otherwise, return the vectorcall pointer in the instance ``obj`` (which may be ``NULL``).

From 80dfe701ae562077bea63dfded8e7f481ceb7a6a Mon Sep 17 00:00:00 2001
From: Petr Viktorin
Date: Wed, 8 May 2019 12:50:39 -0400
Subject: [PATCH 07/11] Revert controversial changes

---
 pep-0590.rst | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index b8048536639..ad66d9e8dad 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -55,7 +55,7 @@ Calls are made through a function pointer taking the following parameters:
 * ``PyObject *callable``: The called object
 * ``Py_ssize_t n``: The number of arguments plus the optional flag ``PY_VECTORCALL_PREPEND`` (see below)
 * ``PyObject *const *args``: A vector of arguments
-* ``PyObject *kwnames``: Either ``NULL`` or a non-empty tuple of the names of the keyword arguments
+* ``PyTupleObject *kwnames``: A tuple of the names of the named arguments.
 This is implemented by the function pointer type:
 ``typedef PyObject *(*vectorcallfunc)(PyObject *callable, Py_ssize_t n, PyObject *const *args, PyObject *kwnames);``
@@ -63,7 +63,7 @@ This is implemented by the function pointer type:
 Changes to the ``PyTypeObject`` struct
 --------------------------------------
 
-The unused slot ``printfunc tp_print`` is replaced with ``tp_vectorcall_offset``. It has the type ``Py_ssize_t``.
+The unused slot ``printfunc tp_print`` is replaced with ``tp_vectorcall_offset``. It has the type ``uint32_t``.
 A new ``tp_flags`` flag is added, ``Py_TPFLAGS_HAVE_VECTORCALL``,
 which must be set for any class that uses the vectorcall protocol.
 
@@ -99,8 +99,6 @@ The call
 The call takes the form ``((vectorcallfunc)(((char *)o)+offset))(o, n, args, kwnames)`` where
 ``offset`` is ``Py_TYPE(o)->tp_vectorcall_offset``.
 The caller is responsible for creating the ``kwnames`` tuple and ensuring that there are no duplicates in it.
-For efficiently dealing with the common case of no keywords,
-``kwnames`` must be ``NULL`` if there are no keyword arguments.
 
 ``n`` is the number of postional arguments plus possibly the ``PY_VECTORCALL_PREPEND`` flag.
 
@@ -136,15 +134,11 @@ The following functions or macros are added to the C API:
   This uses either the vectorcall protocol or ``tp_call`` internally;
   if neither is supported, an exception is raised.
 
-- ``PyObject *PyVectorcall_Call(PyObject *obj, PyObject *tuple, PyObject *dict)``:
+- ``PyObject *PyCall_MakeVectorCall(PyObject *obj, PyObject *tuple, PyObject *dict)``:
   Call the object (which must support vectorcall) with the old
   ``*args`` and ``**kwargs`` calling convention.
   This is mostly meant to put in the ``tp_call`` slot.
 
-- ``vectorcallfunc PyVectorcall_Function(PyObject *obj)``: If ``obj`` does not support vectorcall,
-  return ``NULL``.
-  Otherwise, return the vectorcall pointer in the instance ``obj`` (which may be ``NULL``).
-
 - ``Py_ssize_t PyVectorcall_NARGS(Py_ssize nargs)``: Given a vectorcall ``nargs`` argument,
   return the actual number of arguments.
   Currently equivalent to ``nargs & ~PY_VECTORCALL_PREPEND``.
@@ -155,7 +149,7 @@ New ``METH_VECTORCALL`` flag
 ----------------------------

 A new constant ``METH_VECTORCALL`` is added for specifying ``PyMethodDef`` structs.
-It means that the C function has the signature ``vectorcallfunc``.
+It means that the C function has the type ``PyObject *(*call) (PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwname)``.
 This should be the preferred flag for new functions, as this avoids a wrapper function.

 **NOTE**: the numerical value of ``METH_VECTORCALL`` is unspecified
@@ -211,7 +205,7 @@ The ``PyMethodDef`` protocol and Argument Clinic

 Argument Clinic [4]_ automatically generates wrapper functions around lower-level callables, providing safe unboxing of primitive types and
 other safety checks.
-Argument Clinic could be extended to generate wrapper objects with the ``METH_VECTORCALL`` signature.
+Argument Clinic could be extended to generate wrapper objects conforming to the new ``vectorcall`` protocol.
 This will allow execution to flow from the caller to the Argument Clinic generated wrapper and
 thence to the hand-written code with only a single indirection.

@@ -223,7 +217,7 @@ To enable call performance on a par with Python functions and built-in functions
 third-party callables should include a ``vectorcallfunc`` function pointer,
 set ``tp_vectorcall_offset`` to the correct value and add the ``Py_TPFLAGS_HAVE_VECTORCALL`` flag.
 Any class that does this must implement the ``tp_call`` function and make sure its behaviour is consistent with the ``vectorcallfunc`` function.
-Setting ``tp_call`` to ``PyVectorcall_Call`` is sufficient.
+Setting ``tp_call`` to ``PyCall_MakeVectorCall`` is sufficient.
 Performance implications of these changes

From 49313d0c217ea89162c1bf43fd5eabbe0bf3fb04 Mon Sep 17 00:00:00 2001
From: Petr Viktorin
Date: Wed, 8 May 2019 13:18:58 -0400
Subject: [PATCH 08/11] Revert to the PY_VECTORCALL_ARGUMENTS_OFFSET name for now

---
 pep-0590.rst | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index ad66d9e8dad..66140965f14 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -53,7 +53,7 @@ The function pointer type
 Calls are made through a function pointer taking the following parameters:

 * ``PyObject *callable``: The called object
-* ``Py_ssize_t n``: The number of arguments plus the optional flag ``PY_VECTORCALL_PREPEND`` (see below)
+* ``Py_ssize_t n``: The number of arguments plus the optional flag ``PY_VECTORCALL_ARGUMENTS_OFFSET`` (see below)
 * ``PyObject *const *args``: A vector of arguments
 * ``PyTupleObject *kwnames``: A tuple of the names of the named arguments.

@@ -100,22 +100,22 @@ The call takes the form ``((vectorcallfunc)(((char *)o)+offset))(o, n, args, kwn
 ``offset`` is ``Py_TYPE(o)->tp_vectorcall_offset``.
 The caller is responsible for creating the ``kwnames`` tuple and ensuring that there are no duplicates in it.

-``n`` is the number of postional arguments plus possibly the ``PY_VECTORCALL_PREPEND`` flag.
+``n`` is the number of postional arguments plus possibly the ``PY_VECTORCALL_ARGUMENTS_OFFSET`` flag.

-PY_VECTORCALL_PREPEND
+PY_VECTORCALL_ARGUMENTS_OFFSET
 ---------------------

-The flag ``PY_VECTORCALL_PREPEND`` should be added to ``n``
+The flag ``PY_VECTORCALL_ARGUMENTS_OFFSET`` should be added to ``n``
 if the callee is allowed to temporarily change ``args[-1]``.
 In other words, this can be used if ``args`` points to argument 1 in the allocated vector.
 The callee must restore the value of ``args[-1]`` before returning.

-Whenever they can do so cheaply (without allocation), callers are encouraged to use ``PY_VECTORCALL_PREPEND``.
+Whenever they can do so cheaply (without allocation), callers are encouraged to use ``PY_VECTORCALL_ARGUMENTS_OFFSET``.
 Doing so will allow callables such as bound methods to make their onward calls cheaply.
 The bytecode interpreter already allocates space on the stack for the callable,
 so it can use this trick at no additional cost.

-See [3]_ for an example of how ``PY_VECTORCALL_PREPEND`` is used by a callee to avoid allocation.
+See [3]_ for an example of how ``PY_VECTORCALL_ARGUMENTS_OFFSET`` is used by a callee to avoid allocation.

 For getting the actual number of arguments from the parameter ``n``,
 the macro ``PyVectorcall_NARGS(n)`` must be used.
@@ -129,7 +129,7 @@ The following functions or macros are added to the C API:

 - ``PyObject *PyObject_Vectorcall(PyObject *obj, PyObject *const *args, Py_ssize_t nargs, PyObject *keywords)``:
   Calls ``obj`` with the given arguments.
-  Note that ``nargs`` may include the flag ``PY_VECTORCALL_PREPEND``.
+  Note that ``nargs`` may include the flag ``PY_VECTORCALL_ARGUMENTS_OFFSET``.
   The actual number of positional arguments is given by ``PyVectorcall_NARGS(nargs)``.
   The argument ``keywords`` is either a dict, a tuple of keyword names or ``NULL``.
   An empty dict or empty tuple has the same effect as passing ``NULL``.
@@ -143,7 +143,7 @@ The following functions or macros are added to the C API:

 - ``Py_ssize_t PyVectorcall_NARGS(Py_ssize nargs)``: Given a vectorcall ``nargs`` argument,
   return the actual number of arguments.
-  Currently equivalent to ``nargs & ~PY_VECTORCALL_PREPEND``.
+  Currently equivalent to ``nargs & ~PY_VECTORCALL_ARGUMENTS_OFFSET``.

 New ``METH_VECTORCALL`` flag
 ----------------------------
@@ -292,7 +292,7 @@ References
    https://bugs.python.org/issue29259
 .. [2] tp_call/PyObject_Call calling convention
    https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_call
-.. [3] Using PY_VECTORCALL_PREPEND in callee
+.. [3] Using PY_VECTORCALL_ARGUMENTS_OFFSET in callee
    https://github.com/markshannon/cpython/blob/vectorcall-minimal/Objects/classobject.c#L53
 .. [4] Argument Clinic
    https://docs.python.org/3/howto/clinic.html

From b777f0f942cf6b1a697f7d66e86f1e91bce9a92b Mon Sep 17 00:00:00 2001
From: Petr Viktorin
Date: Wed, 8 May 2019 13:42:44 -0400
Subject: [PATCH 09/11] Remove possibility of passing dict to PyObject_Vectorcall for now

---
 pep-0590.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index 66140965f14..64fdebd7286 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -131,8 +131,8 @@ The following functions or macros are added to the C API:
   Calls ``obj`` with the given arguments.
   Note that ``nargs`` may include the flag ``PY_VECTORCALL_ARGUMENTS_OFFSET``.
   The actual number of positional arguments is given by ``PyVectorcall_NARGS(nargs)``.
-  The argument ``keywords`` is either a dict, a tuple of keyword names or ``NULL``.
-  An empty dict or empty tuple has the same effect as passing ``NULL``.
+  The argument ``keywords`` is a tuple of keyword names or ``NULL``.
+  An empty tuple has the same effect as passing ``NULL``.
   This uses either the vectorcall protocol or ``tp_call`` internally;
   if neither is supported, an exception is raised.


From 6665163f80697c6e91e49dded365e77ff442e731 Mon Sep 17 00:00:00 2001
From: Petr Viktorin
Date: Wed, 8 May 2019 13:44:31 -0400
Subject: [PATCH 10/11] Fix ReST syntax

---
 pep-0590.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index 64fdebd7286..5efffb58fda 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -103,7 +103,7 @@ The caller is responsible for creating the ``kwnames`` tuple and ensuring that t
 ``n`` is the number of postional arguments plus possibly the ``PY_VECTORCALL_ARGUMENTS_OFFSET`` flag.
 PY_VECTORCALL_ARGUMENTS_OFFSET
----------------------
+------------------------------

 The flag ``PY_VECTORCALL_ARGUMENTS_OFFSET`` should be added to ``n``
 if the callee is allowed to temporarily change ``args[-1]``.

From b6be9ec2f97e7be31604c0b69912b7c054f8e5af Mon Sep 17 00:00:00 2001
From: Petr Viktorin
Date: Wed, 8 May 2019 14:04:24 -0400
Subject: [PATCH 11/11] Fix type name

---
 pep-0590.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pep-0590.rst b/pep-0590.rst
index 5efffb58fda..bc654f0c164 100644
--- a/pep-0590.rst
+++ b/pep-0590.rst
@@ -63,7 +63,7 @@ This is implemented by the function pointer type:
 Changes to the ``PyTypeObject`` struct
 --------------------------------------

-The unused slot ``printfunc tp_print`` is replaced with ``tp_vectorcall_offset``. It has the type ``uint32_t``.
+The unused slot ``printfunc tp_print`` is replaced with ``tp_vectorcall_offset``. It has the type ``Py_ssize_t``.

 A new ``tp_flags`` flag is added, ``Py_TPFLAGS_HAVE_VECTORCALL``, which must be set for any class that uses the vectorcall protocol.
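
Reviewer note on the ``PY_VECTORCALL_ARGUMENTS_OFFSET`` wording in patches 08 and 10: the rule that a callee may temporarily change ``args[-1]`` and must restore it is easier to follow with a toy model. The sketch below is plain C and does not include ``Python.h``. Every ``demo_`` name is hypothetical, callables return ``int`` instead of ``PyObject *``, ``kwnames`` is left out, ``n`` is a ``size_t`` rather than ``Py_ssize_t``, and the flag value (top bit of ``size_t``) is an assumption made only for this illustration. It is not the CPython implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyObject; not a CPython type. */
typedef struct demo_object { int value; } demo_object;

/* Assumed flag value for this sketch only: the top bit of size_t. */
#define DEMO_ARGUMENTS_OFFSET ((size_t)1 << (8 * sizeof(size_t) - 1))

/* Same shape as the vectorcallfunc typedef, minus kwnames. */
typedef int (*demo_vectorcallfunc)(demo_object *callable, size_t n,
                                   demo_object *const *args);

/* Counterpart of PyVectorcall_NARGS: strip the flag to get the count. */
static size_t demo_nargs(size_t n)
{
    return n & ~DEMO_ARGUMENTS_OFFSET;
}

/* Plain callee: sums the values of its positional arguments. */
static int demo_sum(demo_object *callable, size_t n, demo_object *const *args)
{
    size_t nargs = demo_nargs(n);
    int total = 0;
    (void)callable;
    for (size_t i = 0; i < nargs; i++) {
        total += args[i]->value;
    }
    return total;
}

/* Bound-method-like callable: prepends self and forwards to func. */
typedef struct demo_method {
    demo_object base;
    demo_object *self;
    demo_vectorcallfunc func;
} demo_method;

static int demo_method_call(demo_object *callable, size_t n,
                            demo_object *const *args)
{
    demo_method *m = (demo_method *)callable;
    size_t nargs = demo_nargs(n);
    int result;

    if (n & DEMO_ARGUMENTS_OFFSET) {
        /* The caller permits borrowing args[-1]: no allocation needed. */
        demo_object **vec = (demo_object **)args - 1;
        demo_object *saved = vec[0];
        vec[0] = m->self;
        result = m->func(callable, nargs + 1, vec);
        vec[0] = saved; /* restore args[-1] before returning */
        return result;
    }

    /* Flag not set: build a new vector with self in front. */
    demo_object **copy = malloc((nargs + 1) * sizeof(*copy));
    if (copy == NULL) {
        return -1;
    }
    copy[0] = m->self;
    if (nargs > 0) {
        memcpy(copy + 1, args, nargs * sizeof(*copy));
    }
    result = m->func(callable, nargs + 1, copy);
    free(copy);
    return result;
}

/* Caller side: slot 0 is spare, so args points at "argument 1". */
static int demo_run(void)
{
    demo_object a = {10}, b = {20}, self = {1}, spare = {0};
    demo_method m = { {0}, &self, demo_sum };
    demo_object *stack[3] = { &spare, &a, &b };
    int result = demo_method_call(&m.base, 2 | DEMO_ARGUMENTS_OFFSET,
                                  stack + 1);
    assert(stack[0] == &spare); /* the callee restored the slot */
    return result;
}
```

Calling ``demo_run()`` from a ``main`` returns 31 (``self`` plus the two arguments), and the spare slot is unchanged afterwards. The design point the PEP text makes is visible in the two branches: with the flag set the onward call reuses the caller's vector, and without it the callee has to allocate and copy.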