-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
PEP 697 (new): C-API for Extending Opaque Types #2772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
gvanrossum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a complete review, I realize I am not qualified to understand the specification.
| With the size not available, limited API users must resort to workarounds such | ||
| as querying ``__basicsize__`` and plugging it into ``PyType_Spec`` at runtime, | ||
| and divining the correct offset for their extra data. | ||
| This requires making assumptions about memory layout, which the limited API | ||
| is supposed to hide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have examples of 3rd party code that does this? It would be useful to know how often this is a problem. I can't say I've seen even a single example, but I admit I don't review enough 3rd party extension modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The later nanobind example does exactly this. PySide (Qt) also deals with this, but their solution is way more convoluted.
Any extension that wants to attach C-level state to custom types, and use the Limited API, will run into this.
Trouble is, these extensions tend to be big frameworks – binding generators: the big use case is exposing a foreign class mechanism (think Java, C++, ...) as Python types – if there's some run-time type information object for the foreign classes, you'll want to attach it to the Python type.
Such big binding generators often don't use the Stable ABI because of issues like what this PEP is solving. (FWIW, this issue is one of the last blockers for nanobind, though.)
On the other hand, Stable ABI is exactly what these projects should provide (as an option), to enable a “long tail” of low-maintenance projects (on PyPI and elsewhere) to not need recompiling every year.
Once this PEP is published, I'll show it to the relevant projects (that I know of) and get their opinions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I realize you buried the lede here. The key motivation is "to attach C-level state to custom types". You might explain this use case more clearly up front.
| Some types, such as CPython's ``PyHeapType``, handle this by storing | ||
| variable-sized data after the fixed-size struct. | ||
| This means that any subclass can add its own fixed-size data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused by this (perhaps because I haven't used PyHeapType in a long time). Does this say that the variable-sized data is owned by the "base class" but stored after all fixed-size data (including such data belonging to "subclasses") is stored at the end? This would require computing the pointer to the variable-sized data dynamically, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes: see _PyHeapType_GET_MEMBERS for the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, more detail would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the correct name PyHeapTypeObject here to avoid confusion.
| The situation is worse in projects that target the Limited API. | ||
|
|
||
| For an example of the currently necessary workarounds, see: | ||
| `nb_type_data_static <https://github.com/wjakob/nanobind/blob/f3044cf44763e105428e4e0cf8f42d951b9cc997/src/nb_type.cpp#L1085>`_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't say I understand what's going on there, since it's modern C++ and I don't read that.
Could you perhaps explain what it is doing?
Are you aware of pure-C examples of similar workarounds? Or is C++ the main target here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's querying the Python attribute __basicsize__ and plugging it into PyType_Spec used to create a subclass. I'll move the English description here.
I'm using nanobind (C++) for examples & initial design discussions, because it's small and free of historical baggage. But Java or Rust bindings have similar use cases, and I'll consider the PEP a failure if JPype & Pyo3 don't find it useful.
(My own motivation is improving subclassing API in general, but that's not what people behind binding generators are asking for...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just don't expect your readers to be familiar with nanobind. It's fine to explain that nanobind provides a use case, but you still need to explain what problem it is solving and how, and how its solution is sub-optimial and how your PEP provides a better solution. I'm guessing from what you explained in comments above that when a subclass of a variable-sized type (like tuple) needs to add extra state it must define a struct to hold the extra state and write a custom function that calculates the offset of that extra state relative to the nominal (PyObject *) start of the object, and it also needs to override the new-instance allocation (tp_new? tp_alloc?) to allocate space for the extra state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C equivalent snippet of the code would be helpful. The code is small and easy to write in C.
| (a spiritual successor of the popular C++ binding generator ``pybind11``). | ||
|
|
||
|
|
||
| Rationale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right section heading here. Maybe "proposal"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rearrange this to make it clearer.
| This PEP proposes a different model: instead of the superclass data being | ||
| part of the subclass data, the extra space a subclass needs is specified | ||
| and accessed separately. | ||
| (How base class data is accessed is left to whomever implements the base class: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoever
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really? It would be “left to her”, not “left to she”.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's "left to (she who implements the base class)". See https://www.thoughtco.com/whoever-and-whomever-1689622#toc-how-to-use-whomever, the example "The prize should be given to whoever wins the race".
| part of the subclass data, the extra space a subclass needs is specified | ||
| and accessed separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this is describing. Are you proposing to use a pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A high-level overview. Pointer-level details are in Specification.
(I'll clarify this.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it looks like maybe it's so high level as to be meaningless (at least to someone who basically only learns from examples like myself).
| they can for example provide accessor functions, expose a part of its | ||
| ``struct`` for better performance, or do both.) | ||
|
|
||
| The proposed mechanism allows using static, read-only ``PyType_Spec`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC PyType_Spec is specific to PyHeapType. So this makes me wonder -- perhaps the PEP is just about making PyHeapType more usable? Especially from C++? And you should just say so in the PEP title, rather than claiming more generality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's about defining types. There are two ways to do that, PyType_Spec and static types.
Maybe I should be clearer that it doesn't apply to static types?
It's not specific to C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured it doesn't apply to static types (3rd party static types are outmoded anyway :-). I understand it's not unique to C++ but your only example uses C++ and for a non-C++-user that example was more baffling than helpful. The excess generality that I'm objecting most to is mentioned below, "The approach generalizes to non-metaclass types as well."
| Combined with a way to create class from ``PyType_Spec`` and a custom metaclass, | ||
| this will allow libraries like nanobind or JPype to create metaclasses | ||
| without making assumptions about ``PyTypeObject``'s memory layout. | ||
| The approach generalizes to non-metaclass types as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we care?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It enables the example in the tutorial to work with stable ABI. (And in case you're following Mark Shannon, I should add that any stable ABI will need to solve the issue somehow.)
The tutorial's example – extending list with an extra C variable – is particularly impractical/simplified, sure, but it shows that currently, subclassing ties you to the ABI of the superclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. I see no particular problem to extending list, since the list instance itself is not variable-size -- it is a plain struct whose sizeof doesn't lie, and the example even works with a static type. (Perhaps that's why list was chosen in the tutorial? :-) Admittedly it doesn't work with the stable ABI (because the typedef PyListObject isn't in the ABI) but that's not the same problem you explained to me in the comments above (or I still misunderstood your explanation). The problem with defining types with extra state seems to be caused by the basic type object being variable-sized, not because it's not in the ABI. Or have I got that wrong? The equivalent at the non-metaclass level would seem to me to define a subclass of tuple that adds extra state to each tuple instance.
Although now that I read this back I'm not sure whether I've fully understood the problem that you are solving.
CAM-Gerlach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't review this (yet) in any detail, but it overall looks fairly solid at least from a technical and rendering perspective. I did have a couple sidenotes about things you could take advantage of, if you wanted.
|
@gvanrossum Thank you very much for your questions! Would you mind if the PEP was published now, so I can share the technical details with the people who are affected (or even have been asking for such a feature), while I work on the introductory bits? |
Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
|
I'm happy for you to publish the PEP, major clarifications often happen after the first draft is published. I'm also with you on the intersphinx stuff being too complex, I just use a URL that points to the current dev-level docs (or occasionally a specific version if that makes sense). I expect we'll be getting pushback from the sphinx nerds among the PEP editors team though. :-) |
I think I qualify on the second point! I'd back using plain links in PEPs though for simplicity. A |
|
OK, merging to reserve the number, but I'll rewrite the explanations before opening the discussion. |
I've juggled with this long enough, time for more eyes to see it.