Skip to content

Conversation

@encukou
Copy link
Member

@encukou encukou commented Aug 25, 2022

I've juggled with this long enough, time for more eyes to see it.

@encukou encukou requested a review from a team as a code owner August 25, 2022 14:25
@encukou encukou changed the title PEP 697: C-API for Extending Opaque Types PEP 697 (new): C-API for Extending Opaque Types Aug 25, 2022
@AlexWaygood AlexWaygood added the new-pep A new draft PEP submitted for initial review label Aug 25, 2022
Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a complete review, I realize I am not qualified to understand the specification.

Comment on lines +66 to +70
With the size not available, limited API users must resort to workarounds such
as querying ``__basicsize__`` and plugging it into ``PyType_Spec`` at runtime,
and divining the correct offset for their extra data.
This requires making assumptions about memory layout, which the limited API
is supposed to hide.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have examples of 3rd party code that does this? It would be useful to know how often this is a problem. I can't say I've seen even a single example, but I admit I don't review enough 3rd party extension modules.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The later nanobind example does exactly this. PySide (Qt) also deals with this, but their solution is way more convoluted.
Any extension that wants to attach C-level state to custom types, and use the Limited API, will run into this.
Trouble is, these extensions tend to be big frameworks – binding generators: the big use case is exposing a foreign class mechanism (think Java, C++, ...) as Python types – if there's some run-time type information object for the foreign classes, you'll want to attach it to the Python type.
Such big binding generators often don't use the Stable ABI because of issues like what this PEP is solving. (FWIW, this issue is one of the last blockers for nanobind, though.)
On the other hand, Stable ABI is exactly what these projects should provide (as an option), to enable a “long tail” of low-maintenance projects (on PyPI and elsewhere) to not need recompiling every year.

Once this PEP is published, I'll show it to the relevant projects (that I know of) and get their opinions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I realize you buried the lede here. The key motivation is "to attach C-level state to custom types". You might explain this use case more clearly up front.

Comment on lines +82 to +84
Some types, such as CPython's ``PyHeapType``, handle this by storing
variable-sized data after the fixed-size struct.
This means that any subclass can add its own fixed-size data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this (perhaps because I haven't used PyHeapType in a long time). Does this say that the variable-sized data is owned by the "base class" but stored after all fixed-size data (including such data belonging to "subclasses") is stored at the end? This would require computing the pointer to the variable-sized data dynamically, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: see _PyHeapType_GET_MEMBERS for the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, more detail would be helpful.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the correct name PyHeapTypeObject here to avoid confusion.

The situation is worse in projects that target the Limited API.

For an example of the currently necessary workarounds, see:
`nb_type_data_static <https://github.com/wjakob/nanobind/blob/f3044cf44763e105428e4e0cf8f42d951b9cc997/src/nb_type.cpp#L1085>`_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't say I understand what's going on there, since it's modern C++ and I don't read that.

Could you perhaps explain what it is doing?

Are you aware of pure-C examples of similar workarounds? Or is C++ the main target here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's querying the Python attribute __basicsize__ and plugging it into PyType_Spec used to create a subclass. I'll move the English description here.

I'm using nanobind (C++) for examples & initial design discussions, because it's small and free of historical baggage. But Java or Rust bindings have similar use cases, and I'll consider the PEP a failure if JPype & Pyo3 don't find it useful.

(My own motivation is improving subclassing API in general, but that's not what people behind binding generators are asking for...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just don't expect your readers to be familiar with nanobind. It's fine to explain that nanobind provides a use case, but you still need to explain what problem it is solving and how, and how its solution is sub-optimial and how your PEP provides a better solution. I'm guessing from what you explained in comments above that when a subclass of a variable-sized type (like tuple) needs to add extra state it must define a struct to hold the extra state and write a custom function that calculates the offset of that extra state relative to the nominal (PyObject *) start of the object, and it also needs to override the new-instance allocation (tp_new? tp_alloc?) to allocate space for the extra state.

Copy link

@kumaraditya303 kumaraditya303 Aug 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C equivalent snippet of the code would be helpful. The code is small and easy to write in C.

(a spiritual successor of the popular C++ binding generator ``pybind11``).


Rationale
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right section heading here. Maybe "proposal"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rearrange this to make it clearer.

This PEP proposes a different model: instead of the superclass data being
part of the subclass data, the extra space a subclass needs is specified
and accessed separately.
(How base class data is accessed is left to whomever implements the base class:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoever

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? It would be “left to her”, not “left to she”.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's "left to (she who implements the base class)". See https://www.thoughtco.com/whoever-and-whomever-1689622#toc-how-to-use-whomever, the example "The prize should be given to whoever wins the race".

Comment on lines +117 to +118
part of the subclass data, the extra space a subclass needs is specified
and accessed separately.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this is describing. Are you proposing to use a pointer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A high-level overview. Pointer-level details are in Specification.
(I'll clarify this.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it looks like maybe it's so high level as to be meaningless (at least to someone who basically only learns from examples like myself).

they can for example provide accessor functions, expose a part of its
``struct`` for better performance, or do both.)

The proposed mechanism allows using static, read-only ``PyType_Spec``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC PyType_Spec is specific to PyHeapType. So this makes me wonder -- perhaps the PEP is just about making PyHeapType more usable? Especially from C++? And you should just say so in the PEP title, rather than claiming more generality?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's about defining types. There are two ways to do that, PyType_Spec and static types.
Maybe I should be clearer that it doesn't apply to static types?

It's not specific to C++.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured it doesn't apply to static types (3rd party static types are outmoded anyway :-). I understand it's not unique to C++ but your only example uses C++ and for a non-C++-user that example was more baffling than helpful. The excess generality that I'm objecting most to is mentioned below, "The approach generalizes to non-metaclass types as well."

Combined with a way to create class from ``PyType_Spec`` and a custom metaclass,
this will allow libraries like nanobind or JPype to create metaclasses
without making assumptions about ``PyTypeObject``'s memory layout.
The approach generalizes to non-metaclass types as well.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we care?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It enables the example in the tutorial to work with stable ABI. (And in case you're following Mark Shannon, I should add that any stable ABI will need to solve the issue somehow.)

The tutorial's example – extending list with an extra C variable – is particularly impractical/simplified, sure, but it shows that currently, subclassing ties you to the ABI of the superclass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I see no particular problem to extending list, since the list instance itself is not variable-size -- it is a plain struct whose sizeof doesn't lie, and the example even works with a static type. (Perhaps that's why list was chosen in the tutorial? :-) Admittedly it doesn't work with the stable ABI (because the typedef PyListObject isn't in the ABI) but that's not the same problem you explained to me in the comments above (or I still misunderstood your explanation). The problem with defining types with extra state seems to be caused by the basic type object being variable-sized, not because it's not in the ABI. Or have I got that wrong? The equivalent at the non-metaclass level would seem to me to define a subclass of tuple that adds extra state to each tuple instance.

Although now that I read this back I'm not sure whether I've fully understood the problem that you are solving.

Copy link
Member

@CAM-Gerlach CAM-Gerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review this (yet) in any detail, but it overall looks fairly solid at least from a technical and rendering perspective. I did have a couple sidenotes about things you could take advantage of, if you wanted.

@encukou
Copy link
Member Author

encukou commented Aug 26, 2022

@gvanrossum Thank you very much for your questions!
I'm obviously too deep in the woods, and your comments should be very helpful in marking the way here.

Would you mind if the PEP was published now, so I can share the technical details with the people who are affected (or even have been asking for such a feature), while I work on the introductory bits?
[Edit: I didn't mean you should publish it]

Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
@gvanrossum
Copy link
Member

I'm happy for you to publish the PEP, major clarifications often happen after the first draft is published.

I'm also with you on the intersphinx stuff being too complex, I just use a URL that points to the current dev-level docs (or occasionally a specific version if that makes sense). I expect we'll be getting pushback from the sphinx nerds among the PEP editors team though. :-)

@AA-Turner
Copy link
Member

I'm also with you on the intersphinx stuff being too complex, I just use a URL that points to the current dev-level docs (or occasionally a specific version if that makes sense). I expect we'll be getting pushback from the sphinx nerds among the PEP editors team though. :-)

I think I qualify on the second point! I'd back using plain links in PEPs though for simplicity.

A

@encukou
Copy link
Member Author

encukou commented Aug 31, 2022

OK, merging to reserve the number, but I'll rewrite the explanations before opening the discussion.
Thanks for the reviews!

@encukou encukou merged commit e7446a1 into python:main Aug 31, 2022
@encukou encukou deleted the extend-opaque branch August 31, 2022 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new-pep A new draft PEP submitted for initial review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants