
Why object creation is slow in PyO3 and how to enhance it #679

@kngwyu


This is a support issue following #661, where I explain why PyO3's object creation can be slow and how we can speed it up.
Since I joined this project only a year ago, I'm not sure my understanding is entirely correct, especially about the historical aspects of the design.

cc: @ijl @ehiggs

TL;DR

  • We have overhead around object creation.
  • For PyObject and Py<T>, we cannot remove this overhead.
  • For the current &PyAny, &PyDict, and so on, we can remove this overhead by changing them to PyAny<'py>, PyDict<'py>, and so on.
  • But for that to happen, we need a careful design decision about how to handle owned/borrowed pointers.
  • Once it does happen, we could replace many usages of PyObject with the low-cost PyAny<'py> types (and then rename PyAny to PyObject).

First, let’s revisit our 2 kinds of object

It is really confusing, but we have two types for representing a Python object, named PyObject and PyAny.
So, what's the difference between them?
The answer is that PyAny is forced to be used as &'py PyAny, which has the same lifetime as GILGuard, but PyObject isn't.

    let any: &PyAny = {
        let gil = Python::acquire_gil();
        let dict: &PyDict = PyDict::new(gil.python());
        dict.as_ref() // error: the reference cannot outlive `gil`
    };

This snippet causes a compile error, because &PyAny has the same lifetime as GILGuard.
CPython has a reference-count based GC, which really differs from Rust's lifetime-based memory management system. To integrate these two systems, it's a natural choice to represent a Python object's lifetime by GILGuard (our RAII guard that corresponds to holding Python's Global Interpreter Lock).

In contrast, PyObject can exist even after GILGuard drops.

    let obj: PyObject = {
        let gil = Python::acquire_gil();
        let dict: &PyDict = PyDict::new(gil.python());
        dict.to_object(gil.python())
    };

This 'not bounded' nature of PyObject helps users write complicated extensions. For example, you can send a PyObject to another thread, though you cannot send a &PyAny.
However, &PyAny is a sufficient and reasonable choice for many use cases.
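A rough analogy in plain Rust (not PyO3 code, just a model): like PyObject, an Arc-backed handle is Send and can move to another thread, while a reference tied to a scope-local guard could not be sent this way.

```rust
use std::sync::Arc;
use std::thread;

// `obj` plays the role of a PyObject: it owns a reference count
// (here modeled with Arc) so it can cross thread boundaries.
pub fn send_to_thread() -> usize {
    let obj: Arc<String> = Arc::new("payload".to_string());
    let handle = {
        let obj = Arc::clone(&obj);
        // The clone moves into the new thread; a `&'py`-style
        // borrow of a local guard would not compile here.
        thread::spawn(move || obj.len())
    };
    handle.join().unwrap()
}

fn main() {
    assert_eq!(send_to_thread(), 7); // "payload".len()
}
```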

For types other than PyAny, such as PyDict, these 'not bounded' types are represented by the Py<T> wrapper type.

| Bounded by GIL | Not bounded |
| --- | --- |
| &PyAny | PyObject |
| &PyDict, &PyList, … | Py<PyDict>, Py<PyList>, … |
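A toy model of the table's two columns, with illustrative names: a `Bounded<'py>` handle borrows from the guard and cannot escape its scope, while an `Unbounded` handle owns a reference count (modeled here with Rc) and can.

```rust
use std::rc::Rc;

struct GilGuard {
    object: Rc<String>, // stands in for a Python object
}

struct Bounded<'py> {
    object: &'py String, // like &'py PyAny / &'py PyDict
}

struct Unbounded {
    object: Rc<String>, // like PyObject / Py<PyDict>
}

pub fn escape() -> Unbounded {
    let gil = GilGuard { object: Rc::new("dict".to_string()) };
    // A bounded view is fine inside this scope...
    let bounded = Bounded { object: &gil.object };
    assert_eq!(bounded.object.as_str(), "dict");
    // ...but only an Unbounded handle may leave it.
    Unbounded { object: Rc::clone(&gil.object) }
}

fn main() {
    assert_eq!(escape().object.as_str(), "dict");
}
```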

PyObject retrieval and object storage

Then, how do we ensure that a PyObject can be used after its GILGuard drops?
Let's think about this situation.

fn object() -> PyObject {
    let gil = Python::acquire_gil();
    // do something
    make_object(gil.python())
}
{
    let obj = object();
    let gil = Python::acquire_gil();
    // do something
}

First, we need to ensure that the PyObject is not deallocated after the first GILGuard drops, by incrementing
the reference count of the object when creating it.
Then we have to decrement the reference count when it drops, but here comes the problem.
Recall that values are dropped in reverse order of declaration in Rust, which means obj drops after gil drops.

{
    let obj = object();
    let gil = Python::acquire_gil();
    // do something
    implicit_drop(gil);
    implicit_drop(obj);
}

This is really problematic, because we cannot touch any Python objects when we don't hold the GIL.
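The reverse drop order can be observed in plain Rust with a small logging type (names are illustrative): `gil`, declared last, drops first, leaving `obj` to drop without the GIL.

```rust
use std::cell::RefCell;

// Records the name of each value as it drops.
struct Logger<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Logger<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Declare `obj` then `gil`, let the scope end, and report
/// the order in which they dropped.
pub fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _obj = Logger { name: "obj", log: &log };
        let _gil = Logger { name: "gil", log: &log };
        // scope ends: `gil` drops first, then `obj`
    }
    log.into_inner()
}

fn main() {
    assert_eq!(drop_order(), vec!["gil", "obj"]);
}
```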

To prevent this behavior, we have object storage that stores object pointers.
When a PyObject drops, we don't decrement its reference count; instead, we store its internal pointer in the storage.
Then, the next time we acquire the GIL, we decrement the reference counts of the released PyObjects.
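The mechanism can be sketched in plain Rust, with the refcount and storage simulated (this is a model of the idea, not PyO3's actual implementation): dropping a handle without the GIL parks its pointer, and the decref happens later, once the GIL is held again.

```rust
use std::sync::Mutex;
use std::sync::atomic::{AtomicUsize, Ordering};

static REFCOUNT: AtomicUsize = AtomicUsize::new(0);
static POOL: Mutex<Vec<usize>> = Mutex::new(Vec::new());

/// Dropped while the GIL is NOT held: just park the pointer id.
pub fn park_decref(id: usize) {
    POOL.lock().unwrap().push(id);
}

/// Called when the GIL is acquired again: run the deferred decrefs.
pub fn flush_pool() {
    for _id in POOL.lock().unwrap().drain(..) {
        REFCOUNT.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Walk through the whole sequence; report the refcount after the
/// GIL-less drop and after the next GIL acquisition.
pub fn demo() -> (usize, usize) {
    REFCOUNT.store(1, Ordering::SeqCst); // object created under the GIL
    park_decref(0xdead);                 // PyObject dropped, no GIL held
    let before = REFCOUNT.load(Ordering::SeqCst);
    flush_pool();                        // GIL acquired again
    (before, REFCOUNT.load(Ordering::SeqCst))
}

fn main() {
    assert_eq!(demo(), (1, 0));
}
```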

Yes, this object storage is the core overhead discussed in this issue, and ideally it should be removed.
But for PyObject, I see no alternative to using the storage to enable this complicated behavior.

&'py Lifetime and object storage

For &PyAny, the situation is a bit simpler.
What we want for the &Py~ types is just to force them to have the same lifetime as GILGuard.
So when we create a &Py~ value, we store its internal pointer in the object storage and return a reference to the pointer in the storage.
And when the GILGuard drops, we decrement the object's reference count for owned objects and do nothing for borrowed1 objects.
To enable this conditional behavior, we have two object storages (one for owned, one for borrowed) for the &Py~ types.

  1. Owned/borrowed here does not refer to Rust's owned values and references, but to the terms used in the Python C-API docs. There, a 'borrowed' reference is one whose reference count we don't own, so we must not decrement it. E.g., the CPython docs say

Note that any Python object references which are provided to the caller are borrowed references; do not decrement their reference count!

How we can remove this overhead

So, yes, for the &Py~ types, all we do when one drops is decrement a reference count or do nothing.
We can perform this operation without the internal storage via Drop::drop.
Then we would have PyAny<'py>, PyDict<'py>, and so on instead of &PyAny, &PyDict, etc.
Thus this would be a really breaking change.
However, since I don't think our current API, which uses references only to represent lifetimes, is reasonable, I'm sure it is worth considering.

The problem is how we distinguish borrowed from owned objects without the two kinds of object storage.
One possible idea is a wrapper type, say, PyOwned<PyAny>.
This design is clear and zero-cost, but it requires lots of API changes.
Another possible idea is a boolean flag representing whether the object is owned.
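The zero-cost claim can be checked in plain Rust: a hypothetical PyOwned<T> newtype (the name is illustrative) records ownership in the type system, so no runtime flag is needed and the handle stays exactly pointer-sized.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

pub struct Raw; // stands in for ffi::PyObject

pub struct Borrowed<T>(pub NonNull<T>); // drop: do nothing
pub struct PyOwned<T>(pub NonNull<T>);  // drop: would Py_DECREF

fn main() {
    // Both wrappers are one pointer wide (NonNull enables the
    // niche optimization), while a (pointer, bool) pair grows
    // past one word due to alignment padding.
    assert_eq!(size_of::<PyOwned<Raw>>(), size_of::<*mut Raw>());
    assert_eq!(size_of::<Borrowed<Raw>>(), size_of::<*mut Raw>());
    assert!(size_of::<(NonNull<Raw>, bool)>() > size_of::<*mut Raw>());
}
```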

struct PyAny<'py> {
    pointer: NonNull<ffi::PyObject>,
    owned: bool,
    _py: PhantomData<Python<'py>>, // keep the GIL lifetime attached
}

It wouldn't force users to rewrite lots of code, but it needs some runtime cost.
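A minimal simulation of this flagged-drop behavior (names illustrative, refcount simulated): on drop, decrement only when the handle owns its reference, and do nothing for borrowed handles.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

static REFCOUNT: AtomicUsize = AtomicUsize::new(1);

struct AnyHandle<'py> {
    owned: bool,
    _gil: PhantomData<&'py ()>, // stands in for the GIL lifetime
}

impl Drop for AnyHandle<'_> {
    fn drop(&mut self) {
        if self.owned {
            // a real implementation would call ffi::Py_DECREF here
            REFCOUNT.fetch_sub(1, Ordering::SeqCst);
        }
        // borrowed: do nothing
    }
}

/// Drop a borrowed handle, then an owned one; report the
/// simulated refcount after each drop.
pub fn demo() -> (usize, usize) {
    REFCOUNT.store(1, Ordering::SeqCst);
    drop(AnyHandle { owned: false, _gil: PhantomData });
    let after_borrowed = REFCOUNT.load(Ordering::SeqCst);
    drop(AnyHandle { owned: true, _gil: PhantomData });
    (after_borrowed, REFCOUNT.load(Ordering::SeqCst))
}

fn main() {
    assert_eq!(demo(), (1, 0));
}
```

No object storage is involved: the decision happens entirely inside Drop::drop, at the cost of one extra bool per handle.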

We need discussion and help

We appreciate any ideas, discussion, and PRs around this area, especially about the design of zero-cost object types.
Thanks.
