
Why object creation is slow in PyO3 and how to enhance it #679

@kngwyu


This is a support issue following #661, where I explain why PyO3's object creation can be slow and how we can speed it up.
Since I joined this project only a year ago, I'm not sure my understanding is entirely correct, especially about the historical aspects of the design.

cc: @ijl @ehiggs

TL;DR

  • We have overhead around object creation.
  • For PyObject and Py<T>, we cannot remove this overhead.
  • For the current &PyAny, &PyDict, and so on, we can remove this overhead by changing them to PyAny<'py>, PyDict<'py>, and so on.
  • But for that to happen, we need a careful design decision about how to handle owned/borrowed pointers.
  • Once it does happen, we could replace many usages of PyObject with the low-cost PyAny<'py> types (and then rename PyAny to PyObject).

First, let’s revisit our 2 kinds of object

It is really confusing, but we have two types for representing a Python object, named PyObject and PyAny.
So, what's the difference between them?
The answer is that PyAny is forced to be used as &'py PyAny, which has the same lifetime as GILGuard, but PyObject isn't.

    let any: &PyAny = {
        let gil = Python::acquire_gil();
        let dict: &PyDict = PyDict::new(gil.python());
        dict.as_ref() // error: the reference cannot outlive `gil`
    };

This snippet causes a compile error, because &PyAny has the same lifetime as GILGuard.
CPython has a reference-count based GC, which really differs from Rust's lifetime-based memory management system. To integrate these two systems, it's a natural choice to represent a Python object's lifetime by GILGuard (our RAII guard that corresponds to holding Python's Global Interpreter Lock).

In contrast, PyObject can exist even after GILGuard drops.

    let obj: PyObject = {
        let gil = Python::acquire_gil();
        let dict: &PyDict = PyDict::new(gil.python());
        dict.to_object(gil.python())
    };

This 'not bounded' nature of PyObject helps users write complicated extensions. For example, you can send a PyObject to another thread, though you cannot send a &PyAny.
However, &PyAny is a sufficient and reasonable choice for many use cases.
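A rough analogy in plain Rust (not PyO3 code, just a model): like PyObject, an Arc-backed handle is Send and can move to another thread, while a reference tied to a scope-local guard could not be sent this way.

```rust
use std::sync::Arc;
use std::thread;

// `obj` plays the role of a PyObject: it owns a reference count
// (here modeled with Arc) so it can cross thread boundaries.
pub fn send_to_thread() -> usize {
    let obj: Arc<String> = Arc::new("payload".to_string());
    let handle = {
        let obj = Arc::clone(&obj);
        // The clone moves into the new thread; a `&'py`-style
        // borrow of a local guard would not compile here.
        thread::spawn(move || obj.len())
    };
    handle.join().unwrap()
}

fn main() {
    assert_eq!(send_to_thread(), 7); // "payload".len()
}
```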

For types other than PyAny, such as PyDict, these 'not bounded' types are represented by the Py<T> wrapper type.

| Bounded by GIL | Not bounded |
| --- | --- |
| &PyAny | PyObject |
| &PyDict, &PyList, … | Py<PyDict>, Py<PyList>, … |
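A toy model of the table's two columns, with illustrative names: a `Bounded<'py>` handle borrows from the guard and cannot escape its scope, while an `Unbounded` handle owns a reference count (modeled here with Rc) and can.

```rust
use std::rc::Rc;

struct GilGuard {
    object: Rc<String>, // stands in for a Python object
}

struct Bounded<'py> {
    object: &'py String, // like &'py PyAny / &'py PyDict
}

struct Unbounded {
    object: Rc<String>, // like PyObject / Py<PyDict>
}

pub fn escape() -> Unbounded {
    let gil = GilGuard { object: Rc::new("dict".to_string()) };
    // A bounded view is fine inside this scope...
    let bounded = Bounded { object: &gil.object };
    assert_eq!(bounded.object.as_str(), "dict");
    // ...but only an Unbounded handle may leave it.
    Unbounded { object: Rc::clone(&gil.object) }
}

fn main() {
    assert_eq!(escape().object.as_str(), "dict");
}
```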

PyObject retrieval and object storage

Then, how do we ensure that a PyObject can be used after its GILGuard drops?
Let's think about this situation.

fn object() -> PyObject {
    let gil = Python::acquire_gil();
    // do something
    make_object(gil.python())
}
{
    let obj = object();
    let gil = Python::acquire_gil();
    // do something
}

First, we need to ensure that the PyObject is not deallocated after the first GILGuard drops, by incrementing
the reference count of the object when creating it.
Then we have to decrement the reference count when it drops, but here comes the problem.
Recall that values are dropped in reverse order of declaration in Rust, which means obj drops after gil drops.

{
    let obj = object();
    let gil = Python::acquire_gil();
    // do something
    implicit_drop(gil);
    implicit_drop(obj);
}

This is really problematic, because we cannot touch any Python objects when we don't hold the GIL.
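The reverse drop order can be observed in plain Rust with a small logging type (names are illustrative): `gil`, declared last, drops first, leaving `obj` to drop without the GIL.

```rust
use std::cell::RefCell;

// Records the name of each value as it drops.
struct Logger<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Logger<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Declare `obj` then `gil`, let the scope end, and report
/// the order in which they dropped.
pub fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _obj = Logger { name: "obj", log: &log };
        let _gil = Logger { name: "gil", log: &log };
        // scope ends: `gil` drops first, then `obj`
    }
    log.into_inner()
}

fn main() {
    assert_eq!(drop_order(), vec!["gil", "obj"]);
}
```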

To prevent this behavior, we have object storage that stores object pointers.
When a PyObject drops, we don't decrement its reference count; instead, we store its internal pointer in the storage.
Then, the next time we acquire the GIL, we decrement the reference counts of the released PyObjects.
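The mechanism can be sketched in plain Rust, with the refcount and storage simulated (this is a model of the idea, not PyO3's actual implementation): dropping a handle without the GIL parks its pointer, and the decref happens later, once the GIL is held again.

```rust
use std::sync::Mutex;
use std::sync::atomic::{AtomicUsize, Ordering};

static REFCOUNT: AtomicUsize = AtomicUsize::new(0);
static POOL: Mutex<Vec<usize>> = Mutex::new(Vec::new());

/// Dropped while the GIL is NOT held: just park the pointer id.
pub fn park_decref(id: usize) {
    POOL.lock().unwrap().push(id);
}

/// Called when the GIL is acquired again: run the deferred decrefs.
pub fn flush_pool() {
    for _id in POOL.lock().unwrap().drain(..) {
        REFCOUNT.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Walk through the whole sequence; report the refcount after the
/// GIL-less drop and after the next GIL acquisition.
pub fn demo() -> (usize, usize) {
    REFCOUNT.store(1, Ordering::SeqCst); // object created under the GIL
    park_decref(0xdead);                 // PyObject dropped, no GIL held
    let before = REFCOUNT.load(Ordering::SeqCst);
    flush_pool();                        // GIL acquired again
    (before, REFCOUNT.load(Ordering::SeqCst))
}

fn main() {
    assert_eq!(demo(), (1, 0));
}
```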

Yes, this object storage is the core overhead discussed in this issue, and ideally it should be removed.
But for PyObject, I see no alternative to using the storage to enable this complicated behavior.

&'py Lifetime and object storage

For &PyAny, the situation is a bit simpler.
What we want for the &Py~ types is just to force them to have the same lifetime as GILGuard.
So when we create a &Py~ value, we store its internal pointer in the object storage and return a reference to the pointer in the storage.
And when the GILGuard drops, we decrement the object's reference count for owned objects and do nothing for borrowed1 objects.
To enable this conditional behavior, we have two object storages (one for owned, one for borrowed) for the &Py~ types.

  1. Owned/borrowed here does not refer to Rust's owned values and references, but to the terms used in the Python C-API docs. There, a 'borrowed' reference is one whose reference count we don't own, so we must not decrement it. E.g., the CPython docs say

Note that any Python object references which are provided to the caller are borrowed references; do not decrement their reference count!

How we can remove this overhead

So, yes, for the &Py~ types, all we do when one drops is decrement a reference count or do nothing.
We can perform this operation without the internal storage via Drop::drop.
Then we would have PyAny<'py>, PyDict<'py>, and so on instead of &PyAny, &PyDict, etc.
Thus this would be a really breaking change.
However, since I don't think our current API, which uses references only to represent lifetimes, is reasonable, I'm sure it is worth considering.

The problem is how we distinguish borrowed from owned objects without the two kinds of object storage.
One possible idea is a wrapper type, say, PyOwned<PyAny>.
This design is clear and zero-cost, but it requires lots of API changes.
Another possible idea is a boolean flag representing whether the object is owned.
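The zero-cost claim can be checked in plain Rust: a hypothetical PyOwned<T> newtype (the name is illustrative) records ownership in the type system, so no runtime flag is needed and the handle stays exactly pointer-sized.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

pub struct Raw; // stands in for ffi::PyObject

pub struct Borrowed<T>(pub NonNull<T>); // drop: do nothing
pub struct PyOwned<T>(pub NonNull<T>);  // drop: would Py_DECREF

fn main() {
    // Both wrappers are one pointer wide (NonNull enables the
    // niche optimization), while a (pointer, bool) pair grows
    // past one word due to alignment padding.
    assert_eq!(size_of::<PyOwned<Raw>>(), size_of::<*mut Raw>());
    assert_eq!(size_of::<Borrowed<Raw>>(), size_of::<*mut Raw>());
    assert!(size_of::<(NonNull<Raw>, bool)>() > size_of::<*mut Raw>());
}
```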

struct PyAny<'py> {
    pointer: NonNull<ffi::PyObject>,
    owned: bool,
    _py: PhantomData<Python<'py>>, // keep the GIL lifetime attached
}

It wouldn't force users to rewrite lots of code, but it needs some runtime cost.
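A minimal simulation of this flagged-drop behavior (names illustrative, refcount simulated): on drop, decrement only when the handle owns its reference, and do nothing for borrowed handles.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

static REFCOUNT: AtomicUsize = AtomicUsize::new(1);

struct AnyHandle<'py> {
    owned: bool,
    _gil: PhantomData<&'py ()>, // stands in for the GIL lifetime
}

impl Drop for AnyHandle<'_> {
    fn drop(&mut self) {
        if self.owned {
            // a real implementation would call ffi::Py_DECREF here
            REFCOUNT.fetch_sub(1, Ordering::SeqCst);
        }
        // borrowed: do nothing
    }
}

/// Drop a borrowed handle, then an owned one; report the
/// simulated refcount after each drop.
pub fn demo() -> (usize, usize) {
    REFCOUNT.store(1, Ordering::SeqCst);
    drop(AnyHandle { owned: false, _gil: PhantomData });
    let after_borrowed = REFCOUNT.load(Ordering::SeqCst);
    drop(AnyHandle { owned: true, _gil: PhantomData });
    (after_borrowed, REFCOUNT.load(Ordering::SeqCst))
}

fn main() {
    assert_eq!(demo(), (1, 0));
}
```

No object storage is involved: the decision happens entirely inside Drop::drop, at the cost of one extra bool per handle.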

We need discussion and help

We appreciate any ideas, discussion, and PRs around this area, especially about the design of zero-cost object types.
Thanks.
