
Program aborts when Python's garbage collector gets called from another thread and attempts to traverse an unsendable pyclass instance. #3688

@JRRudy1

Description


I have created a repository with a full breakdown and a minimal reproducible example of the error
at https://github.com/JRRudy1/pyo3_gc_error. I will provide a summary below, but please check out
the repository as well, since I put a lot of effort into clearly presenting and investigating the issue.

In summary, I have discovered an error, or perhaps an undocumented limitation, in the way
PyO3 handles thread-checking for "unsendable" pyclass instances as they are being traversed
by Python's garbage collector (GC). In particular, this occurs when garbage collection is triggered
from a separate thread, and the pyclasses integrate with the GC by implementing the __traverse__
magic method. The error (or limitation) results in a hard abort, and is particularly problematic
since it cannot be caught from Python using a try/except block.

The conditions and sequence of events leading to the error can be summarized as:

  1. Two (or more) instances of an "unsendable" pyclass (declared with #[pyclass(unsendable)]) are created from Python
  2. The objects are in a reference cycle, and the pyclass defines __traverse__/__clear__ to break it
  3. All references to them outside the cycle are dropped, so the next GC cycle should clean them up
  4. Before the GC runs automatically, it is explicitly invoked from another thread (gc.collect() from
    Python or PyGC_Collect() from C)
  5. When the GC calls back into Rust to traverse the objects, PyO3 detects that the calling thread is
    not the original thread and incorrectly deduces that the object was sent between threads
  6. PyO3 triggers a panic and the program aborts with a misleading error message
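The thread check behind steps 5 and 6 can be modeled in plain Rust. The sketch below is a simplified, hypothetical model, not PyO3's actual implementation: `UnsendableChecker` and `traverse` are illustrative names, not PyO3 APIs. The creating thread's `ThreadId` is recorded, and every later access compares the current thread against it. The model shows why a traversal triggered from a different thread fails the check even though the object was never moved. Unlike PyO3, it reports the failure as an `Err` instead of panicking.

```rust
use std::thread::{self, ThreadId};

/// Hypothetical, simplified model of a thread checker for an "unsendable"
/// object. NOT PyO3's real implementation: it only records the creating
/// thread and compares it against the thread performing each access.
struct UnsendableChecker {
    owner: ThreadId,
}

impl UnsendableChecker {
    fn new() -> Self {
        Self { owner: thread::current().id() }
    }

    /// True if the current thread is the thread that created the object.
    fn is_owner_thread(&self) -> bool {
        thread::current().id() == self.owner
    }
}

/// Hypothetical stand-in for a `__traverse__` callback. The GC may invoke it
/// from whichever thread happens to trigger collection.
fn traverse(checker: &UnsendableChecker) -> Result<(), String> {
    if checker.is_owner_thread() {
        Ok(())
    } else {
        Err(format!(
            "accessed from {:?}, but owned by {:?}",
            thread::current().id(),
            checker.owner
        ))
    }
}

fn main() {
    let checker = UnsendableChecker::new();

    // Traversal on the creating thread passes the check.
    assert!(traverse(&checker).is_ok());

    // Traversal from another thread (as when gc.collect() runs there) fails
    // the check, although the object itself never left its original thread.
    let result = thread::scope(|s| s.spawn(|| traverse(&checker)).join().unwrap());
    assert!(result.is_err());
}
```

The point of the model is that a thread-identity comparison cannot tell "the object was sent to another thread" apart from "another thread is merely visiting the object on the GC's behalf", which is the misdiagnosis described in step 5.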

I have gotten reasonably familiar with PyO3's internals and may be interested in working on this,
but I would need some guidance from an "expert" with a more nuanced understanding of the
possible implications. It is possible that the limitation cannot be safely fixed, and the only solution
is to improve the error message and add a warning to the documentation.

As mentioned above, please visit https://github.com/JRRudy1/pyo3_gc_error for more information.
