Skip to content

Conversation

@BewareMyPower
Copy link
Contributor

Fixes #11847

Motivation

There's a deadlock that might happen when Python client enables custom logging. From stack traces of #11847, we can see there're 3 threads when the Python program hanged:

  1. The thread to call ExecutorServiceProvider::close, which waits until all worker threads of ExecutorService completed by std::thread::join.
  2. The thread to use Python object for logging. It stuck at PyGILState_Ensure, which tried to acquire Python GIL. It's called in ClientConnection::handleRead. Since all pending events were cancelled by boost::asio::io_service::stop, these callbacks were completed with boost::asio::error::operation_aborted immediately.
  3. The worker thread of ExecutorService. It waited until all callbacks including ClientConnection::handleRead completed.

The root cause might be related to Python's GIL issues. It seems like CPython APIs might not work well in C++ destructors. It might be caused by some lifetime issues. But the direct cause is thread 1 was blocked by joining a worker thread.

Modifications

Detach the worker thread instead of join in ExecutorService::close to avoid potential deadlock. The close method could be called in ClientImpl's destructor, which calls shutdown. It's better to call these blocking methods in C++ destructors even without this issue.

In addition, this PR reduces the log level from error to debug for the boost::asio::error::operation_aborted error code, which means the registered event is cancelled by a close event of the event loop.

@BewareMyPower BewareMyPower self-assigned this Sep 1, 2021
@BewareMyPower BewareMyPower added component/c++ doc-not-needed Your PR changes do not impact docs release/2.8.2 type/bug The PR fixed a bug or issue reported a bug labels Sep 1, 2021
@BewareMyPower BewareMyPower force-pushed the bewaremypower/fix-py-hang branch from ed97331 to d72cb0b Compare September 1, 2021 17:09
@merlimat merlimat merged commit 43b4ff6 into apache:master Sep 2, 2021
hangc0276 pushed a commit that referenced this pull request Sep 4, 2021
@hangc0276 hangc0276 added cherry-picked/branch-2.8 Archived: 2.8 is end of life release/2.8.1 and removed release/2.8.2 labels Sep 4, 2021
@BewareMyPower BewareMyPower deleted the bewaremypower/fix-py-hang branch September 22, 2021 11:04
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-picked/branch-2.8 Archived: 2.8 is end of life doc-not-needed Your PR changes do not impact docs release/2.8.1 type/bug The PR fixed a bug or issue reported a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[2.8.0] Python programs hang uninterruptibly on shutdown if pulsar clients are used anywhere non-global with the logger argument

4 participants