ARROW-15593: [C++] Make after-fork ThreadPool reinitialization thread-safe #12358
`cpp/src/arrow/util/thread_pool.cc`:

```diff
@@ -28,6 +28,7 @@
 #include "arrow/util/io_util.h"
 #include "arrow/util/logging.h"
+#include "arrow/util/mutex.h"

 namespace arrow {
 namespace internal {
```

```diff
@@ -235,24 +236,28 @@ ThreadPool::~ThreadPool() {
 void ThreadPool::ProtectAgainstFork() {
 #ifndef _WIN32
   pid_t current_pid = getpid();
-  if (pid_ != current_pid) {
-    // Reinitialize internal state in child process after fork()
-    // Ideally we would use pthread_at_fork(), but that doesn't allow
-    // storing an argument, hence we'd need to maintain a list of all
-    // existing ThreadPools.
-    int capacity = state_->desired_capacity_;
-
-    auto new_state = std::make_shared<ThreadPool::State>();
-    new_state->please_shutdown_ = state_->please_shutdown_;
-    new_state->quick_shutdown_ = state_->quick_shutdown_;
-
-    pid_ = current_pid;
-    sp_state_ = new_state;
-    state_ = sp_state_.get();
-
-    // Launch worker threads anew
-    if (!state_->please_shutdown_) {
-      ARROW_UNUSED(SetCapacity(capacity));
+  if (pid_.load() != current_pid) {
+    // Reinitialize internal state in child process after fork().
+    {
+      // Since after-fork reinitialization is triggered when one of the ThreadPool
+      // methods is called, it can very well be called from multiple threads
+      // at once. Therefore, it needs to be guarded with a lock.
+      auto lock = util::GlobalForkSafeMutex()->Lock();
+
+      if (pid_.load() != current_pid) {
+        int capacity = state_->desired_capacity_;
+
+        auto new_state = std::make_shared<ThreadPool::State>();
+        new_state->please_shutdown_ = state_->please_shutdown_;
+        new_state->quick_shutdown_ = state_->quick_shutdown_;
+
+        sp_state_ = new_state;
+        state_ = sp_state_.get();
+        pid_ = current_pid;
+
+        // Launch worker threads anew
+        ARROW_UNUSED(SetCapacity(capacity));
+      }
     }
   }
 #endif
```

**Contributor:** Just some random thoughts for comment.

**Member (Author):** You need a
`cpp/src/arrow/util/thread_pool.h`:

```diff
@@ -21,6 +21,10 @@
 #include <unistd.h>
 #endif

+#ifndef _WIN32
+#include <atomic>
+#endif
+
 #include <cstdint>
 #include <memory>
 #include <queue>
```

```diff
@@ -373,7 +377,7 @@ class ARROW_EXPORT ThreadPool : public Executor {
   State* state_;
   bool shutdown_on_destroy_;
 #ifndef _WIN32
-  pid_t pid_;
+  std::atomic<pid_t> pid_;
 #endif
 };
```

**Member:** Does

**Contributor:** But the first load (line 239) may have a data race with the store (line 256)?

**Member:** That should be ok because we have the double check on line 247, which happens inside the mutex.

**Contributor:** Agreed, the code should be safe in practice.

**Member (Author):** Yes, an atomic is required at least for theoretical correctness and to avoid (potentially) tripping up TSAN.

**Contributor:** Just to make this discussion more complete: if we only care about atomicity, then to satisfy TSAN without introducing unnecessary memory barriers, we can use relaxed memory ordering when accessing the atomic variable, not the default sequentially consistent memory ordering. The assembly code will probably be the same as for a normal variable on most architectures. This might help performance on CPUs with more relaxed memory models than x86 (e.g., ARM, PPC) in some cases, but the improvement is often negligible in a real-world application.

**Member:** Thank you for the details @cyb70289. I appreciate you both taking the time to help explain this. This code shouldn't be on the data path, so I agree the extra performance is not worth false positives in TSAN or the extra complexity of relaxed memory order.
**Comment:** nit: Maybe it's not trivial when working with multiprocessing in Python. Why not use a `unique_ptr` here?

**Reply:** There isn't a lot of control over shutdown ordering at the moment, so it's possible the main thread starts tearing down and deletes any `unique_ptr` here before the child threads have fully shut down. The child threads might then still try to access this mutex, which would lead to accessing freed memory.

**Reply:** Do you have a particular use case in mind? For example, the most common multiprocessing pattern is "fork-join", where you fork off a number of child processes for some task and then join them all back together when the task is done. In that case the leak would be limited, because the memory would be freed when the child process ended.

The only way to really grow this leak is to recursively and continuously fork. Even then, you would run out of process IDs far before any significant memory was lost. I suppose you could continuously kill the parent off after the fork and leave an orphan, but I don't know of any real-world use case for this.

**Comment:** I see, thanks. My intuition is that one may create and destroy the `multiprocessing.Pool` continuously, which may result in too many leaks here.

**Reply:** That should be ok. For example, imagine your process is on pid 100. You create a `multiprocessing.Pool` with size 10. It will create processes 101, 102, ..., 110. Each of those child processes will be just a tiny bit too large (~40 bytes). Process 100 will still be the correct size. When you destroy the pool, processes 101-110 go away. Process 100 remains and is still the correct size. When you create a new pool, the whole thing repeats with 10 new process IDs.

**Reply:** Just for the record, no process will be larger because of this leak. Perhaps I should have put "leak" in quotes, because regardless of using `unique_ptr` or not, the mutex will be destroyed at process exit. The difference is mainly in code hygiene, and also some sanitizers may report it as an actual leak (memory that wasn't deallocated before process exit).

**Comment:** Yep, you are right. Thanks.