GH-43130: [C++][ArrowFlight] Crash due to UCS thread mode #43120
When the thread mode is `UCS_THREAD_MODE_SERIALIZED`, UCX crashes due to mpool corruption. This happens when a buffer is deallocated on a different thread: in that case, two threads access the UCX memory pool simultaneously. See the discussion on the UCX forum: openucx/ucx#9987
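To illustrate the kind of race described above, here is a minimal sketch that does not use UCX at all. `FreeListPool`, `LockedPool`, and `RunCrossThreadRelease` are hypothetical names invented for this example: the first stands in for a memory pool with no internal locking, and the second shows one possible way to make a release from another thread safe by serializing every call with a mutex. This is not necessarily how this PR or UCX itself addresses the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a pool with no internal locking. Calling Get()
// and Put() from two threads at once would corrupt the free list.
class FreeListPool {
 public:
  explicit FreeListPool(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) free_.push_back(static_cast<int>(i));
  }
  // Returns a free slot id, or -1 if the pool is exhausted.
  int Get() {
    if (free_.empty()) return -1;
    int id = free_.back();
    free_.pop_back();
    return id;
  }
  void Put(int id) { free_.push_back(id); }
  std::size_t Available() const { return free_.size(); }

 private:
  std::vector<int> free_;
};

// Wrapper that guarantees only one thread is inside the pool at a time.
class LockedPool {
 public:
  explicit LockedPool(std::size_t n) : pool_(n) {}
  int Get() {
    std::lock_guard<std::mutex> lock(mu_);
    return pool_.Get();
  }
  void Put(int id) {
    std::lock_guard<std::mutex> lock(mu_);
    pool_.Put(id);
  }
  std::size_t Available() {
    std::lock_guard<std::mutex> lock(mu_);
    return pool_.Available();
  }

 private:
  std::mutex mu_;
  FreeListPool pool_;
};

// Takes n slots on the calling thread and releases them on another thread
// while the calling thread keeps using the pool. Returns the number of free
// slots once both threads are done.
std::size_t RunCrossThreadRelease(std::size_t n) {
  LockedPool pool(2 * n);
  std::vector<int> held;
  for (std::size_t i = 0; i < n; ++i) held.push_back(pool.Get());

  // "Deallocation on a different thread": the releaser returns the slots.
  std::thread releaser([&pool, &held] {
    for (int id : held) pool.Put(id);
  });

  // Meanwhile the original thread keeps allocating and freeing.
  for (std::size_t i = 0; i < n; ++i) {
    int id = pool.Get();
    if (id >= 0) pool.Put(id);
  }

  releaser.join();
  return pool.Available();
}
```

With the lock in place, `RunCrossThreadRelease(n)` should always end with all `2 * n` slots back in the pool. Removing the mutex from `LockedPool` reproduces the situation in the description: two threads mutate the free list at the same time, which is undefined behavior.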
Thanks! However, this code is scheduled for deprecation/removal soon in favor of the Dissociated IPC proposal.
This is not a MINOR change. See https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes for our MINOR definition.
@lidavidm - Could you please refer me to the "Dissociated IPC" proposal?
https://arrow.apache.org/docs/dev/format/DissociatedIPC.html @zeroshade what was the timeline?
Looks like the reference implementation for Dissociated IPC might suffer from the same issue:

```cpp
arrow::Result<std::unique_ptr<utils::Connection>> UcxClient::CreateConn() {
  ucp_worker_params_t worker_params;
  std::memset(&worker_params, 0, sizeof(worker_params));
  worker_params.field_mask =
      UCP_WORKER_PARAM_FIELD_THREAD_MODE | UCP_WORKER_PARAM_FIELD_FLAGS;
  worker_params.thread_mode = UCS_THREAD_MODE_SERIALIZED;
```
It was partially based on this experiment so I'm not surprised. I think for now I'm OK merging this, but please be aware it will go away in the near future.
Sure, thanks.
Regarding the deprecation, see this email: https://lists.apache.org/thread/g89x2y6pvlq6gyf0d1jnxfl2onsrkyt8
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 5a28e18. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 28 possible false positives for unstable benchmarks that are known to sometimes produce them.
Fix reference example based on the same fix on Arrow. See: apache/arrow#43120