ARROW-16417: [C++][Python] Segfault in test_exec_plan.py / test_joins #13036
…ing in NULL for executor if use_threads is false instead of a short-lived thread pool. Forwarding use_threads from _perform_join to execplan. Fix bug in hash_join_node.cc that could allow bits of the plan to remain running after marking the plan finished.
Thanks for tracking this down! I'm curious, how did you debug it?
re: SourceNodeOptions::FromTable - maybe we should remove this method or change it to take shared_ptr<Table>?
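To make the ownership concern concrete, here is a toy Python sketch (not Arrow code). ToyTable, NonOwningSource and OwningSource are hypothetical names; the weak reference stands in for a C++ const ref that does not keep the table alive, and the strong reference stands in for a shared_ptr<Table>. In C++ the non-owning case would be a dangling reference rather than a clean exception.

```python
import gc
import weakref


class ToyTable:
    """Hypothetical stand-in for a table; not pyarrow.Table."""

    def __init__(self, rows):
        self.rows = rows


class NonOwningSource:
    """Keeps only a weak reference, like a source built from a const ref."""

    def __init__(self, table):
        self._table = weakref.ref(table)

    def read(self):
        table = self._table()
        if table is None:
            raise RuntimeError("table was destroyed before the plan ran")
        return table.rows


class OwningSource:
    """Keeps a strong reference, like a source holding a shared_ptr."""

    def __init__(self, table):
        self._table = table

    def read(self):
        return self._table.rows


def read_after_caller_drops(source_cls):
    """Build a source, drop the caller's only reference, then read."""
    table = ToyTable([1, 2, 3])
    source = source_cls(table)
    del table      # the caller no longer keeps the table alive
    gc.collect()   # make collection explicit on any interpreter
    try:
        return source.read()
    except RuntimeError as exc:
        return str(exc)
```

With the owning source the read still succeeds after the caller lets go of the table; with the non-owning source it fails, which is the risk described for FromTable.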
I was able to get it to reproduce in debug mode with
Aha. Hmm, I wonder if it'd be useful to somehow have Pytest also assert that Arrow's thread pools are idle in between tests. (And frankly, Googletest as well.)
Technically, we aren't quite there yet. The async generators in the scanner are allowed to run some cleanup after the exec plan runs. However, they strongly capture all of their state. Once #12468 merges then I think this might be a good idea.
Benchmark runs are scheduled for baseline = 760ad20 and contender = 7809c6d. 7809c6d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
This builds on top of #13035 which is also important for avoiding segmentation faults. On top of that there were a few more problems:
- The plan was built with SourceNodeOptions::FromTable, which is a rather dangerous method (mainly useful for unit testing) as it doesn't share ownership of the input table (even worse, it takes a const ref). Python was not keeping the table alive, so it may have been possible for the table to be deleted out from under the plan (I'm not entirely sure this was causing issues, but it seemed risky). I switched to TableSourceNode, which shares ownership of the table (and is a bit more efficient).
- _perform_join was not passing the use_threads arg on to execplan.
- With use_threads=False it was creating a single-thread executor, but the current best practice is to pass in nullptr.
- In hash_join_node.cc it was possible for End to be called between Submit and AddTask, which would allow the task to be submitted but not participate in setting finished on the node.
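The End / Submit / AddTask ordering problem can be sketched with a toy bookkeeping class. This is a simplified, single-threaded model under stated assumptions: ToyTaskGroup and its methods are hypothetical and are not the hash join node's real API, and registering before submitting is only one possible way to close the window, not necessarily what this PR does.

```python
class ToyTaskGroup:
    """Hypothetical task bookkeeping for a node; not Arrow's API."""

    def __init__(self):
        self.pending = 0       # tasks registered and not yet done
        self.ended = False     # End has been called
        self.finished = False  # node has been marked finished

    def add_task(self):
        if self.ended:
            # Too late: the group no longer tracks new work.
            return False
        self.pending += 1
        return True

    def task_done(self):
        self.pending -= 1
        self._maybe_finish()

    def end(self):
        self.ended = True
        self._maybe_finish()

    def _maybe_finish(self):
        if self.ended and self.pending == 0:
            self.finished = True


def racy_order():
    """Submit first, then End sneaks in before AddTask."""
    group = ToyTaskGroup()
    running = ["task"]            # Submit: work is already on the executor
    group.end()                   # End is called in the gap
    tracked = group.add_task()    # AddTask: rejected, task is untracked
    # The node is finished even though the task is still running.
    return group.finished, bool(running), tracked


def safe_order():
    """Register the task before it can start running."""
    group = ToyTaskGroup()
    tracked = group.add_task()    # AddTask first
    running = ["task"]            # then Submit
    group.end()
    finished_while_running = group.finished
    running.clear()               # the task completes
    group.task_done()
    return finished_while_running, group.finished, tracked
```

In the racy ordering the node reports finished while a submitted task is still running, which matches the symptom of bits of the plan remaining active after the plan is marked finished.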