ARROW-14908: [C++][R] Dataset hash join segfaults on Windows #12339
My best guess right now is that some memory is getting clobbered by another thread. Is there a way to execute on a single thread somehow? Tried @jonkeane Any thoughts on the threading settings?
3de21c0 to 507050b
The main thread touches the thread_indexer before it's used by the worker thread, so we need to make sure it has capacity for both threads.
9a510cb to a61b7a4
jonkeane left a comment
Thanks for all of your work on this, I know it's been a lot. A few comments (mostly on linting).
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
westonpace left a comment
Looks good! Thanks for figuring this out.
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Benchmark runs are scheduled for baseline = 5ad5ddc and contender = 7b5efe4. 7b5efe4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Test failure on my branch: https://github.com/wjones127/arrow/runs/5068285944?check_suite_focus=true#step:18:27831
~This only fails on Windows~. On any platform, when `USE_THREADS = FALSE`, we cannot execute an exec plan that joins after scanning from a dataset. The source node executes the `InputReceived()` callback in the IO thread, while the HashJoin wasn't anticipating another thread if the executor is null (which `USE_THREADS=FALSE` sets). This PR adds one to the thread local state vector size to anticipate this case of 1 main thread + 1 IO thread.
Closes apache#12339 from wjones127/ARROW-14908-windows-join-crash
Authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
In #12339 we added one to the thread local state vector size, which enabled joining one table to one dataset using `use_threads=false`. However, I found that joining two datasets still hit the thread limit. There are plans to find [a long-term fix](https://issues.apache.org/jira/browse/ARROW-16072) that can run these operations synchronously with fewer threads, but that won't be ready for the next release. As a temporary fix for 8.0.0, I propose just bumping up the `local_states_` capacity.
Closes #12845 from wjones127/ARROW-15718-multiple-datasets
Authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
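For readers following the two commit messages above, here is a minimal C++ sketch of the sizing problem they describe. It is not Arrow's code: `ToyThreadIndexer` and `AllThreadsFit` are hypothetical names invented for illustration, and the assumption that each additional dataset scan contributes one more thread is mine, not something the PR states. The sketch only shows why a per-thread state vector sized for the main thread alone runs out of slots as soon as another thread calls in.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a thread indexer (NOT Arrow's class): hands each
// distinct calling thread a dense index, and refuses once capacity is used up.
class ToyThreadIndexer {
 public:
  explicit ToyThreadIndexer(std::size_t capacity) : capacity_(capacity) {}

  std::size_t operator()() {
    std::lock_guard<std::mutex> lock(mutex_);
    const std::thread::id id = std::this_thread::get_id();
    auto it = indices_.find(id);
    if (it != indices_.end()) return it->second;  // thread already registered
    if (indices_.size() >= capacity_) {
      throw std::out_of_range("more threads than local state slots");
    }
    const std::size_t index = indices_.size();
    indices_.emplace(id, index);
    return index;
  }

 private:
  std::size_t capacity_;
  std::mutex mutex_;
  std::unordered_map<std::thread::id, std::size_t> indices_;
};

// Returns true if the main thread plus `num_extra_threads` other threads can
// each claim a slot in a local-state vector of size `num_local_states`.
bool AllThreadsFit(std::size_t num_local_states, std::size_t num_extra_threads) {
  ToyThreadIndexer indexer(num_local_states);
  std::vector<int> local_states(num_local_states, 0);
  std::atomic<bool> ok{true};

  auto touch = [&] {
    try {
      const std::size_t slot = indexer();
      assert(slot < local_states.size());
      local_states[slot] += 1;  // each thread writes only its own slot
    } catch (const std::out_of_range&) {
      ok = false;  // this thread found no free slot
    }
  };

  touch();  // the main thread touches the indexer first

  // Keep every extra thread alive until all are created, so that thread ids
  // are guaranteed to be distinct when the slots are claimed.
  std::promise<void> go;
  std::shared_future<void> ready = go.get_future().share();
  std::vector<std::thread> extras;
  for (std::size_t i = 0; i < num_extra_threads; ++i) {
    extras.emplace_back([&touch, ready] {
      ready.wait();
      touch();
    });
  }
  go.set_value();
  for (auto& t : extras) t.join();
  return ok;
}
```

In this toy model, `AllThreadsFit(1, 1)` fails while `AllThreadsFit(2, 1)` succeeds, which mirrors the "add one" fix. Under the stated assumption of one extra thread per dataset, `AllThreadsFit(2, 2)` fails again, which is the kind of limit the follow-up change works around by raising the capacity.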