ARROW-15732: [C++] Do not use any CPU threads in execution plan when use_threads is false #12468
… item only after all outstanding futures have completed

Unfortunately, this seems to have made the merge generator, which was already quite complicated, even more complicated. I'd welcome any suggestions for simplification. In the meantime, even though this one generator is more complicated, I think this allows us to simplify code using async generators considerably.

This is a prerequisite for #12468 because there is no way to keep the serial generator alive after the async generator has been destroyed (we can't use shared_ptr in this case).

Closes #12662 from westonpace/feature/ARROW-15968--only-emit-terminal-items-when-outstanding-tasks-finished

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@westonpace What's the status on this PR?

@pitrou Still blocked by a cancelable scanner (which is in the works). I'll close this until ready.
@westonpace is it correct to assume that …

I think I'd be more in favor of just ripping out …
- … `ScanBatches`. Previously the main thread would block (via `MakeGeneratorIterator`) while the work was actually done on a CPU thread. This change makes the main thread do the work instead (via `IterateGenerator`), so no CPU threads are required.
- `ExecContext` will now always have an `executor`. It is no longer valid to pass `nullptr`. I added a `DCHECK` to ensure this.
- `ExecContext` will now use the CPU thread pool. Previously it was using `nullptr` (even in some cases where the test was marked "parallel"). This affected a number of tests, and I had to add some sorting logic because the results are no longer deterministically ordered.
- … `ExecPlan`'s `use_threads` function reflects this.
- … `StopProducing` signal. While generally harmless, this could lead to bugs. The `ExecPlan` is responsible for calling `StopProducing` on each node in a certain order. There is no need to forward the signal.
- `ScanOptions` is used to create a `ScanNode`. There is a field `ScanOptions::use_threads`, but it doesn't make sense for a single node to have its own setting, so this is ignored in the `ScanNode`. We still use this field, however, because we also use a `ScanOptions` to create an entire `ExecPlan` in some cases and, in those cases, we do respect the value of this field.
- … `require_sequenced_output` flag. It was not being used and I removed it. The original problem was as follows: if you are sequencing your final plan output, and you are applying backpressure, and you are not sequencing your scan itself, then it is possible to deadlock. For example, if you output batches 2, 3, 4, and 5 and then you hit backpressure, you will never relieve the backpressure because you can never output batch 1. Backpressure has since been removed from general scanning (it's only wired in for dataset writes), so this feature is untested at the moment. Furthermore, should we add it back in, we could solve all of this much more easily by simply requiring the backpressure limit to be greater than the readahead limit.
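
To illustrate the idea of doing the work on the calling thread rather than blocking on a CPU thread, here is a minimal sketch. It is not Arrow's implementation: `ToySerialExecutor` and `ScanOnCallingThread` are hypothetical names made up for this example, and the "scan" tasks only record which thread ran them.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical toy executor (an assumption for illustration, not Arrow's API).
// Tasks are queued and later run on whichever thread calls RunUntilIdle(),
// so no worker threads are ever created.
class ToySerialExecutor {
 public:
  void Spawn(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  // Drains the queue on the calling thread. A task may spawn follow-up tasks;
  // they are picked up by the same loop.
  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Queues `num_batches` toy "scan" tasks and returns the id of the thread
// that executed each one.
std::vector<std::thread::id> ScanOnCallingThread(int num_batches) {
  ToySerialExecutor executor;
  std::vector<std::thread::id> ran_on;
  for (int i = 0; i < num_batches; ++i) {
    executor.Spawn([&ran_on] { ran_on.push_back(std::this_thread::get_id()); });
  }
  executor.RunUntilIdle();  // the caller does the work itself
  return ran_on;
}
```

Calling `ScanOnCallingThread(4)` from the main thread should return four ids that all equal `std::this_thread::get_id()` of the caller, which is the property the `use_threads=false` path is after.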