-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-34059: [C++] Add a fetch node based on a batch index #34060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-34059: [C++] Add a fetch node based on a batch index #34060
Conversation
|
|
e8a716b to
e1e6dd7
Compare
… be ok since both types aren't used outside this context.
|
Benchmark runs are scheduled for baseline = 24e5a58 and contender = b056e07. b056e07 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
['Python', 'R'] benchmarks have high level of regressions. |
This PR introduces the concept of
ExecBatch:indexbut does not yet do much with it. As a proof of concept this PR adds a fetch node which can be inserted anywhere in the plan (not just at the sink) to satisfyLIMIT x OFFSET y(Substrait calls this fetch and so I have also).This PR also introduces two sequencing accumulation queues which will be useful, I hope, for anyone implementing nodes that rely on ordered execution.
This PR unfortunately introduces a new query option which is whether or not the sink node should pay the small performance hit required to sequence output. While considering how best to add this option I realized we will probably have more query options in the near future regarding "how much RAM to use" (e.g. spillover) and potentially more beyond that.
So I have taken all the options and put them into
arrow::compute::QueryOptions(this already existed but it was not user facing and I added more things to it). I added a new DeclarationToXyz overload that accepts QueryOptions. This has, unfortunately, led to a bit of overload explosion but I think this should be the last new addition to the overload set (and we can deprecate the older overloads at some point).This PR also includes a new
gen::Gen / gen::TestGenfacility for generating test tables for input. I'd like to eventually use this to simplify some of the existing exec plan tests as well. I'm willing to split this into a separate PR if that makes sense.