GH-34059: [C++] Add a fetch node based on a batch index #34060

westonpace · 2023-02-07T00:10:32Z

This PR introduces the concept of ExecBatch::index but does not yet do much with it. As a proof of concept, it adds a fetch node that can be inserted anywhere in the plan (not just at the sink) to satisfy LIMIT x OFFSET y (Substrait calls this operation "fetch", so I have used the same name).
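
To make the LIMIT x OFFSET y behaviour concrete, here is a minimal standalone sketch of the row bookkeeping a fetch step needs once batches arrive in index order. It is not the code in fetch_node.cc; `FetchCounter` and `BatchSlice` are hypothetical names invented for this illustration, and the sketch assumes the caller already presents batches in order.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical types for illustration only; not part of Arrow.
struct BatchSlice {
  int64_t offset;  // first row of the batch to keep
  int64_t length;  // number of rows to keep (0 means the batch is dropped)
};

// Decides which rows of each batch fall inside the
// [rows_to_skip, rows_to_skip + rows_to_take) window of the overall stream.
class FetchCounter {
 public:
  FetchCounter(int64_t rows_to_skip, int64_t rows_to_take)
      : rows_to_skip_(rows_to_skip), rows_to_take_(rows_to_take) {}

  // Must be called once per batch, in batch index order.
  BatchSlice Next(int64_t batch_length) {
    // Consume as much of the remaining OFFSET as this batch can cover.
    const int64_t skipped = std::min(rows_to_skip_, batch_length);
    rows_to_skip_ -= skipped;
    // Take rows from what is left of the batch, up to the remaining LIMIT.
    const int64_t taken = std::min(rows_to_take_, batch_length - skipped);
    rows_to_take_ -= taken;
    return BatchSlice{skipped, taken};
  }

  // True once the LIMIT is satisfied; upstream work could stop at this point.
  bool Finished() const { return rows_to_take_ == 0; }

 private:
  int64_t rows_to_skip_;
  int64_t rows_to_take_;
};
```

With LIMIT 5 OFFSET 3 and three batches of 4 rows each, the first batch contributes its last row, the second contributes all 4 rows, and the third contributes nothing. The bookkeeping is only correct if batches are counted in order, which is why a batch index is needed.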

This PR also introduces two sequencing accumulation queues which will be useful, I hope, for anyone implementing nodes that rely on ordered execution.
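
As a rough illustration of what "sequencing" means here, the sketch below buffers batches that arrive out of order and releases them only once every earlier index has been seen. It is a single-threaded toy and does not reflect the queues in accumulation_queue.h; `IndexedBatch` and `SequencingBuffer` are hypothetical names, and the sketch assumes indices start at 0 with no gaps.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a batch: only the index and a payload.
struct IndexedBatch {
  int64_t index;
  int value;
};

// Holds out-of-order batches and hands them back in index order.
// Assumption: indices start at 0 and have no gaps. Not thread safe.
class SequencingBuffer {
 public:
  // Accepts a batch and returns every batch that is now ready to emit,
  // in ascending index order (possibly none).
  std::vector<IndexedBatch> Insert(const IndexedBatch& batch) {
    pending_.emplace(batch.index, batch);
    std::vector<IndexedBatch> ready;
    std::map<int64_t, IndexedBatch>::iterator it = pending_.begin();
    // Drain the run of consecutive indices starting at next_index_.
    while (it != pending_.end() && it->first == next_index_) {
      ready.push_back(it->second);
      it = pending_.erase(it);
      ++next_index_;
    }
    return ready;
  }

  // Number of batches still waiting for an earlier index to arrive.
  std::size_t pending_count() const { return pending_.size(); }

 private:
  int64_t next_index_ = 0;
  std::map<int64_t, IndexedBatch> pending_;
};
```

If batch 2 arrives first it is held back; once batch 0 arrives it is released immediately, and the arrival of batch 1 then releases batches 1 and 2 together. A real implementation also has to cope with concurrent producers, which this sketch ignores.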

Unfortunately, this PR also introduces a new query option: whether the sink node should pay the small performance cost required to sequence its output. While considering how best to add this option, I realized we will probably need more query options in the near future, for example to control how much RAM to use (e.g. spillover), and potentially others beyond that.

So I have gathered all of the options into arrow::compute::QueryOptions (this struct already existed, but it was not user facing, and I have added more fields to it). I also added a new DeclarationToXyz overload that accepts QueryOptions. This has, unfortunately, led to a bit of an overload explosion, but I think this should be the last addition to the overload set (and we can deprecate the older overloads at some point).
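
The reasoning behind a single options struct can be sketched in isolation: new settings become new fields with defaults, so existing callers keep compiling and no further overloads are needed. The code below is a generic illustration only. `PlanOptionsSketch`, its fields, and `DescribeRun` are hypothetical and are not the members of arrow::compute::QueryOptions or any DeclarationToXyz signature.

```cpp
#include <string>

// Hypothetical option bundle; the field names are illustrative and are not
// taken from arrow::compute::QueryOptions.
struct PlanOptionsSketch {
  bool use_threads = true;
  bool sequence_output = false;     // pay a small cost to emit batches in order
  long long max_memory_bytes = -1;  // negative means "no limit"
};

// One entry point that takes the bundle. Adding a field to the struct later
// does not require another overload of this function.
std::string DescribeRun(const PlanOptionsSketch& options = PlanOptionsSketch{}) {
  std::string out = "threads=";
  out += options.use_threads ? "on" : "off";
  out += " sequenced=";
  out += options.sequence_output ? "yes" : "no";
  out += " max_memory=";
  out += options.max_memory_bytes < 0 ? std::string("unlimited")
                                      : std::to_string(options.max_memory_bytes);
  return out;
}
```

A caller who only cares about ordering sets that one field and leaves the rest at their defaults.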

This PR also includes a new gen::Gen / gen::TestGen facility for generating test tables for input. I'd like to eventually use this to simplify some of the existing exec plan tests as well. I'm willing to split this into a separate PR if that makes sense.

Closes: [C++] Create a fetch node based on a batch index property #34059

github-actions · 2023-02-07T00:10:57Z

Closes: [C++] Create a fetch node based on a batch index property #34059

github-actions · 2023-02-07T00:10:59Z

⚠️ GitHub issue #34059 has been automatically assigned in GitHub to PR creator.

cpp/src/arrow/compute/exec.h

cpp/src/arrow/compute/exec/exec_plan.h

cpp/src/arrow/compute/exec/accumulation_queue.h

cpp/src/arrow/compute/exec/accumulation_queue.cc

cpp/src/arrow/testing/generator.h

cpp/src/arrow/compute/exec/fetch_node.cc

ursabot · 2023-02-11T11:13:37Z

Benchmark runs are scheduled for baseline = 24e5a58 and contender = b056e07. b056e07 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.43% ⬆️0.98%] test-mac-arm
[Finished ⬇️0.77% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.51% ⬆️0.22%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] b056e07b ec2-t3-xlarge-us-east-2
[Failed] b056e07b test-mac-arm
[Finished] b056e07b ursa-i9-9960x
[Finished] b056e07b ursa-thinkcentre-m75q
[Finished] 24e5a580 ec2-t3-xlarge-us-east-2
[Failed] 24e5a580 test-mac-arm
[Finished] 24e5a580 ursa-i9-9960x
[Finished] 24e5a580 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot · 2023-02-11T11:15:42Z

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

github-actions bot added the Component: C++ label Feb 7, 2023

westonpace requested review from bkietz and lidavidm February 7, 2023 20:58

lidavidm reviewed Feb 7, 2023

lidavidm approved these changes Feb 8, 2023

westonpace added 6 commits February 9, 2023 11:05

adds a fetch node based on a batch index

e6daa80

Lint

d9f17ba

Missing MSVC export statements

88d5754

Make compute reference in array generators optional

c6f67e3

Remove export label on private class

faf3c12

Cleanup per PR review. Collapse various Gen alternatives into one function

e1e6dd7

westonpace force-pushed the feature/GH-34059--add-fetch-node branch from e8a716b to e1e6dd7 Compare February 9, 2023 19:53

westonpace added 2 commits February 9, 2023 15:55

Marking single-arg constructors as explicit per lint

7489c49

Turns out that implicit constructor was needed after all. This should be ok since both types aren't used outside this context.

4cbb795

westonpace merged commit b056e07 into apache:master Feb 11, 2023
