ARROW-15584: [C++] Add support for Substrait's RelCommon::Emit #13914

vibhatha · 2022-08-18T12:21:24Z

Adding emit feature for Substrait plan deserialization.

This PR covers emits for read, filter, project, join and aggregate operations.
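For readers unfamiliar with emit: in Substrait, `RelCommon::Emit` carries an `output_mapping`, a list of indices into the relation's default output that selects and reorders the columns the relation actually produces. The sketch below only illustrates that semantics on column names. `ApplyEmit` is a hypothetical helper written for this note and is not the function added by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not Arrow's API): apply an emit output mapping to the
// column names a relation would produce by default.
std::vector<std::string> ApplyEmit(const std::vector<std::string>& columns,
                                   const std::vector<int32_t>& output_mapping) {
  std::vector<std::string> emitted;
  emitted.reserve(output_mapping.size());
  for (int32_t index : output_mapping) {
    // Each entry must refer to an existing default output column.
    if (index < 0 || static_cast<std::size_t>(index) >= columns.size()) {
      throw std::out_of_range("emit index out of range");
    }
    emitted.push_back(columns[static_cast<std::size_t>(index)]);
  }
  // One output column per mapping entry; indices may repeat or be omitted.
  assert(emitted.size() == output_mapping.size());
  return emitted;
}
```

With default output columns `a, b, c`, a mapping of `{2, 0}` would emit `c, a`.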

github-actions · 2022-08-18T12:26:25Z

https://issues.apache.org/jira/browse/ARROW-15584

github-actions · 2022-08-18T12:26:27Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

vibhatha · 2022-08-21T05:38:52Z

cc @westonpace @jeroen please take a look.

jvanstraten

I only checked for inconsistencies with Substrait, so not for C++ or Acero-related problems or code quality. Emit handling looks good to me from that perspective, but I did find a few schema deduction problems.

cpp/src/arrow/engine/substrait/relation_internal.cc

jvanstraten · 2022-08-22T08:50:41Z

cpp/src/arrow/engine/substrait/relation_internal.cc

Wrong order; keys come first.

The list of distinct columns from each grouping set (ordered by their first appearance) followed by the list of measures in declaration order, [...]

https://substrait.io/relations/logical_relations/#aggregate-operation

@jvanstraten I also noticed this, but I forgot to leave a comment about it. This probably belongs in a separate JIRA because of the order used in aggregate_node.cc [1]. Please refer to the comment on that line and the two loops after it: the aggregate fields are appended first, and then the key fields. One thing we can do is swap the output order here.

cc @westonpace

[1] arrow/cpp/src/arrow/compute/exec/aggregate_node.cc, line 345 in 50a7d15:

// Aggregate fields come before key fields to match the behavior of GroupBy function

Based on the comment this looks like intentional behavior of Arrow, so I don't think aggregate node is going to be adjusted to match Substrait. So that just means there should be a project node inserted behind the aggregate node that moves the columns around accordingly, right? I guess you could fix that in a separate JIRA/PR though. Maybe add a FIXME comment in that case?

Yes, the requirement in Acero may be static here.
We can use a project node to swap the columns around and document it properly. We can probably do that in this PR as well.

cc @westonpace

On second thought, it would be better to solve this one in another PR, because I am not sure whether this would break R test cases.

Another PR is fine, but I wouldn't consider the output order from Acero to be too static. Fixing it to output things in the order Substrait expects would be nice, so we can at least avoid the project node in some cases (when the emit is direct). It'll be a breaking change and will probably cause some slight heartburn for our existing tests, but we should probably fix it while we still have the opportunity.

Jira created here: https://issues.apache.org/jira/browse/ARROW-17656
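The reordering discussed in this thread can be sketched as an index mapping for a project node placed after the aggregate. This is only an illustration under stated assumptions: the aggregate output is laid out as all aggregate (measure) columns followed by all key columns, as the quoted aggregate_node.cc comment describes, and there is a single grouping set. `KeysFirstMapping` is a hypothetical name, not a function in Arrow.

```cpp
#include <vector>

// Hypothetical helper (not Arrow's API). Given an aggregate output laid out as
// [measure_0 .. measure_{M-1}, key_0 .. key_{K-1}], return the column indices
// a project step would select to get the order Substrait describes:
// [key_0 .. key_{K-1}, measure_0 .. measure_{M-1}].
std::vector<int> KeysFirstMapping(int num_measures, int num_keys) {
  std::vector<int> mapping;
  mapping.reserve(static_cast<std::vector<int>::size_type>(
      (num_measures > 0 ? num_measures : 0) + (num_keys > 0 ? num_keys : 0)));
  // Keys sit after the measures in the assumed aggregate output.
  for (int k = 0; k < num_keys; ++k) {
    mapping.push_back(num_measures + k);
  }
  // Measures keep their declaration order and follow the keys.
  for (int m = 0; m < num_measures; ++m) {
    mapping.push_back(m);
  }
  return mapping;
}
```

For two measures and one key, the assumed aggregate output is `[m0, m1, k0]`, and the mapping selects indices `2, 0, 1` to give `[k0, m0, m1]`.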

cpp/src/arrow/engine/substrait/relation_internal.h

cpp/src/arrow/engine/substrait/relation_internal.cc

westonpace

It seems I had left this review in the pending state. Apologies.

cpp/src/arrow/engine/substrait/serde_test.cc

vibhatha · 2022-09-07T11:10:53Z

@westonpace I updated the PR.

westonpace

A few naming suggestions and potential spots for cleanup, but no complaints about the overall approach.

cpp/src/arrow/engine/substrait/relation_internal.cc

cpp/src/arrow/engine/substrait/serde_test.cc

westonpace

Thanks again for putting this together.

vibhatha · 2022-09-13T07:09:46Z

Thank you for reviewing this one and keeping up with a few rounds of reviews.

ursabot · 2022-09-13T12:31:43Z

Benchmark runs are scheduled for baseline = 9d65981 and contender = 7f77811. 7f77811 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️1.12% ⬆️0.17%] test-mac-arm
[Failed ⬇️0.28% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️3.55% ⬆️1.92%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 7f778113 ec2-t3-xlarge-us-east-2
[Failed] 7f778113 test-mac-arm
[Failed] 7f778113 ursa-i9-9960x
[Finished] 7f778113 ursa-thinkcentre-m75q
[Finished] 9d659810 ec2-t3-xlarge-us-east-2
[Failed] 9d659810 test-mac-arm
[Failed] 9d659810 ursa-i9-9960x
[Finished] 9d659810 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

…e#13914) Adding emit feature for Substrait plan deserialization. This PR covers emits for `read`, `filter`, `project`, `join` and `aggregate` operations. Authored-by: Vibhatha Abeykoon <vibhatha@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>

github-actions bot added the Component: C++ label Aug 18, 2022

vibhatha marked this pull request as ready for review August 21, 2022 05:38

jvanstraten suggested changes Aug 22, 2022

westonpace self-requested a review August 22, 2022 16:31

westonpace reviewed Aug 29, 2022

cpp/src/arrow/engine/substrait/relation_internal.cc (outdated)

vibhatha requested a review from westonpace September 1, 2022 01:04

vibhatha force-pushed the arrow-15584 branch from f9f30f3 to 5050706 Compare September 1, 2022 08:20

westonpace reviewed Sep 6, 2022

vibhatha force-pushed the arrow-15584 branch from 5050706 to 9c099eb Compare September 7, 2022 06:53

vibhatha requested a review from westonpace September 7, 2022 07:27

This was referenced Sep 7, 2022

Update .proto files to the same version that Arrow is using voltrondata/substrait-r#181

Merged

Re-enable tests for Arrow substrait compiler voltrondata/substrait-r#195

Open

westonpace requested changes Sep 8, 2022

vibhatha added 13 commits September 9, 2022 18:45

feat(project): adding project test case for substrait with minor changes for emit

6f4a3f6

feat(emit): initial version of emit with project added

9367049

fix(test): fixing the test feature

a83deec

feat(data-gen): adding data generator script wip

e55b2c8

fix(format): refactor to simplify tests

7fc83cb

feat(filter): adding filter emit

43fea24

feat(join): adding join example

136edf5

fix(rebase): merge with substrait changes

a2c08a2

fix(project): replaced the add op with equal for test case

9578a91

feat(aggreagte): basic end-to-end test added

fb77dc1

feat(agg): adding aggregate feature for emits

ea2a05c

fix(num_columns): fix the number of columns for emit feature

8da8b54

fix(cleanup): cleaning up code

1f4da76

vibhatha added 8 commits September 9, 2022 19:00

fix(reviews): remove column count from DeclarationInfo

d99ddf4

fix(reviews): removed a redundant loop

81ad00b

fix(reviews): updated the emit processing logic and added switch cases

bba665f

fix(path_issue): added a check for replacing clause

7eb4623

fix(path): remove temp path fix

54b18df

fix(reviews): imd commit

5051070

fix(read): namedTable emit config added

5bb1051

fix(rebase)

19e49ed

vibhatha force-pushed the arrow-15584 branch from e185a2d to 19e49ed Compare September 9, 2022 13:57

fix(reviews): address reviews

2416e95

vibhatha requested a review from westonpace September 9, 2022 15:23

westonpace approved these changes Sep 13, 2022

westonpace merged commit 7f77811 into apache:master Sep 13, 2022

asfimport mentioned this pull request Sep 13, 2022

[C++] Add support for Substrait's RelCommon::Emit #31048

Closed

Inline review comments, in order: jvanstraten (Aug 22, 2022), vibhatha (Sep 5, 2022), jvanstraten (Sep 5, 2022), vibhatha (Sep 5, 2022), vibhatha (Sep 7, 2022), westonpace (Sep 8, 2022), vibhatha (Sep 9, 2022).
