ARROW-18004: [C++] ExecBatch conversion to RecordBatch may go out of bounds #14386

rtpsw · 2022-10-12T13:24:56Z

See https://issues.apache.org/jira/browse/ARROW-18004

rtpsw · 2022-10-12T13:31:30Z

pitrou · 2022-10-12T13:34:07Z

Same problem as #14347: this is basically adding a partial, incomplete guard for a condition the caller is supposed to check for themselves.

pitrou · 2022-10-12T13:35:24Z

If we wanted to check that the ExecBatch values correspond to the Schema, we should check for schema equality on each of the Datum values. That can unfortunately be expensive.
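For illustration, here is a minimal sketch of what a full per-value check might involve. `TypeSketch`, `TypeEquals`, and `ValuesMatchSchema` are hypothetical stand-ins, not Arrow classes or functions. The sketch assumes that one source of the cost is that type equality is recursive for nested types and would run for every column of every batch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a data type; this is not an Arrow class.
struct TypeSketch {
  std::string id;                    // e.g. "int32", "list", "struct"
  std::vector<TypeSketch> children;  // nested field types, if any
};

// Recursive equality: the work grows with the nesting depth of the type.
bool TypeEquals(const TypeSketch& a, const TypeSketch& b) {
  if (a.id != b.id || a.children.size() != b.children.size()) return false;
  for (std::size_t i = 0; i < a.children.size(); ++i) {
    if (!TypeEquals(a.children[i], b.children[i])) return false;
  }
  return true;
}

// Full check: one type comparison per column, repeated for every batch.
bool ValuesMatchSchema(const std::vector<TypeSketch>& schema_types,
                       const std::vector<TypeSketch>& value_types) {
  if (schema_types.size() != value_types.size()) return false;
  for (std::size_t i = 0; i < schema_types.size(); ++i) {
    if (!TypeEquals(schema_types[i], value_types[i])) return false;
  }
  return true;
}
```

By contrast, comparing only the number of fields and values is a single integer comparison per batch.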

rtpsw · 2022-10-12T13:47:42Z

> Same problem as #14347: this is basically adding a partial, incomplete guard for a condition the caller is supposed to check for themselves.

> If we wanted to check that the ExecBatch values correspond to the Schema, we should check for schema equality on each of the Datum values. That can unfortunately be expensive.

To be clear, I understand that by "partial, incomplete guard" you mean not checking for full schema equality. My goal here (and in the PR you linked) is not to guard fully, and I understand the cost of trying to do so. My goal is to avoid hard-to-debug runtime crashes when the cost of doing so is small. IMHO, this PR is a good tradeoff because it uses a cheap check to turn a crash into a raised error, which is easier to debug. I run into such crashes while developing test cases. I agree that the failure cases do not occur when a test case is correct, yet test cases are frequently incorrect during development. Another reason is that developers of large apps would generally prefer an error the app can handle, and perhaps raise to its user, over crashing the app.

pitrou · 2022-10-12T13:53:14Z

> Another reason is that developers of large apps would generally prefer an error the app can handle, and perhaps raise to its user, over crashing the app.

I agree with this, but it is actually misleading here, because we're only checking a single condition and otherwise let errors crash silently (or corrupt memory etc.).

pitrou · 2022-10-12T13:54:17Z

cpp/src/arrow/compute/exec_test.cc

Why would this succeed and produce a truncated result here?

This comment gives the background.

Yes, I checked this locally and it occurs in multiple places unfortunately. However, I don't think this is a behavior that we want to set in stone, so I think we should remove this particular test.

I'll soon add a commit with checks showing the current situation, if only for posterity, and then I'll remove ones we do not want to keep.

pitrou · 2022-10-12T13:57:12Z

Edit: if this avoids an immediate crash and allows you to see Validate failing afterwards, then I would be ok with this (but let's not silently truncate output columns either).

rtpsw · 2022-10-12T14:05:37Z

> (but let's not silently truncate output columns either).

Before settling on > in the condition if (static_cast<size_t>(schema->num_fields()) > values.size()), I tried != instead, but got test failures elsewhere:

[ RUN      ] ExecPlanExecution.StressSourceOrderBy
/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/exec/plan_test.cc:739: Failure
Failed
'_error_or_value56.status()' failed with Invalid: mismatching schema size
Google Test trace:
/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/exec/plan_test.cc:720: single threaded
/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/exec/plan_test.cc:717: unslowed
[  FAILED  ] ExecPlanExecution.StressSourceOrderBy (1 ms)

and

[ RUN      ] Substrait.BasicPlanRoundTrippingEndToEnd
/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:58: Plan was destroyed before finishing
/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/engine/substrait/serde_test.cc:2083: Failure
Failed
'_error_or_value131.status()' failed with Invalid: mismatching schema size
[  FAILED  ] Substrait.BasicPlanRoundTrippingEndToEnd (1 ms)

If we'd like to fix these test failures, I'd suggest doing so separately.
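The difference between the two comparisons can be sketched as follows. This is only an illustration under stated assumptions: `ConversionSketch` and `ToRecordBatchSketch` are hypothetical names, plain integers stand in for the batch values, and the `strict` flag exists only to show both variants side by side. It is not the code in the PR.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: an error message plus the produced columns.
struct ConversionSketch {
  std::string error;         // empty on success
  std::vector<int> columns;  // stand-in for the output columns
};

// strict == false mirrors the ">" check: only too few values is an error.
// strict == true mirrors the "!=" check: any size difference is an error.
ConversionSketch ToRecordBatchSketch(std::size_t num_fields,
                                     const std::vector<int>& values,
                                     bool strict) {
  ConversionSketch result;
  const bool mismatch = strict ? (num_fields != values.size())
                               : (num_fields > values.size());
  if (mismatch) {
    result.error = "mismatching schema size";
    return result;
  }
  // Here num_fields <= values.size(), so values[i] stays in bounds.
  for (std::size_t i = 0; i < num_fields; ++i) {
    result.columns.push_back(values[i]);
  }
  return result;
}
```

With two fields and three values, the lenient variant succeeds and drops the third value, while the strict variant reports "mismatching schema size", which matches the failures quoted above.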

pitrou · 2022-10-12T14:06:48Z

@westonpace Are the failures mentioned above (mismatching schema size) legitimate?

rtpsw · 2022-10-12T14:10:01Z

> Edit: if this avoids an immediate crash and allows you to see Validate failing afterwards, then I would be ok with this

With the current PR code, exec_batch.ToRecordBatch(reject_schema) just raises an error, i.e., it does not let an invalid record batch get created. Should I add some other check, and which?

rtpsw · 2022-10-12T14:10:54Z

> @westonpace Are the failures mentioned above (mismatching schema size) legitimate?

Don't know yet; I'll need to examine. If I manage to do so quickly, I'll report here.

rtpsw · 2022-10-12T14:20:05Z

> I agree with this, but it is actually misleading here, because we're only checking a single condition and otherwise let errors crash silently (or corrupt memory etc.).

I agree the leniency of the checks, as compared to validation, could be unexpected to some developers and therefore misleading. I'd suggest adding doc strings to clarify this. OTOH, I'm not sure which crash condition you mean could still occur. Since the current PR code checks the number of values and their types before applying type-specific operations, I believe it can only crash on the DCHECK(false) line. I could change this to return an error.
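A minimal sketch of that last idea, returning an error for an unexpected value kind instead of asserting. `KindSketch`, `ValueSketch`, and `ToColumn` are hypothetical stand-ins rather than Arrow's Datum API, and the error strings are made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not Arrow's Datum or ExecBatch types.
enum class KindSketch { kNone, kScalar, kArray, kChunkedArray };

struct ValueSketch {
  KindSketch kind;
  std::vector<int> data;  // one element for a scalar, a full column for an array
};

// Converts one value into a column of `length` rows.
// Returns an empty string on success, or an error message otherwise.
std::string ToColumn(const ValueSketch& value, std::size_t length,
                     std::vector<int>* out) {
  switch (value.kind) {
    case KindSketch::kArray:
      *out = value.data;
      return "";
    case KindSketch::kScalar:
      if (value.data.size() != 1) return "scalar must hold exactly one value";
      out->assign(length, value.data[0]);  // broadcast the scalar
      return "";
    default:
      // Where the discussed code has DCHECK(false), report an error
      // that the caller can handle.
      return "value is neither an array nor a scalar";
  }
}
```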

pitrou · 2022-10-12T15:10:17Z

Ok, it looks like Acero relies on being able to silently truncate the number of fields in that method. Which is quite unfortunate.

bkietz · 2022-10-12T15:13:35Z

cpp/src/arrow/compute/exec.cc

Suggested change: replace DCHECK(false); with Unreachable();

Well, it's not really unreachable :-)

It would be a violation of ExecBatch's class invariant for the values to be other than Array or Scalar. Now that I'm looking for a statement of that invariant, it's not easy to point at something; the closest I've got is in streaming_execution.rst. The constructor and ExecBatch::Make don't enforce this either. This validation should be explicit and centralized in ExecBatch.

We should certainly add ExecBatch::Validate (and perhaps ExecBatch::ValidateFull).

I'm in favor of adding these validation methods - let's create a separate jira for this.

> The constructor and ExecBatch::Make don't enforce this either.

Right. Also note that ExecBatch has public members, which various pieces of code access directly, so it can be easy to make it invalid.

Ok. Let's at least add a meaningful error message to the DCHECK :-)

Created https://issues.apache.org/jira/browse/ARROW-18015 for the validation. In the meantime, I added a meaningful error message.
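As a rough idea of what a centralized check could look like, here is a sketch under stated assumptions. `ExecBatchSketch` and `ValueSketch` are hypothetical stand-ins, not the real ExecBatch. The array/scalar rule comes from the discussion above; the length rule is an extra assumption added for illustration. What ARROW-18015 actually specifies is not stated in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins; not Arrow's Datum or ExecBatch.
enum class KindSketch { kNone, kScalar, kArray, kChunkedArray };

struct ValueSketch {
  KindSketch kind;
  std::int64_t length;  // number of rows; ignored for scalars
};

struct ExecBatchSketch {
  // Members are public, so any caller can put the batch in an invalid state.
  std::vector<ValueSketch> values;
  std::int64_t length = 0;

  // Returns an empty string if the invariants hold, else an error message.
  std::string Validate() const {
    for (std::size_t i = 0; i < values.size(); ++i) {
      const ValueSketch& value = values[i];
      if (value.kind == KindSketch::kScalar) continue;
      if (value.kind != KindSketch::kArray) {
        return "value " + std::to_string(i) + " is neither array nor scalar";
      }
      // Assumed rule: every array spans exactly `length` rows.
      if (value.length != length) {
        return "value " + std::to_string(i) + " has a mismatching length";
      }
    }
    return "";
  }
};
```

Because the check is a method on the batch itself, code that mutates the public members can call it before handing the batch to a consumer.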

github-actions · 2022-10-12T16:16:29Z

https://issues.apache.org/jira/browse/ARROW-18004

github-actions · 2022-10-12T16:16:31Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

rtpsw · 2022-10-12T16:49:18Z

This comment applies to the commit I just pushed.

rtpsw · 2022-10-12T18:42:26Z

@pitrou, please let me know which checks to remove from the test.

pitrou · 2022-10-13T15:13:17Z

> @pitrou, please let me know which checks to remove from the test.

In the interest of moving this forward before 10.0.0, I pushed some changes myself.

pitrou

Thanks a lot @rtpsw for persisting on this. Also sorry for the longish review process on this one.

rtpsw · 2022-10-13T15:38:39Z

Thanks @pitrou !

pitrou · 2022-10-13T17:10:01Z

CI passed on @rtpsw's fork.

ursabot · 2022-10-14T19:34:07Z

Benchmark runs are scheduled for baseline = f3327d2 and contender = b5b41cc. b5b41cc is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.56% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.82% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] b5b41ccf ec2-t3-xlarge-us-east-2
[Failed] b5b41ccf test-mac-arm
[Failed] b5b41ccf ursa-i9-9960x
[Finished] b5b41ccf ursa-thinkcentre-m75q
[Finished] f3327d2c ec2-t3-xlarge-us-east-2
[Failed] f3327d2c test-mac-arm
[Failed] f3327d2c ursa-i9-9960x
[Finished] f3327d2c ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

rtpsw mentioned this pull request Oct 12, 2022

ARROW-17980: [C++] As-of-Join Substrait extension #14385

Closed

pitrou reviewed Oct 12, 2022

bkietz requested changes Oct 12, 2022

pitrou reviewed Oct 12, 2022

pitrou reviewed Oct 12, 2022

github-actions bot added the Component: C++ label Oct 12, 2022

icexelloss reviewed Oct 12, 2022

rtpsw and others added 3 commits October 13, 2022 16:59

ARROW-18004: [C++] ExecBatch conversion to RecordBatch may go out of bounds (b0b9504)

requested fixes (bfe747f)

Rework tests (df0a10c)

pitrou force-pushed the ARROW-18004 branch from 126e3ee to df0a10c on October 13, 2022 15:12

pitrou approved these changes Oct 13, 2022

pitrou merged commit b5b41cc into apache:master Oct 13, 2022

rtpsw deleted the ARROW-18004 branch October 13, 2022 18:43

asfimport mentioned this pull request Oct 14, 2022

[C++] ExecBatch conversion to RecordBatch may go out of bounds #33208

Closed
