-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-18004: [C++] ExecBatch conversion to RecordBatch may go out of bounds #14386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Same problem as #14347: this is basically adding a partial, incomplete guard for a condition the caller is supposed to check for themselves. |
|
If we wanted to check that the ExecBatch values correspond to the Schema, we should check for schema equality on each of the Datum values. That can unfortunately be expensive. |
To be sure, I understand that by partial incomplete guard you mean not checking for full schema equality. My goal here (and in the PR you linked) is not to fully guard, and I understand the cost of trying to do so. My goal is to avoid runtime crashes that are hard to debug when the cost of doing so is small. IMHO, this PR is a good tradeoff because it uses a cheap check to transform a crash failure to a raised error, which is easier to debug. I run into such crash failures while developing test cases. I agree that when a test case is correct then the failure cases do not occur, yet during development test cases are frequently not correct. Another reason is that developers of large apps would generally prefer an error the app can handle, and perhaps raise to its user, over crashing the app. |
I agree with this, but it is actually misleading here, because we're only checking a single condition and otherwise let errors crash silently (or corrupt memory etc.). |
cpp/src/arrow/compute/exec_test.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this succeed and produce a truncated result here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment gives the background.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I checked this locally and it occurs in multiple places unfortunately. However, I don't think this is a behavior that we want to set in stone, so I think we should remove this particular test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll soon add a commit with checks showing the current situation, if only for posterity, and then I'll remove ones we do not want to keep.
|
Edit: if this avoids an immediate crash and allows you to see |
Before coding the condition and If we'd like to fix these test failures, I'd suggest doing so separately. |
|
@westonpace Are the failures mentioned above (mismatching schema size) legitimate? |
With the current PR code, |
Don't know yet; I'll need to examine. If I manage to do so quickly, I'll report here. |
I agree the leniency of the checks, as compared to validation, could be unexpected to some developers and therefore misleading. I'd suggest adding doc strings to clarify this. OTOH, I'm not sure which crash condition you mean could still occur. Since the current PR code check the number of values and their types before applying type-specific operations, I believe it can only crash on the |
|
Ok, it looks like Acero relies on being able to silently truncate the number of fields in that method. Which is quite unfortunate. |
cpp/src/arrow/compute/exec.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| DCHECK(false); | |
| Unreachable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's not really unreachable :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a violation of ExecBatch's class invariant for the values to be other than Array or Scalar. Now that I'm looking for a statement of that invariant it's not easy to point at something, the closest I've got is in streaming_execution.rst. The constructor and ExecBatch::Make don't enforce this either. This validation should be explicit and centralized in ExecBatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should certainly add ExecBatch::Validate (and perhaps ExecBatch::ValidateFull).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of adding these validation methods - let's create a separate jira for this.
The constructor and ExecBatch::Make don't enforce this either.
Right. Also note that ExecBatch has public members, which various pieces of code access directly, so it can be easy to make it invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Let's at least add a meaningful error message to the DCHECK :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created https://issues.apache.org/jira/browse/ARROW-18015 for the validation. In the meantime, I added a meaningful error message.
|
|
|
This comment applies to the commit I just pushed. |
|
@pitrou, please let me know which checks to remove from the test. |
In the interest of moving this forward before 10.0.0, I pushed some changes myself. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @rtpsw for persisting on this. Also sorry for the longish review process on this one.
|
Thanks @pitrou ! |
|
CI passed on @rtpsw 's fork. |
|
Benchmark runs are scheduled for baseline = f3327d2 and contender = b5b41cc. b5b41cc is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
See https://issues.apache.org/jira/browse/ARROW-18004