ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data #7664
Conversation
Note that apache/arrow-testing#35 needs to be merged first.
Force-pushed from 71f42d4 to 9bb7d22.
Just a high-level comment: if I'm reading this right, V4 is still the default metadata version, and applications opt in to V5 when they want to read or write unions. Am I understanding this correctly?
That is indeed right.
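The opt-in behavior described above can be sketched as follows. This is a hedged illustration, not Arrow's actual API: the names `effective_metadata_version`, `V4`, and `V5` are assumptions, and the sketch assumes the writer rejects unions under V4 rather than silently upgrading.

```python
# Hedged sketch of the opt-in policy: V4 stays the default metadata
# version, and callers must explicitly request V5 to serialize unions.
# Names are illustrative, not Arrow's API.
V4, V5 = 4, 5

def effective_metadata_version(requested=V4, schema_has_unions=False):
    """Pick the metadata version to write under the policy described above."""
    if schema_has_unions and requested < V5:
        # Assumption: unions cannot be represented in V4 metadata,
        # so the writer refuses rather than silently upgrading.
        raise ValueError("writing union types requires metadata V5")
    return requested

print(effective_metadata_version())  # 4
print(effective_metadata_version(V5, schema_has_unions=True))  # 5
```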
The ASAN/UBSAN CI failure should be fixed when #7644 is merged.
Force-pushed from 0dc65e8 to 84d1014.
I see, I will update the PR then.
I just sent an e-mail to the mailing list. We don't want to continue producing V4 metadata unless we need to for forward compatibility reasons.
Force-pushed from 88dc2a1 to f73125a.
Force-pushed from f73125a to 75bb873.
Force-pushed from e34d877 to 23c1a0e.
Rebased.
wesm
left a comment
This looks good; there are a couple of lines that can be removed.
We need to expose the metadata version configuration, along with an environment variable option to set the default to V4 (similar to what we did for the IPC alignment changes), in a separate PR before releasing. @BryanCutler can help validate that we have enough to allow e.g. Spark users to upgrade to 1.0.0 without breaking things.
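The follow-up described here (an option plus an environment variable restoring the V4 default, analogous to the earlier IPC alignment escape hatch) might look roughly like this sketch. The variable name `ARROW_PRE_1_0_METADATA_VERSION` and the helper are assumptions for illustration; the real names would be decided in that follow-up PR.

```python
import os

V4, V5 = 4, 5

def default_metadata_version(env=None):
    """Hedged sketch: choose the default metadata version from the
    environment, so e.g. Spark deployments can pin the legacy default."""
    env = os.environ if env is None else env
    # The variable name below is an assumption, not a confirmed Arrow name.
    if env.get("ARROW_PRE_1_0_METADATA_VERSION", "").strip().lower() in ("1", "true"):
        return V4  # legacy default for forward compatibility
    return V5

print(default_metadata_version({}))  # 5
print(default_metadata_version({"ARROW_PRE_1_0_METADATA_VERSION": "1"}))  # 4
```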
python/pyarrow/ipc.pxi (outdated)
Out of curiosity, where will we break (presumably earlier than this) if we were to encounter an unrecognized version?
The fact that integration still passes in the corresponding Java PR #7685 (before rebasing on top of this) would imply that the version doesn't get checked, no?
Sounds like it (Java does not check)
Well, neither side checks, because #7685 passes (Java sending V5 while C++ sends V4) and this PR's integration passed (Java sending V4 while C++ sends V5).
Though that may be reassuring to anyone planning to use 0.17.1 with 1.0.0.
Yikes. Well, I will open a JIRA about adding appropriate checks, at least for 1.0.0.
See https://github.com/apache/arrow/blob/maint-0.17.x/cpp/src/arrow/ipc/message.cc#L57
So it only checks for old versions, but new versions pass silently, which is quite scary to me. On the other hand, the risk of V5 data breaking a V4 application (e.g. Spark) is currently low.
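A stricter check of the kind proposed here would reject unrecognized future versions as well as deprecated old ones, instead of letting unknown versions pass silently. A minimal sketch, with illustrative bounds and names rather than Arrow's actual constants:

```python
SUPPORTED_MIN, SUPPORTED_MAX = 4, 5  # illustrative bounds, not Arrow constants

def check_metadata_version(version):
    """Sketch of the stricter check discussed above: fail on versions the
    reader does not understand instead of accepting them silently."""
    if version < SUPPORTED_MIN:
        raise ValueError(f"metadata V{version} is no longer supported")
    if version > SUPPORTED_MAX:
        raise ValueError(f"unrecognized metadata version V{version}")
    return version

print(check_metadata_version(5))  # 5
```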
I'm quickly taking care of these small things so this can be merged.
V4 Union arrays with top-level null slots are disallowed, though. Also enable integration tests against 0.17.1 gold files.
Force-pushed from 23c1a0e to 838d1bb.