Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Jul 7, 2020

V4 Union arrays with top-level null slots are disallowed, though.

@pitrou
Copy link
Member Author

pitrou commented Jul 7, 2020

Note that apache/arrow-testing#35 needs to be merged first.

@github-actions
Copy link

github-actions bot commented Jul 7, 2020

@pitrou pitrou force-pushed the ARROW-9265-metadata-version-unions branch from 71f42d4 to 9bb7d22 Compare July 7, 2020 19:04
@lidavidm
Copy link
Member

lidavidm commented Jul 7, 2020

Just a high level comment: if I'm reading this right, V4 is still the default metadata version and applications opt in to V5 when they want to read/write unions. Am I understanding this right?

@pitrou
Copy link
Member Author

pitrou commented Jul 8, 2020

That is right indeed.

@pitrou
Copy link
Member Author

pitrou commented Jul 8, 2020

The ASAN/UBSAN Ci failure should be fixed when merging #7644.

@pitrou pitrou force-pushed the ARROW-9265-metadata-version-unions branch from 0dc65e8 to 84d1014 Compare July 8, 2020 08:42
@wesm
Copy link
Member

wesm commented Jul 8, 2020

@lidavidm @pitrou I haven't looked at the patch yet, but V5 must be the default produced by writers but V4 can be opted in to.

@pitrou
Copy link
Member Author

pitrou commented Jul 8, 2020

I see, I will update the PR then.

@wesm
Copy link
Member

wesm commented Jul 8, 2020

I just sent an e-mail to the ML. We don't want to continue producing V4 metadata unless we need to for forward compatibility reasons.

@pitrou pitrou force-pushed the ARROW-9265-metadata-version-unions branch 3 times, most recently from 88dc2a1 to f73125a Compare July 8, 2020 14:14
@pitrou pitrou force-pushed the ARROW-9265-metadata-version-unions branch from f73125a to 75bb873 Compare July 8, 2020 18:19
@pitrou pitrou requested a review from wesm July 9, 2020 09:56
@pitrou pitrou force-pushed the ARROW-9265-metadata-version-unions branch from e34d877 to 23c1a0e Compare July 9, 2020 09:59
@pitrou
Copy link
Member Author

pitrou commented Jul 9, 2020

Rebased.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, there are a couple lines can be removed.

We need to expose the metadata version configuration along with an environment variable option to set the default to V4 (similar to what we did for the IPC alignment changes) in a separate PR before releasing. @BryanCutler can help validate that we have enough to allow e.g. Spark users to upgrade to 1.0.0 without breaking stuff

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, where will we break (presumably earlier than this) if we were to encounter an unrecognized version?

Copy link
Member

@lidavidm lidavidm Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that integration still passes in the corresponding Java PR #7685 (before rebasing on top of this) would imply that the version doesn't get checked, no?

Copy link
Member

@wesm wesm Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like it (Java does not check)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, neither checks, because #7685 passes (Java sending V5 while C++ sends V4) and this passed (Java sends V4 while C++ sends V5)

Though that may be reassuring to anyone planning to use 0.17.1 with 1.0.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, well I will open a JIRA about adding appropriate checks at least for 1.0.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/apache/arrow/blob/maint-0.17.x/cpp/src/arrow/ipc/message.cc#L57

So it only checks for old versions but new versions pass silently, which is quite scary to me. But on the other hand the risk of V5 data breaking a V4 application (e.g. Spark) at the moment is low.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm
Copy link
Member

wesm commented Jul 9, 2020

I'm quickly taking care of these small things so this can be merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants