-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Spec: add multi-arg transform support #8579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
097f1ab
spec: multi-arg transform
advancedxy 6499552
refine wording
advancedxy 888c90b
chore: address comments and refine wording
advancedxy a2f7b7a
address comments
advancedxy 0221fc9
address comments
advancedxy a8076dd
remove bucketV2 transform
advancedxy File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nitpick. It seems that it would be better to choose a field ID from the existing range for reserved field IDs (e.g. MAX_INT-200) then to use -1, which as far as I can tell is still potentially a valid field according to the spec (I might have missed it but field IDs simply seem to be defined as integers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer using the first column from the source ID list instead of a fake ID. That way older readers at least see that the transform is associated with one of the correct columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using a valid source ID here would lead to incorrect results for old clients if a predicate is specified on the column. IIUC invalid ID here makes sure reads should always be correct or fail which seems like better semantics if the aim is forwards compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my understanding, multi-arg transforms will mostly get a new transform name rather than the existing ones. Older readers will treat this multi-arg transform as an
UnknownTransform, the persistedsource-idis just to make old code happy, see this reply as well: #8579 (comment). So thevalueofsource-idis just a place holder and doesn't make too much sense. It could be a field ID from the reserved range or a negative one since the current reference implementation wouldn't produce a negative field id.I simply choose
-1as it seems more nature and doesn't need to put a somehow weird reserved field in theMetadataColumns.java, but I think we make always make follow-up pr if there's valid concerns/solutions.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is a different transform (I wasn't clear on the final status there) I think it makes it less important so at this point it is bike shedding but I think having a clear signal that this field is meaningless might be useful. I think for V3 it might be worthwhile to consider dropping the backwards compatibility.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I also think older readers will not be able to make any use of the new mulit-arg transforms. So they would only be able to read new tables (though without any partition pushdown), and would fail to write. So I agree , it is moot what to even put for source-id here, though I think choosing a reserved one is a good idea. Is it just so the java reference implementation can properly de-serialize as Unknown and have a better exception message?
Is the idea in v1/v2 to write source-id column as -1/reserved, and in v3, we will write source-ids for everything and drop source-id column?
I guess this is a more general discussion and can wait the new spec pr clarifying v1/2 vs v3 behaviors.