-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-13640][BEAM-12883] Add MetadataDynamicCoder to support encode-decode for new fields in Metatdata #15699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-13640][BEAM-12883] Add MetadataDynamicCoder to support encode-decode for new fields in Metatdata #15699
Conversation
… add-coder-readable-file-v2 # Conflicts: # sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/ReadableFileCoderV2.java
…or any new field that added to Metadata
|
The other PR seems to be merged. Could we close this? |
|
Please don't, the other PR fixed the ReadableFileCoder, and this one fixes the Mrtadatacoder itself. |
|
This is pretty cool. Thanks @brachi-wernick ! I think it makes sense to create a JIRA issue with target version 3.0.0 to remove the V1 and V2 coders, and rely directly on the latest versions of these coders. WDYT? I will ask someone who knows coders better to verify that this coder implementation is backwards-compatible for the addition of new fields. |
|
Folks, what are the next steps on this PR? |
|
@pabloem - a friendly ping for a review. |
|
@pabloem - Could you please take a look? |
|
@pabloem - friendly ping. Could you review or find another reviewer? |
|
@chamikaramj @kerrydc - Could you please help to find a reviewer? |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
|
@chamikaramj / @johnjcasey - folks could you please review or find a new reviewer? |
johnjcasey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good, though I think we will need to consider how we want to go about configuring this in a maintainable way going forward. Adding some fields, and ignoring others, could become complex if we don't have a good pattern.
|
|
||
| private static final MetadataCoder V1_CODER = MetadataCoder.of(); | ||
|
|
||
| private List<Coder<?>> coders = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to use parallel lists? I would typically create a new pojo of {coder, getter, setter} and have a list of that instead, to avoid off by one errors if we ever modify this class, or need to add a new aspect to the dynamic coder that would warrant a new list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted into class in 420d4aa
Regrading parallel list, do you mean stream.parallel? and decode/encode in parallel? I think it won't work since we iterate the stream sequentially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah no, parallel list for me means when we have 2 or more lists, where listA.get(2) is logically associated with listB.get(2)
| @Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder(); | ||
|
|
||
| @Test | ||
| public void testEncodeDecodeWithDefaultLastModifiedMills() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add a test that uses two dynamic coders, so we can verify that adding multiple coders stack as expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind problematic, since all fields now in Metadata are handled. except the one that I used in the test.
It can work if I won't use the basic MetadataCoder in this dynamic coder, and work only with the fields coders I get in the list, But I think it will be mess to developers to assign all these basic fields for the dynamic coder, it is easy now, that they need to send only new/special fields and not all the basic fields.
(coder now first does : MatchResult.Metadata.Builder builder = V1_CODER.decodeBuilder(inStream);
which covers most of the fields in Metadata. and new fields will be covered with this coder list of {coder,getter,setter})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
…lass (coder, getter, setter)
…add-metadata-dynamic-coder
|
LGTM |
|
Can one of the admins verify this patch? |
3 similar comments
|
Can one of the admins verify this patch? |
|
Can one of the admins verify this patch? |
|
Can one of the admins verify this patch? |
|
reopening PR to triggert tests to re-run |
|
What is the next step on this PR? |
|
Could this be merged? Should it be closed? |
|
Run Java Precommit |
|
Looks like tests are failing. @brachi-wernick could you investigate and resolve the breaks? |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
This PR is a continuous work for #15510.
Currently there are 2 Coders for Metadata: default one:
org.apache.beam.sdk.io.fs.MetadataCoderand enhanced oneorg.apache.beam.sdk.io.fs.MetadataCoderV2, the last can also decode-encodelastModifiedMillisand it is done in a new coder in order to support backward compatibility.This will be hard to maintain, we will need to create a new coder for any new field that will be added to
Metadata.So, as suggested in this comment: #15510 (comment), I came up with new generic coder :
MetadataDynamicCoder.MetadataDynamicCodercan decode/encode any new fields added toMetadataby sending getter, setter and coder.For example creating coder for
lastModifiedMillis:I chose to get explicit getter/setter to avoid reflection which has bad impact on performance.