@brachi-wernick
Contributor

This PR continues the work from #15510.

Currently there are two coders for Metadata: the default, org.apache.beam.sdk.io.fs.MetadataCoder, and the enhanced org.apache.beam.sdk.io.fs.MetadataCoderV2. The latter can also encode and decode lastModifiedMillis; it was added as a separate coder to preserve backward compatibility.

This will be hard to maintain: every new field added to Metadata would require yet another coder.

So, as suggested in this comment: #15510 (comment), I came up with a new generic coder: MetadataDynamicCoder.

MetadataDynamicCoder can encode and decode any new field added to Metadata, given a getter, a setter, and a coder for that field.

For example, creating a coder for lastModifiedMillis:

new MetadataDynamicCoder()
    .withCoderForField(
        VarLongCoder.of(),
        Metadata::lastModifiedMillis,
        Metadata.Builder::setLastModifiedMillis);

I chose to take an explicit getter and setter to avoid reflection, which hurts performance.
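
A minimal, self-contained sketch of the explicit-accessor idea, not the PR's code: the getter and setter are passed as method references, so encoding and decoding are ordinary calls with no reflective lookup. FileInfo, LongFieldCodec and AccessorSketch are hypothetical stand-ins invented for this illustration; they are not Beam classes, and the wire format (a fixed 8-byte long) is an assumption chosen for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical stand-in for a value type with a builder; not Beam's Metadata.
final class FileInfo {
  private final long lastModifiedMillis;

  private FileInfo(long lastModifiedMillis) {
    this.lastModifiedMillis = lastModifiedMillis;
  }

  long lastModifiedMillis() {
    return lastModifiedMillis;
  }

  static final class Builder {
    private long lastModifiedMillis;

    Builder setLastModifiedMillis(long value) {
      this.lastModifiedMillis = value;
      return this;
    }

    FileInfo build() {
      return new FileInfo(lastModifiedMillis);
    }
  }
}

// Encodes one long field through caller-supplied accessors (hypothetical helper).
final class LongFieldCodec {
  private final Function<FileInfo, Long> getter;
  private final BiConsumer<FileInfo.Builder, Long> setter;

  LongFieldCodec(
      Function<FileInfo, Long> getter, BiConsumer<FileInfo.Builder, Long> setter) {
    this.getter = getter;
    this.setter = setter;
  }

  void encode(FileInfo value, DataOutputStream out) throws IOException {
    out.writeLong(getter.apply(value)); // direct call through the method reference
  }

  void decode(DataInputStream in, FileInfo.Builder builder) throws IOException {
    setter.accept(builder, in.readLong());
  }
}

final class AccessorSketch {
  // Encodes a value holding `millis`, decodes it again, and returns the decoded field.
  static long roundTrip(long millis) {
    LongFieldCodec codec =
        new LongFieldCodec(
            FileInfo::lastModifiedMillis, FileInfo.Builder::setLastModifiedMillis);
    FileInfo original = new FileInfo.Builder().setLastModifiedMillis(millis).build();
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      codec.encode(original, new DataOutputStream(bytes));
      FileInfo.Builder builder = new FileInfo.Builder();
      codec.decode(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), builder);
      return builder.build().lastModifiedMillis();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(1634860800000L));
  }
}
```

The method references mirror the shape of the call above (Metadata::lastModifiedMillis and Metadata.Builder::setLastModifiedMillis); everything else in the sketch is simplified.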

@brachi-wernick
Contributor Author

@pabloem @robertwb new PR submitted, #15699, to make MetadataCoder more extensible for new fields. Could you review?

@aaltay
Member

aaltay commented Oct 22, 2021

The other PR seems to be merged. Could we close this?

@brachipa

Please don't. The other PR fixed ReadableFileCoder, and this one fixes MetadataCoder itself,
to address this suggestion/comment: #15510 (comment)

@kennknowles kennknowles requested a review from pabloem October 26, 2021 19:50
@kennknowles
Member

Adding @pabloem since he reviewed the other PR associated with this Jira, and CC @robertwb

@pabloem
Member

pabloem commented Nov 9, 2021

This is pretty cool. Thanks @brachi-wernick !

I think it makes sense to create a JIRA issue with target version 3.0.0 to remove the V1 and V2 coders, and rely directly on the latest versions of these coders. WDYT?

I will ask someone who knows coders better to verify that this coder implementation is backwards-compatible for the addition of new fields.

@aaltay
Member

aaltay commented Nov 30, 2021

Folks, what are the next steps on this PR?

@brachi-wernick
Contributor Author

@aaltay I would suggest merging it as an additional dynamic coder (that is, in addition to V1 and V2). We can open another JIRA ticket and PR to delete the old coders, as @pabloem said. WDYT?
Merging this is not critical right now, but we will need it the day someone adds more fields to the Metadata class.

@aaltay
Member

aaltay commented Dec 1, 2021

> @aaltay I would say to merge it as additional dynamic coder(I mean addition to V1 and V2), we can open another JIRA ticket and PR to delete old coders as @pabloem said. WDYT?

That sounds reasonable to me. We can merge it once @pabloem completes his review.

@aaltay
Member

aaltay commented Dec 11, 2021

@pabloem - a friendly ping for a review.

@brachi-wernick brachi-wernick changed the title from "[BEAM-12883] Add MetadataDynamicCoder to support encode-decode for new fields in Metatdata" to "[BEAM-13640][BEAM-12883] Add MetadataDynamicCoder to support encode-decode for new fields in Metatdata" on Jan 12, 2022
@aaltay
Member

aaltay commented Jan 21, 2022

@pabloem - Could you please take a look?

@aaltay
Member

aaltay commented Jan 27, 2022

@pabloem - friendly ping. Could you review or find another reviewer?

@aaltay
Member

aaltay commented Feb 5, 2022

@pabloem - friendly ping? Perhaps @kerrydc could help with finding another reviewer?

@aaltay
Member

aaltay commented Feb 18, 2022

@chamikaramj @kerrydc - Could you please help to find a reviewer?

@github-actions
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 19, 2022
@aaltay
Member

aaltay commented Apr 20, 2022

@chamikaramj / @johnjcasey - folks could you please review or find a new reviewer?

Contributor

@johnjcasey johnjcasey left a comment

Overall this looks good, though I think we will need to consider how we want to go about configuring this in a maintainable way going forward. Adding some fields, and ignoring others, could become complex if we don't have a good pattern.


private static final MetadataCoder V1_CODER = MetadataCoder.of();

private List<Coder<?>> coders = new ArrayList<>();
Contributor

Do we want to use parallel lists? I would typically create a new POJO of {coder, getter, setter} and keep a single list of those instead. That avoids off-by-one errors if we ever modify this class, or need to add a new aspect to the dynamic coder that would otherwise warrant another list.

Contributor Author

Extracted into a class in 420d4aa

Regarding parallel lists, do you mean stream.parallel, i.e. decoding/encoding in parallel? I think that won't work, since we iterate the stream sequentially.

Contributor

Ah no, by parallel lists I mean two or more lists where listA.get(2) is logically associated with listB.get(2).
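
To illustrate the difference, here is a small sketch of the single-list alternative, under stated assumptions: Stat, FieldEntry and EntryListSketch are hypothetical names made up for this example and do not come from the PR. With parallel lists, names.get(i), getters.get(i) and setters.get(i) have to be kept in step by index; with one entry object per field, the association cannot drift.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical mutable record used as both source and target, for brevity.
final class Stat {
  long sizeBytes;
  long lastModifiedMillis;
}

// One object per field: the name, getter and setter always travel together.
final class FieldEntry {
  final String name;
  final Function<Stat, Long> getter;
  final BiConsumer<Stat, Long> setter;

  FieldEntry(String name, Function<Stat, Long> getter, BiConsumer<Stat, Long> setter) {
    this.name = name;
    this.getter = getter;
    this.setter = setter;
  }
}

final class EntryListSketch {
  // A single list replaces three index-aligned lists.
  static List<FieldEntry> entries() {
    List<FieldEntry> entries = new ArrayList<>();
    entries.add(new FieldEntry("sizeBytes", s -> s.sizeBytes, (s, v) -> s.sizeBytes = v));
    entries.add(
        new FieldEntry(
            "lastModifiedMillis",
            s -> s.lastModifiedMillis,
            (s, v) -> s.lastModifiedMillis = v));
    return entries;
  }

  // Copies every registered field; each getter is paired with its own setter by construction.
  static Stat copy(Stat source) {
    Stat target = new Stat();
    for (FieldEntry entry : entries()) {
      entry.setter.accept(target, entry.getter.apply(source));
    }
    return target;
  }

  public static void main(String[] args) {
    Stat source = new Stat();
    source.sizeBytes = 10L;
    source.lastModifiedMillis = 20L;
    Stat copied = copy(source);
    System.out.println(copied.sizeBytes + " " + copied.lastModifiedMillis);
  }
}
```

Adding another per-field aspect later (for example a default value) would mean one more member on FieldEntry rather than one more list to keep aligned.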

@Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder();

@Test
public void testEncodeDecodeWithDefaultLastModifiedMills() throws Exception {
Contributor

Would it be possible to add a test that uses two dynamic coders, so we can verify that adding multiple coders stacks as expected?

Contributor Author

That is somewhat problematic, since all the fields currently in Metadata are already handled, except the one I used in the test.

It could work if the dynamic coder did not use the basic MetadataCoder and relied only on the field coders in its list. But I think it would be messy for developers to register all the basic fields with the dynamic coder. As it stands, it is easy: they only need to supply the new or special fields, not all the basic ones.

(The coder now first does: MatchResult.Metadata.Builder builder = V1_CODER.decodeBuilder(inStream);
which covers most of the fields in Metadata. New fields are covered by this coder's list of {coder, getter, setter}.)
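
The layering described here (base fields first, then each registered field in order) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: FileRecord, ExtraField and LayeredCodec are hypothetical names, `path` stands in for whatever the base coder handles, and `checksum` is an invented second field used only to show two extra fields stacking.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ObjLongConsumer;
import java.util.function.ToLongFunction;

// Hypothetical record: `path` is a base field, the two longs are later additions.
final class FileRecord {
  String path = "";
  long lastModifiedMillis;
  long checksum; // invented field, for illustration only
}

// Hypothetical per-field hook used by the layered codec below.
interface ExtraField {
  void write(FileRecord value, DataOutputStream out) throws IOException;

  void read(DataInputStream in, FileRecord target) throws IOException;
}

final class LayeredCodec {
  private final List<ExtraField> extras = new ArrayList<>();

  // Registers an extra long field; registration order defines the byte order.
  LayeredCodec withLongField(
      ToLongFunction<FileRecord> getter, ObjLongConsumer<FileRecord> setter) {
    extras.add(
        new ExtraField() {
          @Override
          public void write(FileRecord value, DataOutputStream out) throws IOException {
            out.writeLong(getter.applyAsLong(value));
          }

          @Override
          public void read(DataInputStream in, FileRecord target) throws IOException {
            setter.accept(target, in.readLong());
          }
        });
    return this;
  }

  byte[] encode(FileRecord value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeUTF(value.path); // base fields first
      for (ExtraField field : extras) {
        field.write(value, out); // then extras, in registration order
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  FileRecord decode(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      FileRecord target = new FileRecord();
      target.path = in.readUTF(); // must mirror the order used by encode
      for (ExtraField field : extras) {
        field.read(in, target);
      }
      return target;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    LayeredCodec codec =
        new LayeredCodec()
            .withLongField(r -> r.lastModifiedMillis, (r, v) -> r.lastModifiedMillis = v)
            .withLongField(r -> r.checksum, (r, v) -> r.checksum = v);
    FileRecord record = new FileRecord();
    record.path = "/tmp/file.txt";
    record.lastModifiedMillis = 5L;
    record.checksum = 9L;
    FileRecord decoded = codec.decode(codec.encode(record));
    System.out.println(decoded.path + " " + decoded.lastModifiedMillis + " " + decoded.checksum);
  }
}
```

The sketch only shows that encoder and decoder must register the same fields in the same order. It says nothing about reading bytes written by an older coder, which is the compatibility question raised earlier in this thread.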

Contributor

Sounds good

@github-actions github-actions bot removed the stale label Apr 21, 2022
@github-actions github-actions bot added the java label Apr 26, 2022
@johnjcasey
Contributor

LGTM

@pabloem pabloem closed this Apr 26, 2022
@pabloem pabloem reopened this Apr 26, 2022
@asf-ci

asf-ci commented Apr 26, 2022

Can one of the admins verify this patch?

@pabloem
Member

pabloem commented Apr 26, 2022

Reopening the PR to trigger the tests to re-run.

@aaltay
Member

aaltay commented May 5, 2022

What is the next step on this PR?

@aaltay
Member

aaltay commented May 12, 2022

Could this be merged? Should it be closed?

@johnjcasey
Contributor

Run Java Precommit

@johnjcasey
Contributor

Looks like tests are failing. @brachi-wernick could you investigate and resolve the breaks?

@github-actions
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 20, 2023
@github-actions
Contributor

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jun 27, 2023