-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-5910] Add lastModified field to MatchResult.Metadata #6914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I'm expecting some tests to fail initially. Seeing what fails will let me know what tests need to be updated. |
1c58a88 to
5610fbe
Compare
|
retest this please |
1 similar comment
|
retest this please |
|
Just for comment, the PR looks great, but waiting for the discussion on @dev to see what is the recommended path for the MetadataCoder update. |
|
Thanks for reviewing, @iemejia. I'm very interested to see what comes out of the discussion on evolving coders. |
|
Yes hopefully it won't take long, but probably will require a rebase (assuming that we will version the coders and save this somewhere). |
60702bd to
6d1c746
Compare
|
@iemejia - The mailing list discussion looks to have reached a conclusion that there's no short-term solution for coder versioning, so we need to be conservative about compatibility. I've added a new commit here that returns Can you take another look and see if this looks good to merge? |
|
Sure @jklukas sorry for the delay will take a look ASAP (Just back from christmass holidays). |
|
Oups just realized I reviewed the wrong one hehe never mind, will start to check this one now. Sorry. |
😆 I really appreciate the reviews, @iemejia, and I'm glad you were able to take time off over the holidays. |
|
Sorry for the delay @jklukas was a bit busy with other stuff + finishing other reviews, your PR is next in the line. |
6d1c746 to
8e9177a
Compare
iemejia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found some minor issues. Please fix and we are almost done.
Can you please add some tests (at least one) to ensure the correct behavior of coding/decoding a Metadata object with MetadataCoderV2. See CoderProperties.structuralValueDecodeEncodeEqual(CODER, value); for reference.
| .setIsReadSeekEfficient(isReadSeekEfficient) | ||
| .setSizeBytes(sizeBytes) | ||
| .build(); | ||
| .setLastModifiedMillis(UNKNOWN_LAST_MODIFIED_MILLIS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line breaks backwards compatibility (or is at least inconsistent with the encoder), we should remove this and do it only in the V2 version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is that lastModifiedMillis must have some default value. I've chosen here to use -1 so that lastModifiedMillis doesn't have to be nullable, though we can discuss if null makes more sense here.
AutoValue requires that values be set for all members. To define a default value, the AutoValue docs suggest a pattern of calling setters before returning the builder. That's what's going on here.
I don't see that there's a backwards incompatibility here. MetadataCoder still only writes and reads the three existing values (resource id, int, and long). When encoding, it throws away lastModifiedMills and when decoding, it provides the default -1 value.
Do you have thoughts on whether null would be preferred as the default vs. -1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of note, Java's File.lastModified documents that 0 is used if the file does not exist or an I/O error occurs. We could follow that and use 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we let zero as a default. In that case we won't need that and we will still be ok with AutoValue. If we setup any default I think it is better to do it at the core object level Metadata.create() better than at other places. Also in the current case if you put this only in decode then encode won't put the right value. Better to left this class as it was to avoid the risk of breaking stuff..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we let zero as a default. In that case we won't need that and we will still be ok with AutoValue
To be clear, 0 is only used as a default for File.lastModified as used in LocalFileSystem. Other filesystems do not have this same concept of 0 as default. So any way we go, we will have to explicitly set some default value for the field.
I agree with you that it's better to set the default within Metadata rather than only in the coder, so I will make that change. And will also change the default to be 0 to be consistent with File.lastModified behavior.
Also in the current case if you put this only in decode then encode won't put the right value
Exactly. encode does not encode the lastModifiedMillis value; it's thrown away. And decode provides a default. So encode and decode remain matched in the number and types of encoded values, and these are compatible with previous beam versions.
| * MetadataCoderV2} for retaining timestamp information. | ||
| */ | ||
| public class MetadataCoder extends AtomicCoder<Metadata> { | ||
| public static final long UNKNOWN_LAST_MODIFIED_MILLIS = -1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove see comment above
sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MetadataCoderV2.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MetadataCoderV2.java
Outdated
Show resolved
Hide resolved
|
Fixed the typos, converted both MetadataCoder and MetadataCoderV2 to singletons, and added tests. I responded to the comment on |
0c5cb55 to
f4118ef
Compare
|
Pushed a new commit that uses |
919b0b7 to
4ae354b
Compare
4ae354b to
6bc7794
Compare
iemejia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Fantastic work @jklukas! Thanks for fixing this omission!
|
Merged manually to squash the commits. |
In the Java SDK, the Filesystems.match facilities are aimed primarily at listing file names and collect very limited additional metadata from the filesystem (sizeBytes and isReadSeekEfficient). This PR adds a new
lastModifiedfield to that list.This could be a basis for a future improvement to FileIO.match(...).continuously(...) where we could let the user opt to poll not just for new file names, but also for existing file names if their content has been updated.
In the near term, the addition of lastModified to Metadata will allow users to implement their own polling logic on top of Filesystems.match to detect and download new files from any of the supported filesystems.
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.It will help us expedite review of your Pull Request if you tag someone (e.g.
@username) to look at it.Post-Commit Tests Status (on master branch)