-
Notifications
You must be signed in to change notification settings - Fork 3k
AWS: support S3 strong consistency #1863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
It looks like it should be possible to use just a LIST call here because we don't need the full metadata from the HEAD request. Metadata is only used for the existence check and for the file size, which should be available from the list response: // list object to force strong consistency
ListObjectsV2Response response = client().listObjectsV2(ListObjectsV2Request.builder()
.bucket(uri().bucket())
.prefix(uri().key())
.build());
Preconditions.checkState(response.contents().size() <= 1, "Invalid file: %s contains more than one object", uri);
this.exists = response.contents().size() > 0;
this.length = response.contents().get(0).size(); |
|
I agree with @rdblue's comment that we probably don't want to do two requests and LIST alone would be sufficient. Also, I'm not clear on how forcing a LIST would ensure correctness for a following head call. Based on the docs I didn't see anything about that sequencing resulting in HEAD consistency. |
Thanks for the suggestion, did not know LIST also gets the length metadata. |
| .prefix(uri().key()) | ||
| .maxKeys(1) | ||
| .build()); | ||
| metadata = response.hasContents() ? response.contents().get(0) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this changes the exception behavior a little bit because now S3InputFile.getLength() will throw NPE as opposed to S3 NOT_FOUND in the event it does not exist and getLength() is called. Not a huge issue, but we might consider improving that a little.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was going to ask, I saw in HadoopInputFile it simply throws a RuntimeIOException when asking for length of a file not exist. Probably better to check existence and throw NotFoundException in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, NotFoundException would be much better.
kbendick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to test for the contents of a file change after put (and not just the length being the same like is currently tested)?
What situation does it try to cover in the context of this PR? |
|
Thanks @jackye1995 |
* AWS: support S3 strong consistency * remove call to HEAD * fix list wrong file with the same suffix * check file exists when getting length
* AWS: support S3 strong consistency * remove call to HEAD * fix list wrong file with the same suffix * check file exists when getting length
This reverts commit c882ac9.
AWS released S3 strong consistency, and now all LIST and GET are strongly consistent. So we can use a LIST to force strong consistency before performing a HEAD for
exists()check.More details: https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/