Skip to content

Conversation

@danielcweeks
Copy link
Contributor

This reverts commit c882ac9.

@jackye1995 I wanted to propose reverting #1863 and going back to the HEAD operation for exists() and contentLength() FileIO operations. There was some discussion on the dev list and apparently there's some level of confirmation that negative caching with HEAD operations is covered (and possibly strong consistency?) in the recent announcement around S3 consistency.

Iceberg isn't using this operation for consistency and the major concern I have is that the LIST operation cost appears to be 10X (or more) than the HEAD operation, which could really add up depending on usage.

I just wanted to put this up for discussion and see if there's any real advantage at this point to using LIST.

(cc @giovannifumarola, @kbendick, @massdosage and @rdblue since you were all on the original thread)

@github-actions github-actions bot added the AWS label Dec 16, 2020
@jackye1995
Copy link
Contributor

There are also quite a few confusions about this internally, let me ask s3 directly

@jackye1995
Copy link
Contributor

Cool, the S3 team confirmed that all read operations including HEAD is now strongly consistent, so this can be reverted, sorry for the confusion.

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Dec 16, 2020

Thanks for confirming the fact that negative cache is gone.

That said, would you mind if I ask about the documentation of aws module, especially mentioning the rationalization (benefits) of the existence of the module? (Say, you'll want to use the aws module instead of simply using Hadoop API on S3 support because...)

@danielcweeks
Copy link
Contributor Author

Thanks for confirming the fact that negative cache is gone.

That said, would you mind if I ask about the documentation of aws module, especially mentioning the rationalization (benefits) of the existence of the module? (Say, you'll want to use the aws module instead of simply using Hadoop API on S3 support because...)

@HeartSaVioR good question, but I think that's probably beyond the scope for this specific PR, but would be great to raise in #1891 which is more documentation related (and may be getting at your exact question).

@rdblue rdblue merged commit 372fbcf into apache:master Dec 17, 2020
@rdblue
Copy link
Contributor

rdblue commented Dec 17, 2020

Great to hear that HEAD requests are consistent! Thanks @jackye1995 and @danielcweeks!

@HeartSaVioR
Copy link
Contributor

@danielcweeks Sorry I missed there's already PR submitted. Thanks for the reference!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants