ARROW-8950: [C++] Avoid HEAD when possible in S3 filesystem #7547
@bkietz Would like your input on the dataset changes.
Note to self: instead of basing the default
bkietz
left a comment
The dataset changes look fine to me
If it's not easily accessible from FileMetaData or so then this probably warrants a custom metadata field when writing _metadata.
Add FileSystem::OpenInput{Stream,File} overrides that accept a FileInfo parameter.
This can be used to optimize file opening when the file size and existence
are already known. Concretely, this avoids a HEAD request in S3.
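To make the saving concrete, here is a minimal, self-contained sketch of the idea. It does not use the Arrow classes: `FakeObjectStore`, `ToyFileSystem`, `InputFile`, `kNoSize`, and the simplified `FileInfo` struct are all hypothetical stand-ins invented for illustration, and the real S3 filesystem implementation may differ in its details. The sketch only shows how an overload taking a `FileInfo` can skip the metadata lookup when the size is already known.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// All names below are hypothetical stand-ins, not the Arrow API.
constexpr int64_t kNoSize = -1;  // marks "size not known yet"

struct FileInfo {
  std::string path;
  int64_t size;  // kNoSize when the caller has no size information
};

// Toy object store that counts HEAD requests so the saving is observable.
class FakeObjectStore {
 public:
  void Put(const std::string& key, std::string body) {
    objects_[key] = std::move(body);
  }

  // Simulates HEAD Object: returns the content length, throws if missing.
  int64_t Head(const std::string& key) {
    ++head_requests_;
    auto it = objects_.find(key);
    if (it == objects_.end()) throw std::runtime_error("not found: " + key);
    return static_cast<int64_t>(it->second.size());
  }

  int head_requests() const { return head_requests_; }

 private:
  std::map<std::string, std::string> objects_;
  int head_requests_ = 0;
};

struct InputFile {
  std::string path;
  int64_t size;
};

class ToyFileSystem {
 public:
  explicit ToyFileSystem(FakeObjectStore* store) : store_(store) {}

  // Path-only variant: has to ask the store for size and existence.
  InputFile OpenInputFile(const std::string& path) {
    return InputFile{path, store_->Head(path)};
  }

  // FileInfo variant: trusts the caller-supplied size when present,
  // and falls back to the path-only variant otherwise.
  InputFile OpenInputFile(const FileInfo& info) {
    if (info.size == kNoSize) return OpenInputFile(info.path);
    assert(info.size >= 0);
    return InputFile{info.path, info.size};
  }

 private:
  FakeObjectStore* store_;
};
```

In this sketch, opening a file from a `FileInfo` obtained by an earlier listing issues no HEAD request, while opening by path always issues one. The trade-off, in the toy model at least, is that a stale `FileInfo` (for example, a file deleted after listing) is not detected at open time.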
// Issue a HEAD Object to get the content-length and ensure any
// errors (e.g. file not found) don't wait until the first Read() call.
This comment got a little bit out of sync with the code below ;-)
Great! AWS is going to be surprised to see its worldwide S3 HEAD request rate drop by half overnight!