Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Jun 25, 2020

Add FileSystem::OpenInput{Stream,File} overrides that accept a FileInfo parameter.
This can be used to optimize file opening when it the file size and existence is already known. Concretely, avoids a HEAD request in S3.

@pitrou pitrou requested a review from bkietz June 25, 2020 15:43
@pitrou
Copy link
Member Author

pitrou commented Jun 25, 2020

@bkietz Would like your input on the dataset changes.

@github-actions
Copy link

@pitrou
Copy link
Member Author

pitrou commented Jun 25, 2020

Note to self: instead of basing the default OpenInputStream(const FileInfo&) on OpenInputStream(const std::string&), it would be more logical to do the reverse, so that no subclass needs to override both.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataset changes look fine to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not easily accessible from FileMetaData or so then this probably warrants a custom metadata field when writing _metadata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pitrou added 2 commits June 25, 2020 18:16
Add FileSystem::OpenInput{Stream,File} overrides that accept a FileInfo parameter.
This can be used to optimize file opening when it the file size and existence
is already known.  Concretely, avoids a HEAD request in S3.
@pitrou pitrou force-pushed the ARROW-8950-s3-open-file-info branch from 0004c69 to eec1e8c Compare June 25, 2020 16:21
Comment on lines 406 to 407
// Issue a HEAD Object to get the content-length and ensure any
// errors (e.g. file not found) don't wait until the first Read() call.
Copy link
Contributor

@rdettai rdettai Jun 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment got a little bit out of sync with the code below ;-)

@rdettai
Copy link
Contributor

rdettai commented Jun 26, 2020

Great! AWS is going to be surprised to see its worldwide S3 HEAD request rate drop by half overnight !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants