[C++] Make head optional in s3fs #25076

@asfimport

Description

When you open an input file with s3fs, it issues a HEAD request to S3 to check that the file is present/authorized and to get its size (`auto outcome = client_->HeadObject(req);`).

This call comes with a non-negligible cost:

  • adds latency

  • is priced the same as a GET request by AWS

I fail to see use cases where this call is really crucial:

  • if the file is not present/authorized, failing at first read seems to have mostly the same effect as failing on opening. I agree that it is kind of "usual" for an open call to fail eagerly, so to avoid surprises we could add a flag indicating if we don't need to fail when running OpenInputFile on an inaccessible file.

  • getting the size can be done on the first read, and could mostly be avoided on the caller side if the filesystem API provided read-from-end capabilities (compatible with fs reads using ios::end and, on HTTP filesystems, with bytes=-xxx range requests). In the worst case, the HEAD call could be done lazily when calling getSize().
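
To make the two points above concrete, here is a minimal sketch of the proposed behavior. It does not use the real Arrow or AWS SDK classes: `FakeObjectStore`, `LazyInputFile`, `ReadFromEnd` and the `check_on_open` flag are hypothetical names invented for illustration, and the store is an in-memory stand-in that only counts requests. Under those assumptions, opening issues no request, a suffix read maps to a `bytes=-n` range, and the HEAD call happens at most once, on the first `GetSize()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical in-memory stand-in for an S3 client. It only counts requests.
class FakeObjectStore {
 public:
  int head_requests = 0;
  int get_requests = 0;

  void Put(const std::string& key, std::string data) {
    objects_[key] = std::move(data);
  }

  // Models HEAD: returns the object size, throws if the key is missing.
  std::int64_t Head(const std::string& key) {
    ++head_requests;
    return static_cast<std::int64_t>(Find(key).size());
  }

  // Models GET with "Range: bytes=<offset>-<offset+length-1>".
  std::string GetRange(const std::string& key, std::int64_t offset,
                       std::int64_t length) {
    ++get_requests;
    const std::string& data = Find(key);
    if (offset < 0 || length < 0) throw std::invalid_argument("negative range");
    if (static_cast<std::size_t>(offset) >= data.size()) return std::string();
    return data.substr(static_cast<std::size_t>(offset),
                       static_cast<std::size_t>(length));
  }

  // Models GET with "Range: bytes=-<n>": the last n bytes of the object.
  std::string GetSuffix(const std::string& key, std::int64_t n) {
    ++get_requests;
    const std::string& data = Find(key);
    if (n < 0) throw std::invalid_argument("negative suffix length");
    const std::size_t count = std::min(static_cast<std::size_t>(n), data.size());
    return data.substr(data.size() - count);
  }

 private:
  const std::string& Find(const std::string& key) const {
    auto it = objects_.find(key);
    if (it == objects_.end()) throw std::runtime_error("no such key: " + key);
    return it->second;
  }

  std::unordered_map<std::string, std::string> objects_;
};

// Hypothetical input file that defers the HEAD request.
class LazyInputFile {
 public:
  // With check_on_open == false (the default here), opening issues no request;
  // a missing or unauthorized key surfaces on the first read or GetSize().
  LazyInputFile(FakeObjectStore* store, std::string key,
                bool check_on_open = false)
      : store_(store), key_(std::move(key)) {
    assert(store_ != nullptr);
    if (check_on_open) GetSize();  // eager mode: fail at open time
  }

  std::string ReadAt(std::int64_t offset, std::int64_t length) {
    return store_->GetRange(key_, offset, length);
  }

  // Read-from-end: no size needed on the caller side.
  std::string ReadFromEnd(std::int64_t n) { return store_->GetSuffix(key_, n); }

  // The HEAD request is issued lazily and its result is cached.
  std::int64_t GetSize() {
    if (size_ < 0) size_ = store_->Head(key_);
    return size_;
  }

 private:
  FakeObjectStore* store_;
  std::string key_;
  std::int64_t size_ = -1;  // -1 means "not fetched yet"
};
```

A reader that only needs a file footer could call `ReadFromEnd` and then `ReadAt` without ever paying for a HEAD request, while callers that rely on eager failure could pass `check_on_open = true`.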

I agree that it makes things a bit more complex, and I understand that you would not want to complicate the generic fs API because of blob storage behavior. But there are clearly workloads where this has a significant impact.

Reporter: Rémi Dettai / @rdettai
Assignee: Antoine Pitrou / @pitrou

PRs and other links:

Note: This issue was originally created as ARROW-8950. Please see the migration documentation for further details.
