-
Notifications
You must be signed in to change notification settings - Fork 1.3k
s3, gs, azure: re-enable prefix-based search optimization #6151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -96,7 +96,8 @@ def ls(self, path_info, detail=False): | |
| files = self.fs.ls(path, detail=detail) | ||
| yield from self._strip_buckets(files, detail=detail) | ||
|
|
||
| def find(self, path_info, detail=False): | ||
| # pylint: disable=unused-argument | ||
| def find(self, path_info, detail=False, prefix=None): | ||
| path = self._with_bucket(path_info) | ||
| files = self.fs.find(path, detail=detail) | ||
| if detail: | ||
|
|
@@ -105,7 +106,7 @@ def find(self, path_info, detail=False): | |
| yield from self._strip_buckets(files, detail=detail) | ||
|
|
||
| def walk_files(self, path_info, **kwargs): | ||
| for file in self.find(path_info): | ||
| for file in self.find(path_info, **kwargs): | ||
| yield path_info.replace(path=file) | ||
|
|
||
| def remove(self, path_info): | ||
|
|
@@ -155,6 +156,8 @@ def _download( | |
|
|
||
| # pylint: disable=abstract-method | ||
| class ObjectFSWrapper(FSSpecWrapper): | ||
| TRAVERSE_PREFIX_LEN = 3 | ||
|
|
||
| def _isdir(self, path_info): | ||
| # Directory in object storages are interpreted differently | ||
| # among different fsspec providers, so this logic is a temporary | ||
|
|
@@ -169,9 +172,16 @@ def _isdir(self, path_info): | |
| and entry["name"].endswith("/") | ||
| ) | ||
|
|
||
| def find(self, path_info, detail=False): | ||
| path = self._with_bucket(path_info) | ||
| files = self.fs.find(path, detail=detail) | ||
| def find(self, path_info, detail=False, prefix=None): | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might be a nice idea (for future refactors of remotes) to pass the prefix in here (
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, it would make more sense for us to do instead of the current |
||
| if prefix is not None: | ||
| path = self._with_bucket(path_info.parent) | ||
| files = self.fs.find( | ||
| path, detail=detail, prefix=path_info.parts[-1] | ||
| ) | ||
| else: | ||
| path = self._with_bucket(path_info) | ||
| files = self.fs.find(path, detail=detail) | ||
|
|
||
| if detail: | ||
| files = files.values() | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know if there is a scenario where the
prefixwill not beNonefor this case (TRAVERSE_PREFIX_LEN=2, used by filesystems with real dirs etc) but if there is no such case, then perhaps we could add anassert prefix is Noneas a safe-guard.