Skip to content

spec: implement bucket region caching#495

Merged
martindurant merged 10 commits into
fsspec:mainfrom
isidentical:s3fs-bucket-cache
Jun 18, 2021
Merged

spec: implement bucket region caching#495
martindurant merged 10 commits into
fsspec:mainfrom
isidentical:s3fs-bucket-cache

Conversation

@isidentical
Copy link
Copy Markdown
Member

Resolves #494

Comment thread s3fs/core.py
@isidentical
Copy link
Copy Markdown
Member Author

I don't really have tests for this at the moment, but I will try to come up with something (at worst just mocking, but I'd rather use some sort of real stuff, like moto).

@martindurant
Copy link
Copy Markdown
Member

I see the clear evidence from the linked issue that caching can significantly speed up calls, but can you summarise, please, why this is? My understanding is, that currently botocore will look up the region for the bucket of a given call every time; but once we know the region for some bucket, we should be able to reuse it and avoid that call. It is surprising (to me) that this could be 3x slower, but maybe it depends on where the region of the bucket is versus the s3 lookup endpoint.

Does this require a separate client for each region (or each bucket?), or could the region be specified at call time. Maybe that's actually what you do.

The case against making this default or the only path would be that HEAD_BUCKET can fail. Is the bucket region not available via an initial region-less call response's metadata?

@martindurant
Copy link
Copy Markdown
Member

I don't think moto handles regions and likely neither does minio - not that we use the latter in tests (yet #423 )

@isidentical
Copy link
Copy Markdown
Member Author

I see the clear evidence from the linked issue that caching can significantly speed up calls, but can you summarise, please, why this is?

The most basic explanation is that, every time a region is unknown for HEAD_OBJECT calls boto fails with 400 first and then goes on the route of resolving these. And when a connection gets 400, the underlying aiohttp connection gets terminated so for every call you are now paying the cost of the entire transport creation.

It is surprising (to me) that this could be 3x slower, but maybe it depends on where the region of the bucket is versus the s3 lookup endpoint

If you check the numbers of the linked issue, there is a huge gap which was netted multiple testers. And when the overall connection is getting faster, the gap gets higher.

Does this require a separate client for each region (or each bucket?), or could the region be specified at call time. Maybe that's actually what you do.

That was the first thing that I wanted to do, since client creation costs some time (0.05 seconds or something) also the experience is not great. But from what I can see, you have create a new client for each region you want to support. We do cache those clients, so if you have 6 buckets from 2 different regions, you only create 3 different clients. A generic one (to be used in HEAD_BUCKET calls), and 2 clients for 2 different regions.

The case against making this default or the only path would be that HEAD_BUCKET can fail. Is the bucket region not available via an initial region-less call response's metadata?

I'd rather separate these. I'll look into whether we can retrieve the region from an initial call, though even so I've experienced some problems on DVC side with pinning the region/signature version, some users' settings might be broken etc. Though if we could retrieve the metadata from an existing call, then we could perhaps make it just a bit more faster for the initial case. Need to look into it.

@isidentical
Copy link
Copy Markdown
Member Author

I don't think moto handles regions and likely neither does minio - not that we use the latter in tests (yet #423 )

Hmmm.

@martindurant
Copy link
Copy Markdown
Member

Though if we could retrieve the metadata from an existing call, then we could perhaps make it just a bit more faster for the initial case.

Even if not, it should make for simpler code. When debugging some future issues, it would be great not to have to ask whether region client caching was on, and whether, to their knowledge, the HEAD_BUCKET had succeeded.

By the way, is HEAD_BUCKET typically allowed for publicly accessible buckets?

@isidentical
Copy link
Copy Markdown
Member Author

By the way, is HEAD_BUCKET typically allowed for publicly accessible buckets?

AFAIK, yes, ACL public-read/public-read-write both support HEAD_BUCKET.

@isidentical
Copy link
Copy Markdown
Member Author

Even if not, it should make for simpler code. When debugging some future issues, it would be great not to have to ask whether region client caching was on, and whether, to their knowledge, the HEAD_BUCKET had succeeded.

After checking a bit, i don't think it is possible to defer this calls. At least from what I an see, we should make them initially. I'll add a bit of logging to help for better debugging and also a guard around HEAD_BUCKET to ensure that we return the generic (region unbound) client if the call fails.

@martindurant martindurant changed the title spec: implement the region caching spec: implement bucket region caching Jun 16, 2021
@isidentical
Copy link
Copy Markdown
Member Author

One idea that might make this even faster is that, permanent caching. It might sound a bit odd at the first glance, but if you think that all bucket names are globally unique, and their regions are constant (you can't change a bucket's region without actually deleting the bucket and re-creating it with the same name which is really really unlikely so I don't even see a need for that, but perhaps it might be handled in an EAFP way, like try with the known region from the permanent cache on the system and if that fails go back and resolve it again and invalidate the current cache, which would be a bit less performant for initial call on a really really rare case), this optimization really does make sense. Especially for short running processes (e.g CLI applications) which use s3fs. If they are making only a couple of info calls, then they might shave %50 of their runtime by simply enabling this option.

@martindurant
Copy link
Copy Markdown
Member

I would be fine with persisting bucket regions and invalidating on error (this should indeed be rare). Now you have to worry about writing and reading a file!

By the way, I am not advertising 2x speed (or 3x) with this improvement, but reduced latency; for cases where the data volumes are large, users may not notice the difference.

@isidentical
Copy link
Copy Markdown
Member Author

Now you have to worry about writing and reading a file!

Not really though, this is something totally optional. If the region cache is open, and if we fail to read/write to a file then it is perfectly fine. We can still have the cache in-memory by constructing on demand.

By the way, I am not advertising 2x speed (or 3x) with this improvement, but reduced latency; for cases where the data volumes are large, users may not notice the difference.

Right. As I stressed in the original issue, this is something that mainly happens on HeadObject (not even on GetObject), so if you make 2 3 HeadObject calls and upload idk 5 GiB data, then of course it is unlikely that you will notice the difference.

Copy link
Copy Markdown
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, a call to list_objects_v2 also gives the bucket region with the key 'x-amz-bucket-region', so maybe we can avoid calls to HEAD_BUCKET for cases when ls (or related) is the first call on a bucket?

Comment thread s3fs/core.py Outdated
Comment thread s3fs/utils.py Outdated
Copy link
Copy Markdown
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I am persuaded. I have a couple of small comments, but let's get this merged before release.

Comment thread s3fs/utils.py Outdated
response = await general_client.head_bucket(Bucket=bucket_name)
except ClientError:
logger.debug(
"RC: HEAD_BUCKET clal for %r has failed, returning the general client",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Comment thread s3fs/core.py
return self.url(path, expires=expiration, **kwargs)

async def _invalidate_region_cache(self):
if not self.cache_regions:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a docstring for this

Comment thread s3fs/core.py
@martindurant martindurant merged commit 9416184 into fsspec:main Jun 18, 2021
@martindurant
Copy link
Copy Markdown
Member

OK, it's in!
This seems like the kind of thing you might well want to write an article about, since you potentially just increased the efficiency of some workloads by a factor of 3!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Speed Up exists()/info() calls up to 3X when no region is specified

2 participants