Conversation

@vtutrinov
Contributor

What changes were proposed in this pull request?

Cache Ozone key details for a short period to avoid spamming OM with identical consecutive requests.

When a large file is read through S3G, repeated requests for the same Ozone key details are sent to OM, even though each response is identical. The result of the first request can therefore be cached, and subsequent requests for the same key served from the cache.
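The idea above can be sketched as a small per-key TTL cache in front of the OM lookup. This is a minimal illustration, not the patch's actual code; the class and method names (`KeyDetailsCache`, `get`, `invalidate`) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Minimal sketch of a per-key TTL cache; names are illustrative, not Ozone's API. */
public class KeyDetailsCache<K, V> {
  private static final class Entry<V> {
    final V value;
    final long expiresAtMillis;
    Entry(V value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
  private final long ttlMillis;

  public KeyDetailsCache(long ttlMillis) {
    this.ttlMillis = ttlMillis;
  }

  /** Returns the cached value if still fresh, otherwise loads (e.g. from OM) and caches it. */
  public V get(K key, Supplier<V> loader) {
    long now = System.currentTimeMillis();
    Entry<V> e = cache.get(key);
    if (e != null && now < e.expiresAtMillis) {
      return e.value;          // cache hit: no OM round trip
    }
    V value = loader.get();    // cache miss or expired entry: ask OM
    cache.put(key, new Entry<>(value, now + ttlMillis));
    return value;
  }

  /** Invalidate on key update/delete so writes through this S3G see fresh details. */
  public void invalidate(K key) {
    cache.remove(key);
  }
}
```

With a short TTL, consecutive ranged reads of the same object hit the cache instead of OM, while explicit invalidation covers updates routed through the same gateway.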

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-11636

How was this patch tested?

A new unit test checks that key details are retrieved from the cache when consecutive requests arrive within a short period. Existing robot tests verify that the cache is invalidated when the key, key metadata, or key tags are updated.

@ivandika3
Contributor

ivandika3 commented Nov 11, 2024

@vtutrinov Thanks for the patch.

While I understand the reasoning behind this change is to reduce the number of OM calls, I'm not sure about this caching solution. A single S3G will work correctly, but the consistency guarantee breaks quickly once there are concurrent requests to multiple S3Gs (e.g. a reverse proxy pointing to multiple S3Gs). That's why S3G should be as stateless as possible.

For example, suppose there are two S3Gs (s3g1 and s3g2) for a single "key1". If a request downloads "key1" through s3g1, it will be cached in s3g1. Afterwards, another request uploads "key1" with a new body through s3g2. The cache in s3g1 will not be invalidated, and a download request to s3g1 will still return the stale key content. This means there are two versions of the same object until the cache entry expires. Even worse, the data of the overwritten key might already be deleted while its details are still in the S3G cache (if the cache TTL is very large and the underlying blocks have been deleted).
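The stale-read scenario above can be reduced to a toy model: two gateways with independent local caches in front of one authoritative store. This is a deliberately simplified demo (plain `HashMap`s standing in for OM and the two S3G caches), not Ozone code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy demo of the cross-gateway stale-read problem; not Ozone code. */
public class StaleCacheDemo {
  static final Map<String, String> om = new HashMap<>();        // authoritative store (OM)
  static final Map<String, String> s3g1Cache = new HashMap<>(); // s3g1's local cache
  static final Map<String, String> s3g2Cache = new HashMap<>(); // s3g2's local cache

  /** Read through a gateway: serve from its cache, fall through to OM on a miss. */
  static String readVia(Map<String, String> cache, String key) {
    return cache.computeIfAbsent(key, om::get);
  }

  /** Write through a gateway: update OM, but only that gateway's cache is invalidated. */
  static void writeVia(Map<String, String> cache, String key, String value) {
    om.put(key, value);
    cache.remove(key);
  }

  public static void main(String[] args) {
    om.put("key1", "v1");
    readVia(s3g1Cache, "key1");        // s3g1 caches "v1"
    writeVia(s3g2Cache, "key1", "v2"); // overwrite through s3g2
    // s3g1 keeps serving the stale version until its entry expires:
    System.out.println(readVia(s3g1Cache, "key1")); // prints v1 (stale)
    System.out.println(readVia(s3g2Cache, "key1")); // prints v2
  }
}
```

The write through s3g2 never reaches s3g1's cache, so the two gateways disagree about the object's content, which is exactly why a cross-gateway coherence mechanism would be needed.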

IMO, a correct implementation would require some kind of cache coherence mechanism (a witness) similar to the one described at https://www.allthingsdistributed.com/2021/04/s3-strong-consistency.html

I think we can first revisit #5565 to implement ranged GET requests for a specified part on the OM side (instead of only in S3G), so that OM returns only the List<OmKeyInfoLocation> belonging to the particular part, which reduces the size of the message returned by OM. However, we first need to verify whether the AWS S3 SDK downloads large MPU keys using ranged GETs per MPU part.

@vtutrinov
Contributor Author

@ivandika3 Thanks for the comment; that sounds reasonable. I will rethink the solution in more depth.

@adoroszlai adoroszlai marked this pull request as draft November 11, 2024 15:21
@ivandika3
Contributor

ivandika3 commented Mar 30, 2025

Let's close this for now, since the alternative implementation of HDDS-11699 (#7558) has been merged.

Introducing a caching layer to Ozone would be a large endeavor and should require a comprehensive design doc. We can explore these two systems that implement a caching layer on top of object storage.

@ivandika3 ivandika3 closed this Mar 30, 2025