Closed
Conversation
Member
|
FWIW - I think I have ~the same fix in #10004 which should go into 6.1. |
Collaborator
Author
|
Great! I am closing this PR. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it:
It is only a suggestion, up for discussion. As it turns out, each time an S3AccessIO object is constructed, a call is made to verify if the corresponding bucket exists (
s3.doesBucketExistV2(bucketName)). This call uses internallygetBucketAcl(String), that, dependently on the s3 implementation, can be slow. For example, on our s3 the call takes longer time if the bucket contains more files. Since we keep adding files to our Dataverse, the response time grew to more than 500 miliseconds. This call is made for each dataset in the search results shown in a dataverse/collection page in the context of retrieving of the thumbnails. This happens for every dataset, regardless if the thumbnail is present or not, if not present, then thumbnail is not retrieved. Since there are 10 datasets by default in the dataverse page, 10 calls are made for a total time of 5 seconds, regardless if there are any thumbnails in these datasets or not. As a result, the dataverse page was very slow. Also, many other operations were slow, since this call is made very often. Eliminating this call made our dataverse installation much faster. We are also investigating if a caching can be enabled for these calls on the s3 side such that we won't have to rely on this pull request.Which issue(s) this PR closes:
It is related to #9506 and to the pull request #9669
Special notes for your reviewer:
There might be a better way of addressing it, we are investigating it. Nevertheless, the check if the bucket exists looks not very usefull, since if it does not exist, retrieving files and other operations would fail anyway. Since the call takes long time in some cases, dropping it might be a good idea.
Suggestions on how to test this:
This issue is only relevant when the bucket ACL call takes long time, it might be not easy to test on a very fast s3 implementation. Alternatively, you can add
sleepin the code to see the impact.Does this PR introduce a user interface change? If mockups are available, please link/include them here:
It makes our dataverse faster ;-)
Is there a release notes update needed for this change?:
No.
Additional documentation: