Skip to content

Avoiding slow bucket acl requests#10140

Closed
ErykKul wants to merge 1 commit intoIQSS:developfrom
ErykKul:9506_slow_bucket_acl_requests
Closed

Avoiding slow bucket acl requests#10140
ErykKul wants to merge 1 commit intoIQSS:developfrom
ErykKul:9506_slow_bucket_acl_requests

Conversation

@ErykKul
Copy link
Collaborator

@ErykKul ErykKul commented Nov 24, 2023

What this PR does / why we need it:
It is only a suggestion, up for discussion. As it turns out, each time an S3AccessIO object is constructed, a call is made to verify if the corresponding bucket exists (s3.doesBucketExistV2(bucketName)). This call uses internally getBucketAcl(String), that, dependently on the s3 implementation, can be slow. For example, on our s3 the call takes longer time if the bucket contains more files. Since we keep adding files to our Dataverse, the response time grew to more than 500 miliseconds. This call is made for each dataset in the search results shown in a dataverse/collection page in the context of retrieving of the thumbnails. This happens for every dataset, regardless if the thumbnail is present or not, if not present, then thumbnail is not retrieved. Since there are 10 datasets by default in the dataverse page, 10 calls are made for a total time of 5 seconds, regardless if there are any thumbnails in these datasets or not. As a result, the dataverse page was very slow. Also, many other operations were slow, since this call is made very often. Eliminating this call made our dataverse installation much faster. We are also investigating if a caching can be enabled for these calls on the s3 side such that we won't have to rely on this pull request.

Which issue(s) this PR closes:

It is related to #9506 and to the pull request #9669

Special notes for your reviewer:
There might be a better way of addressing it, we are investigating it. Nevertheless, the check if the bucket exists looks not very usefull, since if it does not exist, retrieving files and other operations would fail anyway. Since the call takes long time in some cases, dropping it might be a good idea.

Suggestions on how to test this:
This issue is only relevant when the bucket ACL call takes long time, it might be not easy to test on a very fast s3 implementation. Alternatively, you can add sleep in the code to see the impact.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
It makes our dataverse faster ;-)

Is there a release notes update needed for this change?:
No.

Additional documentation:

@coveralls
Copy link

Coverage Status

coverage: 20.003%. remained the same
when pulling 797ebc7 on ErykKul:9506_slow_bucket_acl_requests
into 454e0bb on IQSS:develop.

@qqmyers
Copy link
Member

qqmyers commented Nov 24, 2023

FWIW - I think I have ~the same fix in #10004 which should go into 6.1.

@ErykKul
Copy link
Collaborator Author

ErykKul commented Nov 27, 2023

Great! I am closing this PR.

@ErykKul ErykKul closed this Nov 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants