Avoiding slow bucket acl requests by ErykKul · Pull Request #10140 · IQSS/dataverse

ErykKul · 2023-11-24T15:17:26Z

What this PR does / why we need it:
This is only a suggestion, up for discussion. As it turns out, each time an S3AccessIO object is constructed, a call is made to verify that the corresponding bucket exists (s3.doesBucketExistV2(bucketName)). Internally, this call uses getBucketAcl(String), which, depending on the S3 implementation, can be slow. For example, on our S3 the call takes longer the more files the bucket contains. Since we keep adding files to our Dataverse, the response time grew to more than 500 milliseconds. The call is made for each dataset in the search results shown on a dataverse/collection page, as part of retrieving the thumbnails. It happens for every dataset, whether or not a thumbnail is present; when none is present, no thumbnail is retrieved, but the check has already been made. Since a dataverse page shows 10 datasets by default, 10 calls are made, for a total of about 5 seconds, whether or not any of these datasets have thumbnails. As a result, the dataverse page was very slow. Many other operations were slow as well, since this call is made very often. Eliminating the call made our Dataverse installation much faster. We are also investigating whether caching can be enabled for these calls on the S3 side, so that we would not have to rely on this pull request.
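To make the arithmetic above concrete, here is a minimal sketch in plain Java. It is not the Dataverse code: `BucketClient`, `SlowFakeClient`, `StorageAccessor` and `BucketCheckCost` are hypothetical names invented for illustration, and the fake client only adds up simulated latency instead of talking to S3. It models an accessor whose constructor optionally performs the bucket-existence check, and compares the cost of building ten accessors with and without it.

```java
/** Hypothetical stand-in for the S3 client; this is NOT the AWS SDK interface. */
interface BucketClient {
    boolean doesBucketExist(String bucketName);
}

/** Fake client that accumulates simulated latency instead of sleeping. */
class SlowFakeClient implements BucketClient {
    final long latencyMillis;
    int calls = 0;
    long simulatedMillis = 0;

    SlowFakeClient(long latencyMillis) {
        this.latencyMillis = latencyMillis;
    }

    @Override
    public boolean doesBucketExist(String bucketName) {
        calls++;
        simulatedMillis += latencyMillis;
        return true; // the bucket always "exists" in this sketch
    }
}

/** Simplified model of a storage accessor; not the real S3AccessIO. */
class StorageAccessor {
    StorageAccessor(BucketClient client, String bucket, boolean checkBucket) {
        // The eager check is the part the pull request proposes to drop.
        if (checkBucket && !client.doesBucketExist(bucket)) {
            throw new IllegalStateException("Bucket not found: " + bucket);
        }
    }
}

class BucketCheckCost {
    /** Builds one accessor per dataset and returns the simulated time spent on checks. */
    static long costOfPage(int datasets, long latencyMillis, boolean checkBucket) {
        SlowFakeClient client = new SlowFakeClient(latencyMillis);
        for (int i = 0; i < datasets; i++) {
            new StorageAccessor(client, "demo-bucket", checkBucket);
        }
        return client.simulatedMillis;
    }

    public static void main(String[] args) {
        System.out.println("with check:    " + costOfPage(10, 500, true) + " ms");
        System.out.println("without check: " + costOfPage(10, 500, false) + " ms");
    }
}
```

With the assumed 500 ms per call and 10 datasets per page, the model reproduces the 5 seconds mentioned above, and zero when the check is skipped.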

Which issue(s) this PR closes:

It is related to #9506 and to pull request #9669.

Special notes for your reviewer:
There might be a better way of addressing this; we are investigating it. Nevertheless, the check for whether the bucket exists does not look very useful: if the bucket does not exist, retrieving files and other operations would fail anyway. Since the call takes a long time in some cases, dropping it might be a good idea.
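The reasoning here, that a missing bucket still surfaces at the first real operation, can be sketched as follows. This is an illustration only, under the assumption of a hypothetical in-memory `FakeObjectStore` and a hypothetical `LazyAccessor`; the real S3 client reports a missing bucket with its own exception type, not the `NoSuchElementException` used here.

```java
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical in-memory store standing in for S3: bucket name -> (key -> content). */
class FakeObjectStore {
    private final Map<String, Map<String, String>> buckets;

    FakeObjectStore(Map<String, Map<String, String>> buckets) {
        this.buckets = buckets;
    }

    String getObject(String bucket, String key) {
        Map<String, String> objects = buckets.get(bucket);
        if (objects == null) {
            throw new NoSuchElementException("No such bucket: " + bucket);
        }
        String content = objects.get(key);
        if (content == null) {
            throw new NoSuchElementException("No such key: " + key);
        }
        return content;
    }
}

/** Accessor without an upfront existence check; errors appear on first use. */
class LazyAccessor {
    private final FakeObjectStore store;
    private final String bucket;

    LazyAccessor(FakeObjectStore store, String bucket) {
        // Intentionally no call to the store here.
        this.store = store;
        this.bucket = bucket;
    }

    String read(String key) {
        return store.getObject(bucket, key);
    }
}
```

Constructing a `LazyAccessor` for a non-existent bucket succeeds, and the failure is raised by `read`. The trade-off is that the error is reported later, at the point of use, rather than at construction time.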

Suggestions on how to test this:
This issue is only relevant when the bucket ACL call takes a long time, so it might not be easy to test on a very fast S3 implementation. Alternatively, you can add a sleep in the code to see the impact.
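One way to try the "add a sleep" approach without touching a real S3 backend is a small timing harness like the one below. It is a sketch with hypothetical names (`DelayInjection`, `withDelay`, `timeMillis`), not part of Dataverse: it wraps any call in an artificial delay and measures how long repeated invocations take.

```java
import java.util.function.Supplier;

class DelayInjection {
    /** Wraps a call so that it first sleeps for delayMillis, imitating a slow backend. */
    static <T> Supplier<T> withDelay(Supplier<T> call, long delayMillis) {
        return () -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while simulating latency", e);
            }
            return call.get();
        };
    }

    /** Runs the call the given number of times and returns elapsed wall-clock milliseconds. */
    static long timeMillis(Supplier<?> call, int times) {
        long start = System.nanoTime();
        for (int i = 0; i < times; i++) {
            call.get();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // The lambda stands in for the bucket-existence check; 500 ms is the figure from the description.
        Supplier<Boolean> slowCheck = withDelay(() -> Boolean.TRUE, 500);
        System.out.println("10 checks took " + timeMillis(slowCheck, 10) + " ms");
    }
}
```

In an actual test of this change, the equivalent sleep would go directly before the bucket-existence call in the code under test, and the page load time would be compared before and after the change.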

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
It makes our dataverse faster ;-)

Is there a release notes update needed for this change?:
No.

Additional documentation:

coveralls · 2023-11-24T15:24:04Z

coverage: 20.003%. remained the same
when pulling 797ebc7 on ErykKul:9506_slow_bucket_acl_requests
into 454e0bb on IQSS:develop.

qqmyers · 2023-11-24T18:29:10Z

FWIW - I think I have ~the same fix in #10004 which should go into 6.1.

ErykKul · 2023-11-27T08:48:47Z

Great! I am closing this PR.

avoiding slow bucket acl requests

797ebc7

ErykKul closed this Nov 27, 2023

ErykKul wants to merge 1 commit into IQSS:develop from ErykKul:9506_slow_bucket_acl_requests