Conversation
…for /api/access/dataset; but needs to be cleaned up/refactored, and we may want to take a closer look at how we actually writing the guestbookresponses more closely. #7812
…s to be redundant - ?). #7812
implements it for the local version of download-all (local, in-application zipping, w/ no external zipper configured), plus fixes the unrelated guestbook response issue.
|
There are 2 Jenkins failures; one is related to something I did in the PR - the download all is now returning "404 Not Found" if called on a non-existing dataset; when it used to return "401 Bad Request". Will fix. The other appears to be kinda random... like a timing-related 500/etc. that we get occasionally... but will confirm. |
scolapasta
left a comment
There was a problem hiding this comment.
This looks good to me. @landreev, I know you plan on removing some of the logging after QA, so when you do that can you take a look at the comment I pointed out as well?
| DataverseRequest req = createDataverseRequest(user); | ||
| final Dataset retrieved = findDatasetOrDie(datasetIdOrPersistentId); | ||
| if (!(user instanceof GuestUser)) { | ||
| // If it is a guest user, let's not even bother looking up if a draft exists. |
There was a problem hiding this comment.
this comment feels a little weird here, since it's in the if that is not guest user and does check for draft
There was a problem hiding this comment.
hmm. does it feel weird? - it basically explains why that "if" statement is there - as to not bother looking up... etc. but sure, I can change it/or take it outside the "if".
…t it was (for legacy reasons), removed the .info log messages, rewrote a comment per code review. (#7812)
|
Made a commit with the changes as discussed above. Will keep an aye on the jenkins test, will check on that other, random method (edu.harvard.iq.dataverse.api.FileMetadataIT.testJsonParserWithDirectoryLabels) that failed in the last run) |
|
That second test, testJsonParserWithDirectoryLabels, passed after I checked in the fix for the other one. But then I corrected another typo in a comment, which triggered another Jenkins run... and it failed again. So I'm going to assume that it is indeed something intermittent, and not going to sweat it. |
What this PR does / why we need it:
This eliminates a performance issue in the download-all api on datasets with large numbers of versions. It was particularly bad in a configuration with the external zipper enabled. (because it results in one combined idle delay before the redirect is issued).
Which issue(s) this PR closes:
Closes #7812
Special notes for your reviewer:
Please see the issue for more info.
Note that in the download-all-by-version API I left the commands that retrieve the dataset and the version in place (as opposed to replacing them with direct custom query methods). This is because it needs to understand a more complex logic of being able to take both the version numbers and the reserved tokens like
:draft,:latest-publishedetc... and I didn't feel like messing with it. This form of the API does not appear to be used much at all in prod. And if it does get used, this will only cost a 5-10 sec. initial delay - since these lookups only happen once, and are not repeated for every file.Suggestions on how to test this:
The best test is to replicate the scenario that was actually causing us trouble in production - a download-all via API on one of the covid-rt datasets - on a copy of the prod. db. The easiest way to measure the delay before the redirect is by testing it on the command line with curl:
the expected redirect response should look like this:
With stock 5.4.1 this should cost 100+ seconds on vm5; more in production (i.e., longer than the idle timeout on the LB). It should go down to under a couple of seconds with the fix in place.
Note that if you try to follow a redirect link like the one above on vm5, the actual download is NOT going to work. Because it's pointing to the prod. external zipper, and vm5 cannot authorize downloads there. But the problem we are fixing and testing is the initial delay, so we don't care about the actual zipping.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: