Description
Recently, as part of the 4.20 upgrade, running `reexportall` was included in the release note instructions for upgrading.
Doing this in production resulted in system instability, with both Glassfish nodes becoming unstable and the export node needing a restart after ~15k datasets had been exported.
On a second try a few days later, ~14k datasets got exported before Dataverse froze again (about 3k published datasets remained unexported).
Note that in the two attempts above Dataverse was able to export roughly similar numbers of datasets before freezing.
There is a fair amount of guesswork above, but this is not a new experience; export has been problematic in the past, and this is the latest example. This issue is to investigate bottlenecks in export performance and ideally fix them, or at least update the guidance for us and other users.
While we're at it, please add better logging for export progress. Currently there is a separate export log under `logs`. That is fine, but it does not indicate up front how many datasets will be exported, so looking at it does not tell you whether the run has completed, only the success or failure of individual dataset exports. It might be nice, as indexing does, to stamp the server log with "exportall started, exporting y datasets" and, when finished, "exportall finished, x of y datasets exported". There is no need to stamp "x of y" for every dataset in the server log, but the export log could record "exporting y datasets" up front and then "x of y" as each dataset is processed.
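The suggested "x of y" progress logging could look something like the sketch below. This is a minimal standalone illustration, not Dataverse code: the class and method names (`ExportAllProgress`, `exportAll`) and the in-memory log list are hypothetical stand-ins for the real export service and its logger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "x of y" progress logging for an export-all run.
// In Dataverse this would go to the export log / server log rather than a list.
public class ExportAllProgress {

    // Stand-in for the export log; a real implementation would use a Logger.
    static final List<String> logLines = new ArrayList<>();

    static void log(String msg) {
        logLines.add(msg);
    }

    // Stand-in for the real export-all loop; the actual per-dataset export is omitted.
    static void exportAll(List<String> datasetIds) {
        int total = datasetIds.size();
        // Record the total up front so readers of the log know what "done" looks like.
        log("exportall started: exporting " + total + " datasets");
        int done = 0;
        for (String id : datasetIds) {
            // ... export metadata for this dataset here ...
            done++;
            log("exported " + id + " (" + done + " of " + total + ")");
        }
        log("exportall finished: " + done + " of " + total + " datasets exported");
    }

    public static void main(String[] args) {
        exportAll(List.of("doi:10.5072/A", "doi:10.5072/B", "doi:10.5072/C"));
        logLines.forEach(System.out::println);
    }
}
```

With the total stamped at the start and a final "x of y" summary at the end, a glance at the tail of the log is enough to tell whether the run completed and how many datasets failed.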