
Export: Export/reexportall appears to cause system instability on large repo #6826

@kcondon

Description


Recently, as part of the 4.20 upgrade, running reexportall was included in the release note instructions for upgrading.
Doing this in production resulted in system instability: both Glassfish nodes became unstable, and the export node needed restarting after ~15k datasets had been exported.
On a second try, a few days later, ~14k datasets were exported before Dataverse froze again (about 3k published datasets remained unexported).

Note that in both attempts, Dataverse exported a similar number of datasets (roughly 14k to 15k) before freezing.

There is a fair amount of guesswork above, but this is not a new experience: export has been problematic in the past, and this is the latest example. This issue is to investigate performance bottlenecks and ideally fix them, or at least update the guidance for us and other users.

While we're at it, please add better logging for export progress. Currently there is a separate export log in the logs directory. That is fine, but it does not say up front how many datasets will be exported, so looking at it does not tell you whether the run has completed; it only shows whether each individual dataset export succeeded or failed. It might be nice, as index does, to stamp the server log with "exportall started, exporting y datasets" and, when finished, "exportall finished, x of y datasets exported". There is no need to stamp x of y for each dataset in the server log, but the export log could record "exporting y datasets" at the start and then "x of y" for each dataset.
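A minimal Java sketch of the requested progress logging might look like the following. The class name `ExportProgressSketch`, the method `exportAll`, the `server`/`export` logger names, and the `exportOne` callback are all hypothetical placeholders, not Dataverse's actual exporter API. The sketch only shows where the "y datasets", "x of y" and summary lines would be written.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Logger;

// Hypothetical sketch: these names are illustrative, not Dataverse's real exporter code.
public class ExportProgressSketch {

    // Assumed logger names standing in for the server log and the separate export log.
    private static final Logger serverLog = Logger.getLogger("server");
    private static final Logger exportLog = Logger.getLogger("export");

    /** Exports every dataset id via exportOne and returns how many succeeded. */
    public static int exportAll(List<Long> datasetIds, Predicate<Long> exportOne) {
        int total = datasetIds.size();
        // Announce the total up front so a reader can tell whether the run finished.
        serverLog.info("exportall started, exporting " + total + " datasets");
        exportLog.info("exporting " + total + " datasets");

        int succeeded = 0;
        int position = 0;
        for (Long id : datasetIds) {
            position++;
            boolean ok;
            try {
                ok = exportOne.test(id);
            } catch (RuntimeException e) {
                ok = false; // a failure on one dataset should not stop the whole run
            }
            if (ok) {
                succeeded++;
            }
            // Per-dataset "x of y" progress goes only to the export log.
            exportLog.info(position + " of " + total + ": dataset " + id
                    + (ok ? " exported" : " FAILED"));
        }
        // Summary line in the server log, mirroring what index does.
        serverLog.info("exportall finished, " + succeeded + " of " + total + " datasets exported");
        return succeeded;
    }

    public static void main(String[] args) {
        // Fake exporter for demonstration: odd ids succeed, even ids fail.
        int exported = exportAll(List.of(1L, 2L, 3L), id -> id % 2 == 1);
        System.out.println(exported + " datasets exported");
    }
}
```

Keeping the per-dataset lines out of the server log follows the request above: the server log only gets a start and a finish line, while the export log carries the detailed "x of y" trail.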

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
