Description
We have a few issues reported about Exportall (#6826, #6594, #4682, #6302) that I'm going to close and consolidate here. There are some stack traces in those issues (especially #4682) that may be helpful as we start to investigate this. From #6826 @landreev suggests:
For the purposes of defining something we can prioritize and work on, I can think of 2 possible child issues that this can be split into:
- Investigate why export is such a resource hog. The main offender appears to be full DDI export of datasets with large numbers of tabular files (with large numbers of variables). Worth checking whether this is how it's always been vs. whether it's gotten worse. We've recently made some extra variable-level metadata editable, which at some point noticeably affected the performance of variable indexing. (Note: it didn't make it better/faster.) Is this something similar?
- Regardless of the performance, the current reexport system is a bit unwieldy, making it hard to, for example, reexport in smaller batches. (will elaborate/add more info)
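The "smaller batches" idea above could be sketched as simple partitioning of the full dataset list, so each batch can be exported (and retried) independently; `BatchedReexport` and `partitionIntoBatches` are hypothetical names for illustration, not existing Dataverse classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split the full list of dataset identifiers into
// fixed-size batches so a reexport run can proceed batch by batch instead
// of processing every dataset in one monolithic pass.
public class BatchedReexport {

    public static List<List<String>> partitionIntoBatches(List<String> datasetIds, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < datasetIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, datasetIds.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(datasetIds.subList(start, end)));
        }
        return batches;
    }
}
```

A driver could then export one batch at a time, logging or checkpointing between batches, so a failure partway through doesn't force a full restart.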
In the same issue, @kcondon asks for some additional logging:
While we're at it, please add better logging for export progress. Currently, there is a separate export log in logs. That is fine, but it does not indicate up front how many datasets will be exported, so looking at it does not tell you whether the run has completed, only the success or failure of individual datasets. It might be nice, like index does, to stamp the server log with "exportall started, exporting y datasets" and, when finished, "exportall finished, x of y datasets exported". No need to stamp x of y in the server log, but maybe add a number to the export log: "exporting y datasets", then for each dataset stamp "x of y" there.
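The logging improvement described above could look roughly like this; `ExportAllJob` and `exportDataset` are illustrative names for this sketch, not actual Dataverse APIs.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch of the "x of y" progress logging suggested above:
// log the total count up front, per-dataset progress as "x of y", and a
// final summary so the log shows whether the run completed.
public class ExportAllJob {

    private static final Logger logger = Logger.getLogger(ExportAllJob.class.getName());

    public int runExportAll(List<String> datasetIds) {
        int total = datasetIds.size();
        int succeeded = 0;
        // Announce the total up front so a reader of the log knows what "done" means.
        logger.info("exportall started, exporting " + total + " datasets");
        for (int i = 0; i < total; i++) {
            String id = datasetIds.get(i);
            try {
                exportDataset(id);
                succeeded++;
                logger.info("exported dataset " + id + " (" + (i + 1) + " of " + total + ")");
            } catch (Exception e) {
                logger.warning("failed to export dataset " + id
                        + " (" + (i + 1) + " of " + total + "): " + e.getMessage());
            }
        }
        logger.info("exportall finished, " + succeeded + " of " + total + " datasets exported");
        return succeeded;
    }

    // Stand-in for the real per-dataset export call; fails on an empty id
    // just to exercise the failure path in this sketch.
    private void exportDataset(String id) throws Exception {
        if (id.isEmpty()) {
            throw new Exception("empty dataset id");
        }
    }
}
```

The key point is that both the total and the running "x of y" counter live in the same log, so completeness can be judged from the log alone.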
I think any of these three would be good issues to create and pull into a sprint, or this could be moved through as a larger issue by itself. I see more exporting in our future releases and I think it would be good to spend time on this optimization right now. Moving to Needs Discussion for @scolapasta and others to determine the best approach.