Conversation
|
This just has the normal review and QA left. |
qqmyers
left a comment
There was a problem hiding this comment.
Looks like it will work overall. I think there's a cut/paste error and I suggested a doc update but otherwise looks good to me. There's also some other cleanup that's not directly related to stopping where I had another question.
| sudo touch /usr/local/payara5/glassfish/domains/domain1/logs/stopharvest_bigarchive.70916 | ||
| sudo chown dataverse /usr/local/payara5/glassfish/domains/domain1/logs/stopharvest_bigarchive.70916 | ||
|
|
||
| We recommend that stop stop any running harvesting jobs using this mechanism if you need to restart the application server, otherwise the ongoing harvest will be killed, but may be left marked as if it's still in progress in the database. |
There was a problem hiding this comment.
Typo - 'stop stop' . It was also a little unclear - suggested change: 'Note: If the application server is stopped and restarted, any running harvesting jobs will be killed but may remain marked as in progress in the database. We thus recommend using the mechanism here to stop the harvesting prior to a server restart.'
There was a problem hiding this comment.
Thank you, I adopted your version almost verbatim.
| currentRun.setFailedDatasetCount(new Long(failedCount)); | ||
| currentRun.setDeletedDatasetCount(new Long(deletedCount)); | ||
| } | ||
| recordHarvestJobStatus(hcId, currentTime, harvestedCount, failedCount, deletedCount, ClientHarvestRun.RunResultType.INTERRUPTED); |
There was a problem hiding this comment.
Not sure I understand - is this a typo? the setHarvestSuccess method is using the RunResultType.INTERRUPTED?
There was a problem hiding this comment.
That was of course a typo, thank you.
(I'm working on automated tests right now that would catch this)
| + " dvobject o WHERE d.id = o.id AND o.owner_id in (" | ||
| + dvs + ")").getSingleResult(); | ||
| return (Long) em.createNativeQuery("SELECT count(d.id) FROM dataset d " | ||
| + " WHERE d.harvestingclient_id in (" |
There was a problem hiding this comment.
I can see where the join of dataset and dvobject isn't needed, but why the switch to harvestingclient_id? Was the old code just wrong or is this a typo?
There was a problem hiding this comment.
(Sorry for the delay)
The new query is simpler, and correct, all harvested datasets are uniquely tied to their specific harvesting configurations (clients).
The old query must have followed the logic from pre-Dataverse 4 times; back then harvested datasets ("studies") were similarly tied to their "harvesting dataverses". (A harvesting dataverse could only contain harvested studies, all harvested from the same place). That old query is still producing the expected results in our prod., because we still have dedicated collections for each harvested source. Strictly speaking, the dvobject-dataset joint in it was useful, although o.dtype='Dataset' would achieve the same result. But I just remembered that this was no longer required. In Dataverse 4+ it is possible to harvest into a collection that has other content - local datasets and/or datasets harvested from other places. So, shorter answer, yes, that query was wrong.
|
Test results:
|
|
Not sure if all of these problems are related to my PR. The part about the long prod. harvest not stopping with the stop file in place does sound like a real problem. So let's run that again on Monday and I'll try to diagnose what's going on. |
|
Retesting this am, found it was a user error around the stop file name syntax. Should be stopharvest. but I confusingly named my harvest client stop_harvest_demo so the correct file name given the syntax would be stopharvest.stop_harvest_demo. This worked. Testing rerun behavior, ie pause use case. It works but needed to rerun it twice, will check again. |
What this PR does / why we need it:
This provides a mechanism for a dataverse sysadmin to stop a long-running harvesting job in progress.
Which issue(s) this PR closes:
Closes #7940
Special notes for your reviewer:
Suggestions on how to test this:
will update w/ instructions
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: