@@ -40,12 +40,13 @@ public void setId(Long id) {
         this.id = id;
     }
 
-    public enum RunResultType { SUCCESS, FAILURE, INPROGRESS };
+    public enum RunResultType { SUCCESS, RUN_FAILED, RUN_IN_PROGRESS, DELETE_FAILED };
@landreev (Contributor) commented on Jul 21, 2020:
Not a big deal, but what is the value of adding this status condition for a failure to delete a client? It has some value for a dataverse admin, to know that the last attempt to harvest from a certain server resulted in a failure. Is it really useful to know that an attempt to delete a client failed? - should the startup check simply remove the "delete in progress" flag quietly instead?
After all, seeing how the client is still there makes it somewhat clear that the attempt to get rid of it didn't work out, so they should try again?

I may be missing some situation where it could actually be useful - so I'm open to hearing it.

@JingMa87 (Contributor, Author) replied on Aug 4, 2020:
@landreev I guess handling it quietly would be ok. So instead of DELETE FAILED or FAILED we just show a success label (those are the only options)?


     private static String RESULT_LABEL_SUCCESS = "SUCCESS";
-    private static String RESULT_LABEL_FAILURE = "FAILED";
-    private static String RESULT_LABEL_INPROGRESS = "IN PROGRESS";
-    private static String RESULT_DELETE_IN_PROGRESS = "DELETE IN PROGRESS";
+    private static String RESULT_LABEL_RUN_FAILED = "RUN FAILED";
+    private static String RESULT_LABEL_RUN_IN_PROGRESS = "RUN IN PROGRESS";
+    private static String RESULT_LABEL_DELETE_IN_PROGRESS = "DELETE IN PROGRESS";
+    private static String RESULT_LABEL_DELETE_FAILED = "DELETE FAILED";

     @ManyToOne
     @JoinColumn(nullable = false)
@@ -67,34 +68,34 @@ public RunResultType getResult() {

     public String getResultLabel() {
         if (harvestingClient != null && harvestingClient.isDeleteInProgress()) {
-            return RESULT_DELETE_IN_PROGRESS;
-        }
-
-        if (isSuccess()) {
+            return RESULT_LABEL_DELETE_IN_PROGRESS;
+        } else if (isDeleteFailed()) {
+            return RESULT_LABEL_DELETE_FAILED;
+        } else if (isSuccess()) {
             return RESULT_LABEL_SUCCESS;
-        } else if (isFailed()) {
-            return RESULT_LABEL_FAILURE;
-        } else if (isInProgress()) {
-            return RESULT_LABEL_INPROGRESS;
+        } else if (isRunFailed()) {
+            return RESULT_LABEL_RUN_FAILED;
+        } else if (isRunInProgress()) {
+            return RESULT_LABEL_RUN_IN_PROGRESS;
         }
         return null;
     }
 
     public String getDetailedResultLabel() {
         if (harvestingClient != null && harvestingClient.isDeleteInProgress()) {
-            return RESULT_DELETE_IN_PROGRESS;
-        }
-        if (isSuccess()) {
+            return RESULT_LABEL_DELETE_IN_PROGRESS;
+        } else if (isDeleteFailed()) {
+            return RESULT_LABEL_DELETE_FAILED;
+        } else if (isSuccess()) {
             String resultLabel = RESULT_LABEL_SUCCESS;
 
             resultLabel = resultLabel.concat("; "+harvestedDatasetCount+" harvested, ");
             resultLabel = resultLabel.concat(deletedDatasetCount+" deleted, ");
             resultLabel = resultLabel.concat(failedDatasetCount+" failed.");
             return resultLabel;
-        } else if (isFailed()) {
-            return RESULT_LABEL_FAILURE;
-        } else if (isInProgress()) {
-            return RESULT_LABEL_INPROGRESS;
+        } else if (isRunFailed()) {
+            return RESULT_LABEL_RUN_FAILED;
+        } else if (isRunInProgress()) {
+            return RESULT_LABEL_RUN_IN_PROGRESS;
         }
         return null;
     }
@@ -111,21 +112,29 @@ public void setSuccess() {
         harvestResult = RunResultType.SUCCESS;
     }
 
-    public boolean isFailed() {
-        return RunResultType.FAILURE == harvestResult;
+    public boolean isRunFailed() {
+        return RunResultType.RUN_FAILED == harvestResult;
     }
 
-    public void setFailed() {
-        harvestResult = RunResultType.FAILURE;
+    public void setRunFailed() {
+        harvestResult = RunResultType.RUN_FAILED;
     }
 
-    public boolean isInProgress() {
-        return RunResultType.INPROGRESS == harvestResult ||
+    public boolean isRunInProgress() {
+        return RunResultType.RUN_IN_PROGRESS == harvestResult ||
                 (harvestResult == null && startTime != null && finishTime == null);
     }
 
-    public void setInProgress() {
-        harvestResult = RunResultType.INPROGRESS;
+    public void setRunInProgress() {
+        harvestResult = RunResultType.RUN_IN_PROGRESS;
     }
+
+    public boolean isDeleteFailed() {
+        return RunResultType.DELETE_FAILED == harvestResult;
+    }
+
+    public void setDeleteFailed() {
+        harvestResult = RunResultType.DELETE_FAILED;
+    }
 
     // Time of this harvest attempt:
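Taken together, the renamed states and the delete-in-progress flag determine the label shown to the admin. A compact, self-contained model of that mapping (a sketch mirroring the diff above, not the actual `ClientHarvestRun` class, which carries more state):

```java
// Minimal model of the run states and display labels from the diff above.
public class RunResultSketch {

    enum RunResultType { SUCCESS, RUN_FAILED, RUN_IN_PROGRESS, DELETE_FAILED }

    // The client-level "delete in progress" flag takes precedence over
    // the per-run result, as in getResultLabel() above.
    static String label(RunResultType result, boolean deleteInProgress) {
        if (deleteInProgress) {
            return "DELETE IN PROGRESS";
        }
        switch (result) {
            case DELETE_FAILED:   return "DELETE FAILED";
            case SUCCESS:         return "SUCCESS";
            case RUN_FAILED:      return "RUN FAILED";
            case RUN_IN_PROGRESS: return "RUN IN PROGRESS";
            default:              return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(label(RunResultType.SUCCESS, false));       // SUCCESS
        System.out.println(label(RunResultType.DELETE_FAILED, false)); // DELETE FAILED
        System.out.println(label(RunResultType.SUCCESS, true));        // DELETE IN PROGRESS
    }
}
```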
@@ -5,8 +5,6 @@
 import edu.harvard.iq.dataverse.DataverseRequestServiceBean;
 import edu.harvard.iq.dataverse.DataverseServiceBean;
 import edu.harvard.iq.dataverse.EjbDataverseEngine;
-import edu.harvard.iq.dataverse.engine.command.exception.CommandException;
-import edu.harvard.iq.dataverse.engine.command.impl.DeleteHarvestingClientCommand;
 import edu.harvard.iq.dataverse.search.IndexServiceBean;
 import edu.harvard.iq.dataverse.timer.DataverseTimerServiceBean;
 import java.util.ArrayList;
@@ -18,6 +16,7 @@
 import javax.ejb.Stateless;
 import javax.ejb.TransactionAttribute;
 import javax.ejb.TransactionAttributeType;
+
 import javax.inject.Inject;
 import javax.inject.Named;
 import javax.persistence.EntityManager;
@@ -87,9 +86,9 @@ public void resetHarvestInProgress(Long hcId) {

         // And if there is an unfinished RunResult object, we'll
         // just mark it as a failure:
-        if (harvestingClient.getLastRun() != null
-                && harvestingClient.getLastRun().isInProgress()) {
-            harvestingClient.getLastRun().setFailed();
+        ClientHarvestRun lastRun = harvestingClient.getLastRun();
+        if (lastRun != null && lastRun.isRunInProgress()) {
+            lastRun.setRunFailed();
         }
 
     }
@@ -108,9 +107,22 @@ public void setHarvestInProgress(Long hcId, Date startTime) {
         ClientHarvestRun currentRun = new ClientHarvestRun();
         currentRun.setHarvestingClient(harvestingClient);
         currentRun.setStartTime(startTime);
-        currentRun.setInProgress();
+        currentRun.setRunInProgress();
         harvestingClient.getRunHistory().add(currentRun);
     }
 
+    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
+    public void resetDeleteInProgress(Long hcId) {
+        HarvestingClient harvestingClient = em.find(HarvestingClient.class, hcId);
+        if (harvestingClient == null) {
+            return;
+        }
+        em.refresh(harvestingClient);
+        if (harvestingClient.isDeleteInProgress()) {
+            harvestingClient.setDeleteInProgress(false);
+            harvestingClient.getLastRun().setDeleteFailed();
+        }
+    }
+
     @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
     public void setDeleteInProgress(Long hcId) {
@@ -175,7 +187,7 @@ public void setHarvestSuccess(Long hcId, Date currentTime, int harvestedCount, i

         ClientHarvestRun currentRun = harvestingClient.getLastRun();
 
-        if (currentRun != null && currentRun.isInProgress()) {
+        if (currentRun != null && currentRun.isRunInProgress()) {
             // TODO: what if there's no current run in progress? should we just
             // give up quietly, or should we make a noise of some kind? -- L.A. 4.4
 
@@ -197,11 +209,11 @@ public void setHarvestFailure(Long hcId, Date currentTime) {

         ClientHarvestRun currentRun = harvestingClient.getLastRun();
 
-        if (currentRun != null && currentRun.isInProgress()) {
+        if (currentRun != null && currentRun.isRunInProgress()) {
             // TODO: what if there's no current run in progress? should we just
             // give up quietly, or should we make a noise of some kind? -- L.A. 4.4
 
-            currentRun.setFailed();
+            currentRun.setRunFailed();
             currentRun.setFinishTime(currentTime);
         }
     }
@@ -0,0 +1,23 @@
+package edu.harvard.iq.dataverse.harvest.client;
+
+import javax.annotation.PostConstruct;
+import javax.ejb.EJB;
+import javax.ejb.Singleton;
+import javax.ejb.Startup;
+
+@Singleton
A Member commented:
Maybe I'm just being paranoid but we've had trouble with Singleton beans recently. See IQSS/dataverse.harvard.edu#73

@JingMa87 (Contributor, Author) replied on Jul 13, 2020:

I read the issue you linked very closely, since the last thing I want to do is introduce new errors. In that issue there's an authentication bean that's a singleton, and when the bean gets held up by a process, no other clients can use it to authenticate because there's only one instance. This is bad. However, in my use case the singleton is only used during the startup of Dataverse to reset any hanging harvesting clients. After that there's no use for it, so a singleton bean seems the right approach here. Moreover, there are many singleton beans in Dataverse with an additional @Startup tag to initialize a certain process, and those don't seem to be causing issues either. So I'm fairly confident that a singleton won't cause issues here.

@landreev (Contributor) replied on Jul 21, 2020:

I'm not worried about it being a singleton - seeing how it has nothing in it but the @PostConstruct code.
But there's another reason why this is not going to work, specifically for us here at Harvard. We have 2 production servers. So if one of the two is being restarted, it'll reset the flags even though the actions in question may still be in progress on the other node.

It's easy to address for the most part: in a multi-server situation only one node is designated to do harvesting, and other timer-run jobs. This "master node" is the one that has the JVM option -Ddataverse.timerServer=true. And you can use if (systemConfig.isTimerServer()) to check if this node is the timer node in your code, before proceeding to clear the flags.

HOWEVER, there is an exception to the rule - when a harvesting run is started not by the timer, but manually, through the harvesting client page and/or API, the job can end up running on either of the multiple nodes.

Admittedly, this will not affect many other dataverse installations. (I'm actually not aware of another production installation at this point that is using a multiple server setup). But this would affect us, so I am a bit hesitant about doing this.
And I'm not really positive as to what the best reliable solution is. That ClientHarvestRun object would need an extra entry identifying the node actually running the harvest, maybe?
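The guard landreev describes could look roughly like this. `SystemConfig.isTimerServer()` is the real Dataverse check he references; the stub around it is purely illustrative, a sketch rather than the actual code:

```java
// Sketch of the timer-server guard described above: only the designated
// "master" node (the one started with -Ddataverse.timerServer=true) clears
// the in-progress flags, so a restart of a secondary node in a multi-server
// setup cannot clobber a harvest still running elsewhere.
public class TimerServerGuardSketch {

    // Hypothetical stand-in for the real SystemConfig bean.
    static class SystemConfig {
        private final boolean timerServer;
        SystemConfig(boolean timerServer) { this.timerServer = timerServer; }
        boolean isTimerServer() { return timerServer; }
    }

    static boolean shouldResetFlagsOnStartup(SystemConfig config) {
        return config.isTimerServer();
    }

    public static void main(String[] args) {
        System.out.println(shouldResetFlagsOnStartup(new SystemConfig(true)));  // true
        System.out.println(shouldResetFlagsOnStartup(new SystemConfig(false))); // false
    }
}
```

As the thread notes, this only covers timer-initiated runs; a manually triggered harvest can still land on any node.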

@landreev (Contributor) added on Jul 21, 2020:

Another solution would be to change how the manually-run harvests are handled. I.e., instead of actually running the harvest it could create a one-time timer entry for 5 sec. from now. So it will then be picked up by the timer service on the master node - and we are guaranteed to have all the harvesting runs staying on one server.

A Contributor added:

I'm not 100% sure if being able to have these harvest-related flags cleared automatically is worth adding any complicated code to address the multiple server scenario...
So yet another, much simpler, solution would be to implement it as above - but with an option to disable this mechanism; so on startup the service above would check the setting before proceeding - if (enableClearingHarvestJobsOnStartup) { ... }. It will be enabled by default, so it would work for most installations. But then the guide will say "make sure to disable this mechanism if you are running multiple servers!" - somewhere in the same part of the guide where we mention the dataverse.timerServer setting.
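The opt-out could be sketched like this. The setting name `ClearingHarvestJobsOnStartupEnabled` is the one proposed later in this thread; the map stands in for the real `SettingsServiceBean` lookup, so treat this as an illustration of the default-on behavior, not the actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: clear harvest flags on startup only if the setting allows it.
// Enabled unless explicitly set to "false" (a multi-server installation
// would opt out, per the discussion above).
public class StartupClearingSketch {

    // Hypothetical stand-in for the settings service's key-value store.
    static final Map<String, String> settings = new HashMap<>();

    static boolean isClearingEnabled() {
        return !"false".equalsIgnoreCase(settings.get("ClearingHarvestJobsOnStartupEnabled"));
    }

    public static void main(String[] args) {
        System.out.println(isClearingEnabled()); // true: enabled by default
        settings.put("ClearingHarvestJobsOnStartupEnabled", "false");
        System.out.println(isClearingEnabled()); // false: explicitly disabled
    }
}
```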

@JingMa87 (Contributor, Author) replied on Aug 4, 2020:
@landreev I think it's best to take the simplest approach and only build in support for single-server scenarios. My plan is to add a sane default of "true" for a configurable DB key "ClearingHarvestJobsOnStartupEnabled" using the SettingsServiceBean, do you have any objections?

+@Startup
+public class ResetHarvestInProgressBean {
+
+    @EJB
+    HarvestingClientServiceBean harvestingClientService;
+
+    @PostConstruct
+    public void init() {
+        for (HarvestingClient client : harvestingClientService.getAllHarvestingClients()) {
+            harvestingClientService.resetHarvestInProgress(client.getId());
+            harvestingClientService.resetDeleteInProgress(client.getId());
+        }
+    }
+
+}