Merged
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/admin/harvestclients.rst
@@ -21,6 +21,8 @@ Clients are managed on the "Harvesting Clients" page accessible via the :doc:`da

The process of creating a new client, or editing an existing one, is largely self-explanatory. It is split into logical steps, in a way that allows the user to go back and correct entries made earlier. The process is interactive, with guidance text provided. For example, the user is required to enter the URL of the remote OAI server. When they click *Next*, the application will try to establish a connection to the server in order to verify that it is working, and to obtain information about the sets of metadata records and the metadata formats it supports. The choices offered to the user on the next page are based on this extra information. If the application fails to establish a connection to the remote archive at the address specified, or if an invalid response is received, the user is given an opportunity to check and correct the URL they entered.

Note that as of 5.13, a new entry, "Custom HTTP Header", has been added to Step 1 of the Create or Edit form. This optional field can be used to configure the client with a specific HTTP header to be added to every OAI request. This accommodates a (rare) use case where the remote server requires a special token of some kind in order to serve content not available to other clients. Most OAI servers offer the same publicly-available content to all clients, so few admins will have a use for this feature. The field appears on the very first screen, Step 1, in case the OAI server requires this token even for the "ListSets" and "ListMetadataFormats" requests, which are sent during Step 2 of creating or editing a client. Multiple headers can be supplied separated by `\\n` - actual "backslash" and "n" characters, not a single "new line" character.
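
The "multiple headers separated by a literal backslash-n" convention above can be sketched as a small parsing routine. This is only an illustration of the convention; the actual parsing is done by `OaiHandler.makeCustomHeaders()` (seen later in this diff), whose behavior may differ. The class and method names here are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CustomHeaderDemo {
    // Split a Custom HTTP Header value into {name: value} pairs.
    // Entries are separated by the two-character sequence '\' + 'n',
    // not by an actual newline.
    static Map<String, String> splitHeaders(String raw) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (raw == null || raw.isEmpty()) {
            return headers;
        }
        // In Java source, the regex "\\\\n" matches a literal backslash
        // followed by the character 'n'.
        for (String part : raw.split("\\\\n")) {
            int colon = part.indexOf(':');
            if (colon > 0) {
                headers.put(part.substring(0, colon).trim(),
                            part.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        // Prints: {x-oai-api-key=xxxyyyzzz, x-tenant=lmops}
        System.out.println(splitHeaders("x-oai-api-key: xxxyyyzzz\\nx-tenant: lmops"));
    }
}
```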

How to Stop a Harvesting Run in Progress
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4 changes: 3 additions & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -3410,7 +3410,8 @@ The following optional fields are supported:
- archiveDescription: A description of the remote archive. If not supplied, will default to "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data."
- set: The OAI set on the remote server. If not supplied, will default to none, i.e., "harvest everything".
- style: Defaults to "default" - a generic OAI archive. (Make sure to use "dataverse" when configuring harvesting from another Dataverse installation).

- customHeaders: Can be used to configure the client with a specific HTTP header that will be added to every OAI request. This accommodates a use case where the remote server requires this header to supply some form of a token in order to serve content not available to other clients. See the example below. Multiple headers can be supplied separated by `\\n` - actual "backslash" and "n" characters, not a single "new line" character.

Generally, the API will accept the output of the GET version of the API for an existing client as valid input, but some fields will be ignored. For example, as of this writing there is no way to configure a harvesting schedule via this API.

An example JSON file would look like this::
@@ -3422,6 +3423,7 @@
"archiveUrl": "https://zenodo.org",
"archiveDescription": "Moissonné depuis la collection LMOPS de l'entrepôt Zenodo. En cliquant sur ce jeu de données, vous serez redirigé vers Zenodo.",
"metadataFormat": "oai_dc",
"customHeaders": "x-oai-api-key: xxxyyyzzz",
"set": "user-lmops"
}
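
As a sketch of how such a configuration might be POSTed to the native API: the path `/api/harvest/clients/{nickName}` is taken from this PR's `createHarvestingClient` endpoint, while the `X-Dataverse-key` header, the server URL, and the nickname below are assumptions for illustration. Building the request as an object (without sending it) keeps the example self-contained:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class CreateClientRequestDemo {
    // Build (but do not send) a request that would create a harvesting
    // client. Endpoint path per this PR's API class; auth header name,
    // serverUrl, and nickName are assumptions.
    static HttpRequest buildCreateRequest(String serverUrl, String apiToken,
                                          String nickName, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(serverUrl + "/api/harvest/clients/" + nickName))
                .header("X-Dataverse-key", apiToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildCreateRequest(
                "https://demo.example.edu", "xxxxxxxx", "zenodo-lmops",
                "{\"customHeaders\": \"x-oai-api-key: xxxyyyzzz\"}");
        // Prints: POST https://demo.example.edu/api/harvest/clients/zenodo-lmops
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending it would be a matter of passing the request to `java.net.http.HttpClient.send()` against a live installation.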

7 changes: 4 additions & 3 deletions modules/dataverse-parent/pom.xml
@@ -164,7 +164,8 @@
<apache.httpcomponents.core.version>4.4.14</apache.httpcomponents.core.version>

<!-- NEW gdcc XOAI library implementation -->
<gdcc.xoai.version>5.0.0-RC2</gdcc.xoai.version>
<!-- gdcc.xoai.version>5.0.0-RC2</gdcc.xoai.version -->
<gdcc.xoai.version>5.0.0-SNAPSHOT</gdcc.xoai.version>

<!-- Testing dependencies -->
<testcontainers.version>1.15.0</testcontainers.version>
@@ -324,7 +325,7 @@
<name>Local repository for hosting jars not available from network repositories.</name>
<url>file://${project.basedir}/local_lib</url>
</repository>
<!-- Uncomment when using snapshot releases from Maven Central
<!-- Uncomment when using snapshot releases from Maven Central -->
Contributor:

can you confirm that we want this uncommented?

Contributor Author:

Yes, this needs to be un-commented in order for maven to download and build with "snapshot releases", i.e. dev builds of dependency jars. So I've enabled it temporarily in order to build with the dev build of the XOAI libraries. As soon as this dev version is tagged as the next point release, I will reverse it.

<repository>
<id>oss-sonatype</id>
<name>oss-sonatype</name>
@@ -335,7 +336,7 @@
<enabled>true</enabled>
</snapshots>
</repository>
-->
<!-- -->
</repositories>

<profiles>
51 changes: 46 additions & 5 deletions src/main/java/edu/harvard/iq/dataverse/HarvestingClientsPage.java
@@ -9,7 +9,6 @@
import edu.harvard.iq.dataverse.engine.command.DataverseRequest;
import edu.harvard.iq.dataverse.engine.command.exception.CommandException;
import edu.harvard.iq.dataverse.engine.command.impl.CreateHarvestingClientCommand;
import edu.harvard.iq.dataverse.engine.command.impl.DeleteHarvestingClientCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UpdateHarvestingClientCommand;
import edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean;
import edu.harvard.iq.dataverse.harvest.client.HarvestingClient;
@@ -24,7 +23,6 @@
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Collections;
import java.util.logging.Level;
import java.util.logging.Logger;
@@ -79,7 +77,7 @@ public class HarvestingClientsPage implements java.io.Serializable {
private Dataverse dataverse;
private Long dataverseId = null;
private HarvestingClient selectedClient;
private boolean setListTruncated = false;
private boolean setListTruncated = false;

//private static final String solrDocIdentifierDataset = "dataset_";

@@ -245,6 +243,7 @@ public void editClient(HarvestingClient harvestingClient) {

this.newNickname = harvestingClient.getName();
this.newHarvestingUrl = harvestingClient.getHarvestingUrl();
this.customHeader = harvestingClient.getCustomHttpHeaders();
this.initialSettingsValidated = false;

// TODO: do we want to try and contact the server, again, to make
@@ -340,6 +339,7 @@ public void createClient(ActionEvent ae) {
getSelectedDestinationDataverse().getHarvestingClientConfigs().add(newHarvestingClient);

newHarvestingClient.setHarvestingUrl(newHarvestingUrl);
newHarvestingClient.setCustomHttpHeaders(customHeader);
if (!StringUtils.isEmpty(newOaiSet)) {
newHarvestingClient.setHarvestingSet(newOaiSet);
}
@@ -426,6 +426,7 @@ public void saveClient(ActionEvent ae) {
// nickname is not editable for existing clients:
//harvestingClient.setName(newNickname);
harvestingClient.setHarvestingUrl(newHarvestingUrl);
harvestingClient.setCustomHttpHeaders(customHeader);
harvestingClient.setHarvestingSet(newOaiSet);
harvestingClient.setMetadataPrefix(newMetadataFormat);
harvestingClient.setHarvestStyle(newHarvestingStyle);
@@ -554,6 +555,9 @@ public boolean validateServerUrlOAI() {
if (!StringUtils.isEmpty(getNewHarvestingUrl())) {

OaiHandler oaiHandler = new OaiHandler(getNewHarvestingUrl());
if (getNewCustomHeader() != null) {
oaiHandler.setCustomHeaders(oaiHandler.makeCustomHeaders(getNewCustomHeader()));
}
boolean success = true;
String message = null;

@@ -635,6 +639,23 @@ public boolean validateServerUrlOAI() {
return false;
}

public boolean validateCustomHeader() {
if (!StringUtils.isEmpty(getNewCustomHeader())) {
// TODO: put this method somewhere else as a static utility

// check that it's looking like "{header-name}: {header value}" at least
if (!Pattern.matches("^[a-zA-Z0-9\\_\\-]+:.*",getNewCustomHeader())) {
FacesContext.getCurrentInstance().addMessage(getNewClientCustomHeaderInputField().getClientId(),
new FacesMessage(FacesMessage.SEVERITY_ERROR, "", BundleUtil.getStringFromBundle("harvestclients.newClientDialog.customHeader.invalid")));

return false;
}
}

// this setting is optional
return true;
}
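
As a quick sanity check of the pattern used in `validateCustomHeader()` above — it only requires that the value be shaped like "{header-name}: {header value}". A minimal standalone demo (class and method names are illustrative):

```java
import java.util.regex.Pattern;

public class HeaderPatternDemo {
    // Same regex as validateCustomHeader() in HarvestingClientsPage.
    static boolean looksLikeHeader(String s) {
        return Pattern.matches("^[a-zA-Z0-9\\_\\-]+:.*", s);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHeader("x-oai-api-key: xxxyyyzzz")); // true
        System.out.println(looksLikeHeader("not a header"));             // false: no colon after the name
        System.out.println(looksLikeHeader("bad name: value"));          // false: space in the header name
    }
}
```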

public void validateInitialSettings() {
if (isHarvestTypeOAI()) {
boolean nicknameValidated = true;
@@ -644,9 +665,10 @@ public void validateInitialSettings() {
destinationDataverseValidated = validateSelectedDataverse();
}
boolean urlValidated = validateServerUrlOAI();
boolean customHeaderValidated = validateCustomHeader();

if (nicknameValidated && destinationDataverseValidated && urlValidated) {
// In Create mode we want to run all 3 validation tests; this is why
if (nicknameValidated && destinationDataverseValidated && urlValidated && customHeaderValidated) {
// In Create mode we want to run all 4 validation tests; this is why
// we are not doing "if ((validateNickname() && validateServerUrlOAI())"
// in the line above. -- L.A. 4.4 May 2016.

@@ -688,13 +710,15 @@ public void backToStepThree() {

UIInput newClientNicknameInputField;
UIInput newClientUrlInputField;
UIInput newClientCustomHeaderInputField;
UIInput hiddenInputField;
/*UISelectOne*/ UIInput metadataFormatMenu;
UIInput remoteArchiveStyleMenu;
UIInput selectedDataverseMenu;

private String newNickname = "";
private String newHarvestingUrl = "";
private String customHeader = null;
private boolean initialSettingsValidated = false;
private String newOaiSet = "";
private String newMetadataFormat = "";
@@ -718,6 +742,7 @@ public void initNewClient(ActionEvent ae) {
//this.selectedClient = new HarvestingClient();
this.newNickname = "";
this.newHarvestingUrl = "";
this.customHeader = null;
this.initialSettingsValidated = false;
this.newOaiSet = "";
this.newMetadataFormat = "";
@@ -762,6 +787,14 @@ public void setNewHarvestingUrl(String newHarvestingUrl) {
this.newHarvestingUrl = newHarvestingUrl;
}

public String getNewCustomHeader() {
return customHeader;
}

public void setNewCustomHeader(String customHeader) {
this.customHeader = customHeader;
}

public int getHarvestTypeRadio() {
return this.harvestTypeRadio;
}
Expand Down Expand Up @@ -871,6 +904,14 @@ public void setNewClientUrlInputField(UIInput newClientInputField) {
this.newClientUrlInputField = newClientInputField;
}

public UIInput getNewClientCustomHeaderInputField() {
return newClientCustomHeaderInputField;
}

public void setNewClientCustomHeaderInputField(UIInput newClientInputField) {
this.newClientCustomHeaderInputField = newClientInputField;
}

public UIInput getHiddenInputField() {
return hiddenInputField;
}
42 changes: 10 additions & 32 deletions src/main/java/edu/harvard/iq/dataverse/api/HarvestingClients.java
@@ -15,6 +15,7 @@
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.StringUtil;
import edu.harvard.iq.dataverse.util.json.JsonParseException;
import edu.harvard.iq.dataverse.util.json.JsonPrinter;
import javax.json.JsonObjectBuilder;
import static edu.harvard.iq.dataverse.util.json.NullSafeJsonBuilder.jsonObjectBuilder;
import java.io.IOException;
@@ -88,7 +89,7 @@ public Response harvestingClients(@QueryParam("key") String apiKey) throws IOExc
}

if (retrievedHarvestingClient != null) {
hcArr.add(harvestingConfigAsJson(retrievedHarvestingClient));
hcArr.add(JsonPrinter.json(retrievedHarvestingClient));
}
}

@@ -136,7 +137,7 @@ public Response harvestingClient(@PathParam("nickName") String nickName, @QueryP
}

try {
return ok(harvestingConfigAsJson(retrievedHarvestingClient));
return ok(JsonPrinter.json(retrievedHarvestingClient));
} catch (Exception ex) {
logger.warning("Unknown exception caught while trying to format harvesting client config as json: "+ex.getMessage());
return error( Response.Status.BAD_REQUEST,
@@ -216,7 +217,7 @@ public Response createHarvestingClient(String jsonBody, @PathParam("nickName") S

DataverseRequest req = createDataverseRequest(findUserOrDie());
harvestingClient = execCommand(new CreateHarvestingClientCommand(req, harvestingClient));
return created( "/harvest/clients/" + nickName, harvestingConfigAsJson(harvestingClient));
return created( "/harvest/clients/" + nickName, JsonPrinter.json(harvestingClient));

} catch (JsonParseException ex) {
return error( Response.Status.BAD_REQUEST, "Error parsing harvesting client: " + ex.getMessage() );
@@ -268,6 +269,8 @@ public Response modifyHarvestingClient(String jsonBody, @PathParam("nickName") S
}

// Go through the supported editable fields and update the client accordingly:
// TODO: We may want to reevaluate whether we really want/need *all*
// of these fields to be editable.

if (newHarvestingClient.getHarvestingUrl() != null) {
harvestingClient.setHarvestingUrl(newHarvestingClient.getHarvestingUrl());
@@ -287,10 +290,13 @@ public Response modifyHarvestingClient(String jsonBody, @PathParam("nickName") S
if (newHarvestingClient.getHarvestStyle() != null) {
harvestingClient.setHarvestStyle(newHarvestingClient.getHarvestStyle());
}
if (newHarvestingClient.getCustomHttpHeaders() != null) {
harvestingClient.setCustomHttpHeaders(newHarvestingClient.getCustomHttpHeaders());
}
// TODO: Make schedule configurable via this API too.

harvestingClient = execCommand( new UpdateHarvestingClientCommand(req, harvestingClient));
return ok( "/harvest/clients/" + nickName, harvestingConfigAsJson(harvestingClient));
return ok( "/harvest/clients/" + nickName, JsonPrinter.json(harvestingClient));

} catch (JsonParseException ex) {
return error( Response.Status.BAD_REQUEST, "Error parsing harvesting client: " + ex.getMessage() );
@@ -390,32 +396,4 @@ public Response startHarvestingJob(@PathParam("nickName") String clientNickname,
}
return this.accepted();
}

/* Auxiliary, helper methods: */

public static JsonObjectBuilder harvestingConfigAsJson(HarvestingClient harvestingConfig) {
if (harvestingConfig == null) {
return null;
}


return jsonObjectBuilder().add("nickName", harvestingConfig.getName()).
add("dataverseAlias", harvestingConfig.getDataverse().getAlias()).
add("type", harvestingConfig.getHarvestType()).
add("style", harvestingConfig.getHarvestStyle()).
add("harvestUrl", harvestingConfig.getHarvestingUrl()).
add("archiveUrl", harvestingConfig.getArchiveUrl()).
add("archiveDescription",harvestingConfig.getArchiveDescription()).
add("metadataFormat", harvestingConfig.getMetadataPrefix()).
add("set", harvestingConfig.getHarvestingSet() == null ? "N/A" : harvestingConfig.getHarvestingSet()).
add("schedule", harvestingConfig.isScheduled() ? harvestingConfig.getScheduleDescription() : "none").
add("status", harvestingConfig.isHarvestingNow() ? "inProgress" : "inActive").
add("lastHarvest", harvestingConfig.getLastHarvestTime() == null ? "N/A" : harvestingConfig.getLastHarvestTime().toString()).
add("lastResult", harvestingConfig.getLastResult()).
add("lastSuccessful", harvestingConfig.getLastSuccessfulHarvestTime() == null ? "N/A" : harvestingConfig.getLastSuccessfulHarvestTime().toString()).
add("lastNonEmpty", harvestingConfig.getLastNonEmptyHarvestTime() == null ? "N/A" : harvestingConfig.getLastNonEmptyHarvestTime().toString()).
add("lastDatasetsHarvested", harvestingConfig.getLastHarvestedDatasetCount() == null ? "N/A" : harvestingConfig.getLastHarvestedDatasetCount().toString()).
add("lastDatasetsDeleted", harvestingConfig.getLastDeletedDatasetCount() == null ? "N/A" : harvestingConfig.getLastDeletedDatasetCount().toString()).
add("lastDatasetsFailed", harvestingConfig.getLastFailedDatasetCount() == null ? "N/A" : harvestingConfig.getLastFailedDatasetCount().toString());
}
}