Merged
9 changes: 6 additions & 3 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -154,15 +154,18 @@ In the following example, the database id of the file is 42::

export FILE_ID=42
curl "http://localhost:8080/api/admin/$FILE_ID/registerDataFile"

This method will return a FORBIDDEN response if minting of file PIDs is not enabled for the collection the file is in. (Note that it is possible to have it enabled for a specific collection, even when it is disabled for the Dataverse installation as a whole. See :ref:`collection-attributes-api` in the Native API Guide.)
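For callers that script this endpoint rather than using curl, a minimal Java sketch of the same request is shown here. The class name, the helper names, and the outcome wording are hypothetical, and it assumes the FORBIDDEN response surfaces as HTTP status 403; the base URL is the one from the example above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client helper; not part of the Dataverse code base.
final class RegisterDataFileClient {
    // Base URL copied from the documentation example; adjust for a real installation.
    static final String BASE = "http://localhost:8080";

    // Builds the admin endpoint URI for a given database file id.
    static URI registerUri(long fileId) {
        return URI.create(BASE + "/api/admin/" + fileId + "/registerDataFile");
    }

    // Maps an HTTP status code to a short outcome description (illustrative wording).
    static String describe(int status) {
        if (status == 403) {
            return "file PIDs are not enabled for this file's collection";
        }
        if (status >= 200 && status < 300) {
            return "request accepted";
        }
        return "unexpected status " + status;
    }

    // Sends the GET request; this needs a running installation to succeed.
    static String register(long fileId) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(registerUri(fileId)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return describe(response.statusCode());
    }
}
```

A 403 can also have other causes (for example, a blocked admin API), so a real client would likely inspect the response body as well.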

Mint PIDs for all unregistered published files in the specified collection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias::
The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias.::
Member

Suggested change
The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias.::
The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias::


curl "http://localhost:8080/api/admin/registerDataFiles/{collection_alias}"

It will not attempt to register the datafiles in its sub-collections, so this call will need to be repeated on any sub-collections where files need to be registered as well. File-level PID registration must be enabled on the collection. (Note that it is possible to have it enabled for a specific collection, even when it is disabled for the Dataverse installation as a whole. See :ref:`collection-attributes-api` in the Native API Guide.)
It will not attempt to register the datafiles in its sub-collections, so this call will need to be repeated on any sub-collections where files need to be registered as well.
File-level PID registration must be enabled on the collection. (Note that it is possible to have it enabled for a specific collection, even when it is disabled for the Dataverse installation as a whole. See :ref:`collection-attributes-api` in the Native API Guide.)

This API will sleep for 1 second between registration calls by default. A longer sleep interval can be specified with an optional ``sleep=`` parameter::

@@ -171,7 +174,7 @@ This API will sleep for 1 second between registration calls by default. A longer
Mint PIDs for ALL unregistered files in the database
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following API will attempt to register the PIDs for all the published files in your instance that do not yet have them::
The following API will attempt to register the PIDs for all the published files in your instance, in collections that allow file PIDs, that do not yet have them::

curl http://localhost:8080/api/admin/registerDataFileAll

2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -753,7 +753,7 @@ The following attributes are supported:
* ``name`` Name
* ``description`` Description
* ``affiliation`` Affiliation
* ``filePIDsEnabled`` ("true" or "false") Enables or disables registration of file-level PIDs in datasets within the collection (overriding the instance-wide setting).
* ``filePIDsEnabled`` ("true" or "false") Restricted to use by superusers and only when the global :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` is set. Enables or disables registration of file-level PIDs in datasets within the collection (overriding the instance-wide setting).


Datasets
16 changes: 11 additions & 5 deletions doc/sphinx-guides/source/installation/config.rst
@@ -248,7 +248,7 @@ this provider.
- :ref:`:Shoulder <:Shoulder>`
- :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional)
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)
Member

Changing the default for :FilePIDsEnabled from true to false warrants a notice in the release notes. Otherwise, installations that have been relying on the default true behavior will stop seeing PIDs for files, right? They would then need to explicitly set it to true.

(This is a better default, by the way. I agree with the change.)

Member Author

~Yes. We could also use Flyway to change things appropriately. We might still need a note somewhere, but we could keep the behavior from changing.

Contributor

I think adding a Flyway script makes sense, so that behavior stays the same, along with a quick note to explain it.

I also think that installations that currently have it off should be set to null, but I'm open to other ideas.


.. _pids-handle-configuration:

@@ -297,7 +297,7 @@ Here are the configuration options for PermaLinks:
- :ref:`:Shoulder <:Shoulder>`
- :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional)
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)
Contributor

Just so I understand this: it defaults to false, instead of null?

Member Author

It is optional; if it is not set, the default is that file PIDs are not allowed (globally or per collection).


.. _auth-modes:

@@ -2775,14 +2775,20 @@ timestamps.
:FilePIDsEnabled
++++++++++++++++

Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be true. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.
Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be false. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.
Contributor

Again, shouldn't absent mean "null" instead of false? Or am I just confused?


If you don't want to register file-based PIDs for your installation, set:
It is possible to override the installation-wide setting for specific collections, but only if it is set to true or false (and not left undefined). For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details.
Member

Should we explain why :FilePIDsEnabled has to be set to override it with a collection-specific setting?

I'm not even 100% sure I know myself. Is it that we want installations to be thoughtful about this setting? We want them to make an active decision about if they want installation-wide file PIDs or not before they start enabling file PIDs for this or that collection?

Member Author

As ~coded, setting this flag turns on the ability to set filePIDs per collection, and its value is the global default. The main reason to have "not set" imply a global default of false and no ability to set at the collection level is to ensure that no one can turn on file PIDs when the system admin doesn't expect it (and therefore incur unexpected costs). Given that the collection-level change now requires a superuser, we may not need to turn off that ability completely (and we could remove code related to that). Alternatively, we could split the setting into two: the global default as to whether to use file PIDs, and a second to decide whether collection-level settings are allowed. In this case, we wouldn't have to change the meaning of not having a global default, but we still could (I think we all agree it is better to start with them off).

Contributor

Yes, the idea is to be preemptive, in case we later allow non-superusers to turn this on.
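
The semantics discussed in this thread can be sketched as a small resolution function. This is one reading of the comments above, not the actual `SystemConfig` implementation: the class and method names are hypothetical, and the sketch ignores any inheritance between nested collections.

```java
// Hypothetical illustration of the "as coded" semantics described in the thread.
final class FilePidPolicy {
    /**
     * @param globalSetting      value of :FilePIDsEnabled, or null when the setting is absent
     * @param collectionOverride the collection's filePIDsEnabled attribute, or null when unset
     * @return whether file PIDs would be minted for files in the collection
     */
    static boolean effective(Boolean globalSetting, Boolean collectionOverride) {
        if (globalSetting == null) {
            // Setting absent: file PIDs are off and collection-level values are not honored.
            return false;
        }
        if (collectionOverride != null) {
            // Setting present: an explicit collection-level value wins.
            return collectionOverride;
        }
        // Otherwise the global value acts as the default.
        return globalSetting;
    }
}
```

Under this reading, `effective(false, true)` enables file PIDs for one collection on an installation where they are otherwise off, while `effective(null, true)` does not.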


To enable file-level PIDs for the entire installation::

``curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled``


If you don't want to register file-based PIDs for your entire installation, but do want to allow them to be enabled for a given collection, set:

``curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:FilePIDsEnabled``


It is possible to override the installation-wide setting for specific collections. For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details.

.. _:IndependentHandleService:

22 changes: 17 additions & 5 deletions src/main/java/edu/harvard/iq/dataverse/api/Admin.java
@@ -1514,6 +1514,9 @@ public Response registerDataFile(@Context ContainerRequestContext crc, @PathPara
User u = getRequestUser(crc);
DataverseRequest r = createDataverseRequest(u);
DataFile df = findDataFileOrDie(id);
if(!systemConfig.isFilePIDsEnabledForCollection(df.getOwner().getOwner())) {
return forbidden("PIDs are not enabled for this file's collection.");
}
if (df.getIdentifier() == null || df.getIdentifier().isEmpty()) {
execCommand(new RegisterDvObjectCommand(r, df));
} else {
@@ -1537,11 +1540,18 @@ public Response registerDataFileAll(@Context ContainerRequestContext crc) {
Integer alreadyRegistered = 0;
Integer released = 0;
Integer draft = 0;
Integer skipped = 0;
logger.info("Starting to register: analyzing " + count + " files. " + new Date());
logger.info("Only unregistered, published files will be registered.");
for (DataFile df : fileService.findAll()) {
try {
if ((df.getIdentifier() == null || df.getIdentifier().isEmpty())) {
if(!systemConfig.isFilePIDsEnabledForCollection(df.getOwner().getOwner())) {
Member

In the systemConfig.isFilePIDsEnabledForCollection method (not shown in this PR), should we change...

return settingsService.isTrueForKey(SettingsServiceBean.Key.FilePIDsEnabled, true);

... to...

return settingsService.isTrueForKey(SettingsServiceBean.Key.FilePIDsEnabled, false);

... to reflect that the (new) default for that setting is false rather than true?

I also noticed that in FileRecordWriter we have this:

String isFilePIDsEnabled = commandEngine.getContext().settings().getValueForKey(SettingsServiceBean.Key.FilePIDsEnabled, "true"); //default value for file PIDs is 'true'

Should that also be changed from true to false?

Member Author

Yes, as coded, these should flip (see other comments).
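
To make the effect of flipping the fallback concrete, here is a small stand-in for a "stored value or default" lookup. It is a hypothetical simplification rather than the real `SettingsServiceBean.isTrueForKey`, and parsing with `Boolean.parseBoolean` is an assumption.

```java
// Hypothetical stand-in for a settings lookup with a fallback default.
final class SettingDefaultDemo {
    // Returns the stored value parsed as a boolean, or defaultValue when nothing is stored.
    static boolean isTrue(String storedValue, boolean defaultValue) {
        if (storedValue == null) {
            return defaultValue;
        }
        return Boolean.parseBoolean(storedValue.trim());
    }
}
```

Only installations that never stored a value are affected by the fallback: `isTrue(null, true)` and `isTrue(null, false)` differ, whereas an explicitly stored "true" or "false" gives the same answer either way.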

skipped++;
if (skipped % 100 == 0) {
logger.info(skipped + " of " + count + " files not in collections that allow file PIDs. " + new Date());
}
}
if (df.isReleased()) {
released++;
User u = getRequestAuthenticatedUserOrDie(crc);
@@ -1551,6 +1561,11 @@ public Response registerDataFileAll(@Context ContainerRequestContext crc) {
if (successes % 100 == 0) {
logger.info(successes + " of " + count + " files registered successfully. " + new Date());
}
try {
Thread.sleep(1000);
} catch (InterruptedException ie) {
logger.warning("Interrupted Exception when attempting to execute Thread.sleep()!");
}
} else {
draft++;
logger.info(draft + " of " + count + " files not yet published");
@@ -1567,18 +1582,15 @@ public Response registerDataFileAll(@Context ContainerRequestContext crc) {
logger.info("Unexpected Exception: " + e.getMessage());
}

try {
Thread.sleep(1000);
} catch (InterruptedException ie) {
logger.warning("Interrupted Exception when attempting to execute Thread.sleep()!");
}

}
logger.info("Final Results:");
logger.info(alreadyRegistered + " of " + count + " files were already registered. " + new Date());
logger.info(draft + " of " + count + " files are not yet published. " + new Date());
logger.info(released + " of " + count + " unregistered, published files to register. " + new Date());
logger.info(successes + " of " + released + " unregistered, published files registered successfully. "
+ new Date());
logger.info(skipped + " of " + count + " files not in collections that allow file PIDs. " + new Date());

return ok("Datafile registration complete. " + successes + " of " + released
+ " unregistered, published files registered successfully.");
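
The bookkeeping in this loop can be summarized with a simplified sketch. `FileState` is a hypothetical stand-in for the few `DataFile` properties the loop inspects, the actual registration call is replaced by a counter, and the checks are treated as mutually exclusive, which is an assumption about the intended control flow. As in the change above, the pause happens only after a registration attempt.

```java
import java.util.List;

// Hypothetical, simplified model of the counters kept by the bulk registration loop.
final class RegistrationTally {
    // Stand-in for the DataFile properties the loop looks at.
    static final class FileState {
        final boolean hasPid;
        final boolean collectionAllowsPids;
        final boolean released;

        FileState(boolean hasPid, boolean collectionAllowsPids, boolean released) {
            this.hasPid = hasPid;
            this.collectionAllowsPids = collectionAllowsPids;
            this.released = released;
        }
    }

    int alreadyRegistered;
    int skipped;
    int registered;
    int draft;

    // Classifies each file and pauses sleepMillis after every (simulated) registration.
    static RegistrationTally run(List<FileState> files, long sleepMillis) {
        RegistrationTally tally = new RegistrationTally();
        for (FileState file : files) {
            if (file.hasPid) {
                tally.alreadyRegistered++;
            } else if (!file.collectionAllowsPids) {
                tally.skipped++;
            } else if (file.released) {
                tally.registered++; // a real implementation would call the PID provider here
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                    return tally;
                }
            } else {
                tally.draft++;
            }
        }
        return tally;
    }
}
```

Sleeping only in the registration branch means files that are skipped, already registered, or still in draft do not each add a one-second delay to the run.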
6 changes: 6 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java
@@ -636,6 +636,12 @@ public Response updateAttribute(@Context ContainerRequestContext crc, @PathParam
break;
*/
case "filePIDsEnabled":
if(!user.isSuperuser()) {
return forbidden("You must be a superuser to change this setting");
}
if(settingsService.getValueForKey(SettingsServiceBean.Key.FilePIDsEnabled)==null) {
return forbidden("File PIDs are not enabled on this server");
}
collection.setFilePIDsEnabled(parseBooleanOrDie(value));
break;
default:
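
The two guards added to the `filePIDsEnabled` case can be read as an ordered pre-check. The sketch below isolates that ordering; the class and method names are hypothetical, the messages are copied from the hunk above, and a `null` return stands for "the change may proceed".

```java
// Hypothetical isolation of the pre-checks applied before the collection attribute is changed.
final class FilePidAttributeGuard {
    /**
     * @param isSuperuser        whether the requesting user is a superuser
     * @param globalSettingValue raw value of :FilePIDsEnabled, or null when the setting is absent
     * @return the rejection message, or null when the update is allowed
     */
    static String rejectionReason(boolean isSuperuser, String globalSettingValue) {
        if (!isSuperuser) {
            return "You must be a superuser to change this setting";
        }
        if (globalSettingValue == null) {
            return "File PIDs are not enabled on this server";
        }
        return null;
    }
}
```

Note that a stored value of "false" passes the second check: the guard only requires that the installation-wide setting exists, not that it is true.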