Iqss/8889 limit file pids management#9716
Conversation
pdurbin
left a comment
There was a problem hiding this comment.
I left a few comments and questions but I'll defer to others to decide when this is ready for QA.
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias:: | ||
| The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias.:: |
There was a problem hiding this comment.
| The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias.:: | |
| The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias:: |
| - :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional) | ||
| - :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional) | ||
| - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true) | ||
| - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false) |
There was a problem hiding this comment.
This change of the default from false to true for :FilePIDsEnabled warrants a notice in the release notes. Otherwise, installations that have been relying on the default true behavior will stop seeing PIDs for files, right? Then need to explicitly set it to true.
(This is a better default, by the way. I agree with the change.)
There was a problem hiding this comment.
~Yes. We could also use flyway to change things appropriately - might still need a note somewhere but we could keep the behavior from changing.
There was a problem hiding this comment.
I think adding a flyaway script make sense, so that behavior stays the same, and a quick note to explain it.
I also think that installations that have it currently off should be set to null. But open to other ideas.
| for (DataFile df : fileService.findAll()) { | ||
| try { | ||
| if ((df.getIdentifier() == null || df.getIdentifier().isEmpty())) { | ||
| if(!systemConfig.isFilePIDsEnabledForCollection(df.getOwner().getOwner())) { |
There was a problem hiding this comment.
In the systemConfig.isFilePIDsEnabledForCollection method (not shown in this PR), should we change...
return settingsService.isTrueForKey(SettingsServiceBean.Key.FilePIDsEnabled, true);
... to...
return settingsService.isTrueForKey(SettingsServiceBean.Key.FilePIDsEnabled, false);
... to reflect that the (new) default for that setting is false rather than true?
I also noticed that in FileRecordWriter we have this:
String isFilePIDsEnabled = commandEngine.getContext().settings().getValueForKey(SettingsServiceBean.Key.FilePIDsEnabled, "true"); //default value for file PIDs is 'true'
Should that also be changed from true to false?
There was a problem hiding this comment.
Yes, as coded, these should flip (see other comments).
| Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be false. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset. | ||
|
|
||
| If you don't want to register file-based PIDs for your installation, set: | ||
| It is possible to override the installation-wide setting for specific collections, but only if it is set to true or false (and not left undefined). For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details. |
There was a problem hiding this comment.
Should we explain why :FilePIDsEnabled has to be set to override it with a collection-specific setting?
I'm not even 100% sure I know myself. Is it that we want installations to be thoughtful about this setting? We want them to make an active decision about if they want installation-wide file PIDs or not before they start enabling file PIDs for this or that collection?
There was a problem hiding this comment.
As ~coded, setting this flag turns on the ability to set filePIDs per collection and it's value is the global default. The main reason to have not set imply a global default of false and no ability to set at the collection level is to assure that no one can turn on file PIDs when the system admin doesn't expect it (and therefore incur unexpected costs). Given that the collection level change now requires a superuser, we may not need to turn off that ability completely (and we could remove code related to that). Alternately, we could split the setting into two: the global default as to whether to use file PIDs, and a second to decide whether collection-level settings are allowed. In this case, we wouldn't have to change the meaning of not having a global default, but we still could (I think we all agree it is better to start with them off).
There was a problem hiding this comment.
Yes, the idea is preemptive for if/when we allow non superusers to turn this on.
scolapasta
left a comment
There was a problem hiding this comment.
I added a few comments; overall looks good.
One small note to see if we care; right now if global setting is null, you cannot set the value for a collection. But if it were true or false, you could set, then set global value to null. Do we care at all? (the logic should do an "AND"* with the global value, correct?)
- if the global value is true, can a (super)user set it to false for a collection? Or is it actally an "OR" and not an "AND"
| - :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional) | ||
| - :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional) | ||
| - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true) | ||
| - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false) |
There was a problem hiding this comment.
I think adding a flyaway script make sense, so that behavior stays the same, and a quick note to explain it.
I also think that installations that have it currently off should be set to null. But open to other ideas.
| - :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional) | ||
| - :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional) | ||
| - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true) | ||
| - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false) |
There was a problem hiding this comment.
just so I understand this - defaults to false, instead of null?
There was a problem hiding this comment.
Optional, if not set the default is filePIDs are not allowed (globally or per collection)
| ++++++++++++++++ | ||
|
|
||
| Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be true. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset. | ||
| Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be false. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset. |
There was a problem hiding this comment.
again, shouldn't absent mean "null" instead of false? or am I just confused?
| Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be false. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset. | ||
|
|
||
| If you don't want to register file-based PIDs for your installation, set: | ||
| It is possible to override the installation-wide setting for specific collections, but only if it is set to true or false (and not left undefined). For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details. |
There was a problem hiding this comment.
Yes, the idea is preemptive for if/when we allow non superusers to turn this on.
|
Re: setting :FilePIDsEnabled, changing a collection, and then removing :FilePIDsEnabled - I think right now the code leaves the collection setting active, just the API to make further changes is disabled. The logic to change that would be to check that the setting is set/non-null and AND that with collection setting. (versus trying to get the boolean value for the setting and having null become false). |
What this PR does / why we need it: This PR adds restriction to who can enable file PIDs for a given collection and when. Changing the setting via the API now requires a superuser and is only allowed if the global :FilePIDsEnabled setting is set (to true or false, but not undefined/non-existent).
The PR also tightens the admin API calls to register a file, all files in a collection, or all files globally to follow the per-collection setting for file PIDs.
Docs have been updated.
Also one minor change to the batch file PID calls - the sleep now only applies when a file is registered (when the register command is called) and not for files that are skipped/in a draft version, etc.
Which issue(s) this PR closes:
Closes #8889
Special notes for your reviewer:
Suggestions on how to test this: Regression testing should assure the collection-level file PID functionality still works. The changes here can be tested by calling the change collection attributes API call to change the file pids setting - as a superuser/normal user, when the :FilePIDsEnabled setting is set (true or false)/isn't set. (The former in each pair should succeed, the latter should not.)
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?: I think the release notes are generic w.r.t. these changes, - could possibly say the new collection-level file pid functionlity is optional (versus always enabled).
Additional documentation: