Skip to content

Iqss/8889 limit file pids management#9716

Merged
qqmyers merged 3 commits intoIQSS:developfrom
GlobalDataverseCommunityConsortium:IQSS/8889-limit-file-pids-management
Jul 19, 2023
Merged

Iqss/8889 limit file pids management#9716
qqmyers merged 3 commits intoIQSS:developfrom
GlobalDataverseCommunityConsortium:IQSS/8889-limit-file-pids-management

Conversation

@qqmyers
Copy link
Member

@qqmyers qqmyers commented Jul 19, 2023

What this PR does / why we need it: This PR adds restriction to who can enable file PIDs for a given collection and when. Changing the setting via the API now requires a superuser and is only allowed if the global :FilePIDsEnabled setting is set (to true or false, but not undefined/non-existent).

The PR also tightens the admin API calls to register a file, all files in a collection, or all files globally to follow the per-collection setting for file PIDs.

Docs have been updated.

Also one minor change to the batch file PID calls - the sleep now only applies when a file is registered (when the register command is called) and not for files that are skipped/in a draft version, etc.

Which issue(s) this PR closes:

Closes #8889

Special notes for your reviewer:

Suggestions on how to test this: Regression testing should assure the collection-level file PID functionality still works. The changes here can be tested by calling the change collection attributes API call to change the file pids setting - as a superuser/normal user, when the :FilePIDsEnabled setting is set (true or false)/isn't set. (The former in each pair should succeed, the latter should not.)

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: I think the release notes are generic w.r.t. these changes, - could possibly say the new collection-level file pid functionlity is optional (versus always enabled).

Additional documentation:

@qqmyers qqmyers requested a review from scolapasta July 19, 2023 13:59
@coveralls
Copy link

Coverage Status

coverage: 20.389% (-0.003%) from 20.392% when pulling dc5d000 on GlobalDataverseCommunityConsortium:IQSS/8889-limit-file-pids-management into dde0879 on IQSS:develop.

@pdurbin pdurbin self-assigned this Jul 19, 2023
Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments and questions but I'll defer to others to decide when this is ready for QA.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias::
The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias.::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias.::
The following API will register the PIDs for all the yet unregistered published files in the datasets **directly within the collection** specified by its alias::

- :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional)
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change of the default from false to true for :FilePIDsEnabled warrants a notice in the release notes. Otherwise, installations that have been relying on the default true behavior will stop seeing PIDs for files, right? Then need to explicitly set it to true.

(This is a better default, by the way. I agree with the change.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~Yes. We could also use flyway to change things appropriately - might still need a note somewhere but we could keep the behavior from changing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a flyaway script make sense, so that behavior stays the same, and a quick note to explain it.

I also think that installations that have it currently off should be set to null. But open to other ideas.

for (DataFile df : fileService.findAll()) {
try {
if ((df.getIdentifier() == null || df.getIdentifier().isEmpty())) {
if(!systemConfig.isFilePIDsEnabledForCollection(df.getOwner().getOwner())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the systemConfig.isFilePIDsEnabledForCollection method (not shown in this PR), should we change...

return settingsService.isTrueForKey(SettingsServiceBean.Key.FilePIDsEnabled, true);

... to...

return settingsService.isTrueForKey(SettingsServiceBean.Key.FilePIDsEnabled, false);

... to reflect that the (new) default for that setting is false rather than true?

I also noticed that in FileRecordWriter we have this:

String isFilePIDsEnabled = commandEngine.getContext().settings().getValueForKey(SettingsServiceBean.Key.FilePIDsEnabled, "true"); //default value for file PIDs is 'true'

Should that also be changed from true to false?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as coded, these should flip (see other comments).

Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be false. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.

If you don't want to register file-based PIDs for your installation, set:
It is possible to override the installation-wide setting for specific collections, but only if it is set to true or false (and not left undefined). For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we explain why :FilePIDsEnabled has to be set to override it with a collection-specific setting?

I'm not even 100% sure I know myself. Is it that we want installations to be thoughtful about this setting? We want them to make an active decision about if they want installation-wide file PIDs or not before they start enabling file PIDs for this or that collection?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As ~coded, setting this flag turns on the ability to set filePIDs per collection and it's value is the global default. The main reason to have not set imply a global default of false and no ability to set at the collection level is to assure that no one can turn on file PIDs when the system admin doesn't expect it (and therefore incur unexpected costs). Given that the collection level change now requires a superuser, we may not need to turn off that ability completely (and we could remove code related to that). Alternately, we could split the setting into two: the global default as to whether to use file PIDs, and a second to decide whether collection-level settings are allowed. In this case, we wouldn't have to change the meaning of not having a global default, but we still could (I think we all agree it is better to start with them off).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the idea is preemptive for if/when we allow non superusers to turn this on.

@pdurbin pdurbin removed their assignment Jul 19, 2023
Copy link
Contributor

@scolapasta scolapasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comments; overall looks good.

One small note to see if we care; right now if global setting is null, you cannot set the value for a collection. But if it were true or false, you could set, then set global value to null. Do we care at all? (the logic should do an "AND"* with the global value, correct?)

  • if the global value is true, can a (super)user set it to false for a collection? Or is it actally an "OR" and not an "AND"

- :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional)
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a flyaway script make sense, so that behavior stays the same, and a quick note to explain it.

I also think that installations that have it currently off should be set to null. But open to other ideas.

- :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional)
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just so I understand this - defaults to false, instead of null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, if not set the default is filePIDs are not allowed (globally or per collection)

++++++++++++++++

Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be true. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.
Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be false. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, shouldn't absent mean "null" instead of false? or am I just confused?

Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be false. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.

If you don't want to register file-based PIDs for your installation, set:
It is possible to override the installation-wide setting for specific collections, but only if it is set to true or false (and not left undefined). For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the idea is preemptive for if/when we allow non superusers to turn this on.

@qqmyers
Copy link
Member Author

qqmyers commented Jul 19, 2023

Re: setting :FilePIDsEnabled, changing a collection, and then removing :FilePIDsEnabled - I think right now the code leaves the collection setting active, just the API to make further changes is disabled. The logic to change that would be to check that the setting is set/non-null and AND that with collection setting. (versus trying to get the boolean value for the setting and having null become false).

@qqmyers qqmyers merged commit dc5d000 into IQSS:develop Jul 19, 2023
@pdurbin pdurbin added this to the 5.14 milestone Jul 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Enable file PIDs at Dataverse Collection Level

4 participants