-
Notifications
You must be signed in to change notification settings - Fork 536
Iqss/8889 limit file pids management #9716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -248,7 +248,7 @@ this provider. | |
| - :ref:`:Shoulder <:Shoulder>` | ||
| - :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional) | ||
| - :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional) | ||
| - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true) | ||
| - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This change of the default from false to true for (This is a better default, by the way. I agree with the change.)
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ~Yes. We could also use flyway to change things appropriately - might still need a note somewhere but we could keep the behavior from changing.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think adding a flyaway script make sense, so that behavior stays the same, and a quick note to explain it. I also think that installations that have it currently off should be set to null. But open to other ideas. |
||
|
|
||
| .. _pids-handle-configuration: | ||
|
|
||
|
|
@@ -297,7 +297,7 @@ Here are the configuration options for PermaLinks: | |
| - :ref:`:Shoulder <:Shoulder>` | ||
| - :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional) | ||
| - :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional) | ||
| - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true) | ||
| - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just so I understand this - defaults to false, instead of null?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Optional, if not set the default is filePIDs are not allowed (globally or per collection) |
||
|
|
||
| .. _auth-modes: | ||
|
|
||
|
|
@@ -2775,14 +2775,20 @@ timestamps. | |
| :FilePIDsEnabled | ||
| ++++++++++++++++ | ||
|
|
||
| Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be true. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset. | ||
| Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be false. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. again, shouldn't absent mean "null" instead of false? or am I just confused? |
||
|
|
||
| If you don't want to register file-based PIDs for your installation, set: | ||
| It is possible to override the installation-wide setting for specific collections, but only if it is set to true or false (and not left undefined). For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we explain why I'm not even 100% sure I know myself. Is it that we want installations to be thoughtful about this setting? We want them to make an active decision about if they want installation-wide file PIDs or not before they start enabling file PIDs for this or that collection?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As ~coded, setting this flag turns on the ability to set filePIDs per collection and it's value is the global default. The main reason to have not set imply a global default of false and no ability to set at the collection level is to assure that no one can turn on file PIDs when the system admin doesn't expect it (and therefore incur unexpected costs). Given that the collection level change now requires a superuser, we may not need to turn off that ability completely (and we could remove code related to that). Alternately, we could split the setting into two: the global default as to whether to use file PIDs, and a second to decide whether collection-level settings are allowed. In this case, we wouldn't have to change the meaning of not having a global default, but we still could (I think we all agree it is better to start with them off).
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, the idea is preemptive for if/when we allow non superusers to turn this on. |
||
|
|
||
| To enable file-level PIDs for the entire installation:: | ||
|
|
||
| ``curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled`` | ||
|
|
||
|
|
||
| If you don't want to register file-based PIDs for your entire installation, but do want to allow them to be enabled for a given collection set: | ||
|
|
||
| ``curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:FilePIDsEnabled`` | ||
|
|
||
|
|
||
| It is possible to override the installation-wide setting for specific collections. For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details. | ||
|
|
||
| .. _:IndependentHandleService: | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1514,6 +1514,9 @@ public Response registerDataFile(@Context ContainerRequestContext crc, @PathPara | |
| User u = getRequestUser(crc); | ||
| DataverseRequest r = createDataverseRequest(u); | ||
| DataFile df = findDataFileOrDie(id); | ||
| if(!systemConfig.isFilePIDsEnabledForCollection(df.getOwner().getOwner())) { | ||
| return forbidden("PIDs are not enabled for this file's collection."); | ||
| } | ||
| if (df.getIdentifier() == null || df.getIdentifier().isEmpty()) { | ||
| execCommand(new RegisterDvObjectCommand(r, df)); | ||
| } else { | ||
|
|
@@ -1537,11 +1540,18 @@ public Response registerDataFileAll(@Context ContainerRequestContext crc) { | |
| Integer alreadyRegistered = 0; | ||
| Integer released = 0; | ||
| Integer draft = 0; | ||
| Integer skipped = 0; | ||
| logger.info("Starting to register: analyzing " + count + " files. " + new Date()); | ||
| logger.info("Only unregistered, published files will be registered."); | ||
| for (DataFile df : fileService.findAll()) { | ||
| try { | ||
| if ((df.getIdentifier() == null || df.getIdentifier().isEmpty())) { | ||
| if(!systemConfig.isFilePIDsEnabledForCollection(df.getOwner().getOwner())) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the systemConfig.isFilePIDsEnabledForCollection method (not shown in this PR), should we change... ... to... ... to reflect that the (new) default for that setting is false rather than true? I also noticed that in FileRecordWriter we have this:
Should that also be changed from true to false?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, as coded, these should flip (see other comments). |
||
| skipped++; | ||
| if (skipped % 100 == 0) { | ||
| logger.info(skipped + " of " + count + " files not in collections that allow file PIDs. " + new Date()); | ||
| } | ||
| } | ||
| if (df.isReleased()) { | ||
| released++; | ||
| User u = getRequestAuthenticatedUserOrDie(crc); | ||
|
|
@@ -1551,6 +1561,11 @@ public Response registerDataFileAll(@Context ContainerRequestContext crc) { | |
| if (successes % 100 == 0) { | ||
| logger.info(successes + " of " + count + " files registered successfully. " + new Date()); | ||
| } | ||
| try { | ||
| Thread.sleep(1000); | ||
| } catch (InterruptedException ie) { | ||
| logger.warning("Interrupted Exception when attempting to execute Thread.sleep()!"); | ||
| } | ||
| } else { | ||
| draft++; | ||
| logger.info(draft + " of " + count + " files not yet published"); | ||
|
|
@@ -1567,18 +1582,15 @@ public Response registerDataFileAll(@Context ContainerRequestContext crc) { | |
| logger.info("Unexpected Exception: " + e.getMessage()); | ||
| } | ||
|
|
||
| try { | ||
| Thread.sleep(1000); | ||
| } catch (InterruptedException ie) { | ||
| logger.warning("Interrupted Exception when attempting to execute Thread.sleep()!"); | ||
| } | ||
|
|
||
| } | ||
| logger.info("Final Results:"); | ||
| logger.info(alreadyRegistered + " of " + count + " files were already registered. " + new Date()); | ||
| logger.info(draft + " of " + count + " files are not yet published. " + new Date()); | ||
| logger.info(released + " of " + count + " unregistered, published files to register. " + new Date()); | ||
| logger.info(successes + " of " + released + " unregistered, published files registered successfully. " | ||
| + new Date()); | ||
| logger.info(skipped + " of " + count + " files not in collections that allow file PIDs. " + new Date()); | ||
|
|
||
| return ok("Datafile registration complete." + successes + " of " + released | ||
| + " unregistered, published files registered successfully."); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.