3583 dv cannot parse dois that don't have a separator in authority/prefix#4651
Conversation
for this case, one can continue to append the shoulder to the Authority.
the identifier part (out of the authority side)
|
The required database update is the two lines below - not sure if the version update sql files go in git somewhere or not... UPDATE dataset SET identifier=substring(authority, strpos(authority,doiseparator)+1) || doiseparator || identifier WHERE strpos(authority,doiseparator)>0; |
always '/' 'for DOIs and Handles, and any shoulder:identifier separator is now handled as part of the separator. Also fixed the bug I introduced case logic for Handles, which are case sensitive.
first indexOf '/' only, so the restriction is now obsolete.
…-DOIs-that-don't-have-a-separator-in-authority/prefix
3583-DV-cannot-parse-DOIs-that-don't-have-a-separator-in-authority/prefix Conflicts: src/main/java/edu/harvard/iq/dataverse/Dataset.java src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java src/main/java/edu/harvard/iq/dataverse/engine/command/impl/CreateDatasetCommand.java
Both Handles and DOIs require '/' as the authority/identifier separator. Since any separator / sub-identifier separator is now just part of the shoulder, there is no need for a configurable value for that. It is also problematic to make the authority/identifier separator configurable at this point since there are many places in the code where '/' is hardcoded (reasonable given that Handle and DOI both require it), so supporting a new Id type that doesn't use this character requires more work than creating a new serviceBean and setting the key. Nominally, all identifiers will have '/' as the separator in the database and the separator column in the dataset table (and now for datafiles too) could be removed, but that would complicate id matching relative to current code which just concatenates <authority><separator><identifier> and would add work to restore it if a different identifier with new separator is ever created. So - leaving this, and code to get/set it as is...
files Given the code for generating datafile ids, it looks like the new ability to support a shoulder should only apply to datasets as a start. This commit splits the existing IdGenerationStyle key into two one for datasets and one for datafiles. Datafiles will continue to only support random or sequential identifiers (with an additional choice as to whether to those are independent or dependent (appended to the dataset identifier) while datasets can also use the shoulder with random or sequential options. It seems like allowing files to use a shoulder when they are independent would also make sense, but I don't think using that with dependent would make as much sense. If that's a reasonable choice, we might want to rethink having the dependent/independent choice as a separate key (datafiles would ultimately have 6 choices - random or sequential with dependent/independent or independent with shoulder and random or sequential).
These updates fix an issue with checking the db for entries that match the new one so that tests include the shoulder. They also use a shoulder, if defined, for datafile, only when the ids for datafiles are being generated as 'INDEPENDENT'. When DEPENDENT, the shoulder is already used once in the part of the id coming from the dataset, so it is not used again.
Having a shoulder set will cause it to be used.
Datafile identifiers now prepend either the dataset identifier (dependent mode) or the shoulder (if defined, independent mode only) and the appropriate prepend string is sent into the appropriate generateId method so it can be prepended before the isIdentifierUniqueInDatabase call.
3583-DV-cannot-parse-DOIs-that-don't-have-a-separator-in-authority/prefix Conflicts: src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java
…n-authority/prefix' of https://github.com/QualitativeDataRepository/dataverse.git into 3583-DV-cannot-parse-DOIs-that-don't-have-a-separator-in-authority/prefix
doiShoulder is a misnomer, it is called in the generation of new identifiers, regardless of the protocol. However, it is no longer required to be '/' as the misnamed doiSeparator was. I also updated the doc to note that the shoulder is applied to identifiers, including INDEPENDENT file ids.
since it applies to generated Handles as well.
|
|
||
| --Add new setting into content for shoulder | ||
| INSERT INTO setting(name, content) | ||
| VALUES (':Shoulder', (SELECT substring(content, strpos(content,'/')+1) || '/' from setting where name = ':Authority')); |
There was a problem hiding this comment.
We have an :Authority with no shoulder and this insert sets the :Shoulder to '/' instead of just leaving it unset. If we (QDR) are the only ones (since we've already deployed the early custom shoulder code that started this PR), then this would be OK.
then re-parse it there
|
|
||
| --Add new setting into content for shoulder | ||
| INSERT INTO setting(name, content) | ||
| VALUES (':Shoulder', (SELECT substring(content, strpos(content,'/')+1) || '/' from setting where name = ':Authority')); |
There was a problem hiding this comment.
@sekmiller - I'm confused as to why you want to use doiseparator since any value other than '/' would have broken an install at least for >~4.6. Is it for earlier versions? If so, I think you'll need to restore the lines to set the doiseparator in the dvobject table that I removed this morning in my commit. (Sorry for causing confusion...)
|
After this pull request was merged, the API tests on the phoenix server started failing: #4725 |
New Contributors
Welcome! New contributors should at least glance at CONTRIBUTING.md, especially the section on pull requests where we encourage you to reach out to other developers before you start coding. Also, please note that we measure code coverage and prefer you write unit tests. Pull requests can still be reviewed without tests or completion of the checklist outlined below. Thanks!
Related Issues
Pull Request Checklist