#7978: linking ORCID from the metadata tab.#7979
Conversation
pdurbin
left a comment
There was a problem hiding this comment.
I like the simplicity of this approach. I didn't run the code but conceptually it seems to deliver one simple, extremely valuable thing: Make the ORCID a link.
I'm not clicking approve because my understanding is that @qqmyers is working on delivering similar functionality (and a whole lot more) in pull request #7946.
So we have a decision to make. Do we go ahead and test and merge this small pull request by @pkiraly? Or do we wait for the bigger pull request from @qqmyers? Either way it seems like linking to ORCIDs is coming sometime soon, which is great! 🎉
|
Thx @pdurbin for tagging @qqmyers - it'll be interesting to hear his thoughts here. If the larger PR will provide the clickable-ORCID functionality for installations that adopt the external vocabulary and also those that don't, I'm inclined to wait for the larger PR. If it will only benefit the former I think it makes sense to accept this. |
|
While #7946 will support a new single field where the value has to be an ORCID, or a compound field where the person's name and the identifier type (ORCID) are also stored (but other identifiers types aren't allowed, and affiliation isn't a child field that isn't coming from the ORCID profile), it won't manage the existing citation/author field as is without further changes/enhancements (for the issues hinted at above and some lesser technical points - probably worth discussing at some point). As a short-term solution, the one suggestion I would have is to point out that some Datasets already use the https://orcid.org/<id> form for ORCIDs (e.g. https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/RZTSWA) so the logic should handle that. As a medium term solution/pattern I'd raise a concern that new xhtml code is needed per field per identifier type which isn't very scalable. Adding to #7946 or a solution to #7856 might avoid that. It might also be that moving some of the code to Java/slightly extending the existing display pattern language might be simpler. In any case, I think it would be worth some discussion before we accept this as another/separate standard way to display things differently from how they are stored. |
|
@qqmyers Thanks for the explanation. @djbrooke my alternative idea would be to put the code to Java class, and use method which first detect if it is a "linkable" value, that is there is a landing page for the personal identifier (ORCID, VIAF are linkable in this sense maybe even ScopusID and ResearcherID seems to be as well), and if it is linkable it provides an object represents a link. This idea could be reuse for other IDs. Next week I will check the tickets @qqmyers refers to, and if there is something reusable in this case I'll reuse it, otherwise I'll write new code. Does it sound acceptable? |
…the metadata tab.
|
@djbrooke, @qqmyers I have improved the solution, and now it is configurable, and supports not just ORCID but ISNI, VIAF, ResearcherId and ScopusID as well. There are two configuration variables: private static final Map<String, Pair<String, String>> linkComponents = Map.of(
"author", new ImmutablePair<>("authorIdentifierScheme", "authorIdentifier")
);Here one can add more compound field. The idea is that the key are the name of the parent field, and the value is a pair where the first value is the field name which contains the type or the scheme of the link (ORCID, VIAF etc.) - in this case it is the authorIdentifierScheme, and the second one contains the name of the field which holds the identifier. Later we can add more fields here. The second configuration object is map which brings together the name of the scheme, and its URL template: private static final Map<String, String> linkSchemaTemplates = Map.of(
"ORCID", "https://orcid.org/%s",
"ISNI", "https://isni.org/isni/%s",
"VIAF", "http://viaf.org/viaf/%s/",
"ResearcherID", "https://publons.com/researcher/%s/",
"ScopusID", "https://www.scopus.com/authid/detail.uri?authorId=%s"
);Later additional templates could be added. |
|
I added LCNA and GND as well, but I haven't found references to DAI, the Dutch identifier scheme. The Release notes for v4.18 provides a link to a Wikipage which unfortunately does not contain any example, and all its relevant links are dead links (in both the English and Dutch version). I asked the help of @4tikhonov, and if I know more about DAI, I can add this as well. BTW: it is not possible to search by identifier scheme in the advanced search form. |
|
Sorry for the delay! I think this looks cleaner/more scalable. I see it is still hardcoded to the author field for now but supporting more standard fields just means adding them to the map and custom fields could be allowed to do this if the map were made into a setting. (Not saying this is needed now since I don't think there are any fields structured like this except author.) The one thing I would still add is a check to see if the values already are in URL form. There are at least some ORCID ones that way and not checking means those will end up with broken links. I haven't checked w.r.t. the other identifier types, but it's probably easy to check all of them - e.g. if the value starts with the format string up to the %s, just use it as is. |
|
@qqmyers It is somewhat configurable. In the class there is a private static final Map<String, Pair<String, String>> linkComponents = Map.of(
"author", new ImmutablePair<>("authorIdentifierScheme", "authorIdentifier")
);The Related Publication's ID Type and ID Number form a similar pair, so in order to support that, we should modify this configuration as private static final Map<String, Pair<String, String>> linkComponents = Map.of(
"author", new ImmutablePair<>("authorIdentifierScheme", "authorIdentifier"),
"publication", new ImmutablePair<>("[here comes the ID type field]", "[and here is the ID number field]"),
);We do not have to modify any method in the class. Of course we cloud somehow fully externalize this configuration, e.g. storing the object into a JSON file, of we can even create an API adding other fields - this feature might be useful for some future metadata blocks. |
|
@pkiraly - I just ran across this while doing something else - does it help here? Or is it something to update (I think you have more identifier types?): |
|
@qqmyers thanks for pointing that out that getIdentifierAsUrl method. I meant to mention it! |
|
@qqmyers Thanks! I already found it, but only after I create my implementation. My problem with this approach is that it is closely coupled with the DatasetAuthor class. In order to use this, we should transform DatasetFieldCompoundValue to DatasetAuthor, and then retrieve the URL. Do you think it is the way to go? |
|
I don't know what makes the most sense, but it would be nice not to duplicate code. Would something ~simple like making that method static, with idType and idValue params, (could even let the existing method call the static one internally to avoid other changes) help? |
…r regex pattern and the link pattern.
|
@qqmyers After the summer holiday I continued the work on the PR. I modified the |
There was a problem hiding this comment.
@pkiraly I haven't run the code but I like what you're doing here. Linking to ORCIDs and other identifiers makes perfect sense.
Below I'll enumerate a few items I think we need before this goes to QA but I'd like to tag @scolapasta and @qqmyers since (as discussed in the pull request already), there's overlap with pull request #7946 (external vocab). I actually see it as an advantage that ORCIDs etc. will be links out of the box if this pull request gets merged. External vocab is an optional feature. The only concern is have is that @qqmyers might have to resolve merge conflicts, probably in src/main/webapp/metadataFragment.xhtml.
Assuming we want to go forward with this, @pkiraly can you please do the following:
- Add a release note.
- Merge the latest from develop.
- After this is done. Check that the API test suite passes. It didn't run last time (only 36 ms instead of ~8 minutes): https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-7979/5/testReport/
- In the "DAI is missing from this list" comment, clarify that DAI identifiers cannot be put into a link. (Otherwise, it looks like a TODO.)
|
@pdurbin I fixed what you suggested, exept one: I am not able to access Jenkins. I run the API tests in localhost with command (running Dataverse locally in a fresh installation). The result has been: So locally it is fine. |
poikilotherm
left a comment
There was a problem hiding this comment.
I do like the ideas manifested in code here! Thank you @pkiraly for your work on this.
Left a few ideas for further tweaks in comments.
| * GND regex from https://www.wikidata.org/wiki/Property:P227 | ||
| */ | ||
| final public static String REGEX_GND = "^1[01]?\\d{7}[0-9X]|[47]\\d{6}-\\d|[1-9]\\d{0,7}-[0-9X]|3\\d{7}[0-9X]$"; | ||
| public static final Map<String, LinkTemplate> linkSchemeTemplates = Map.of( |
There was a problem hiding this comment.
- IMHO this should be split.
- The URL pattern part should move to
Bundle.propertiesto be localizable and easier to extend/change/override. No need for a map here, just build a lookup string with a lowercased identifier type name. - The regexes should live at the validator class. IMHO it violates OOP to store them here and use over there.
There was a problem hiding this comment.
- The idea here is to keep data which logically belongs together in a single place, a DAO object called LinkTemplate. I was also thinking to move these pairs into an enumeration. The validator class
DatasetFieldValueValidatordoes not contain such data structures, and I believe it is fine that it accepts the regex from an external source (either this way or from a enum). Bundle.properties: currently none of these URLs are localizable. Once they will be, we can move them into a bundle, and reference the indentifier from here, but now I am afraid it is early optimization.
| break; | ||
| if (linkSchemeTemplates.containsKey(idType)) { | ||
| LinkTemplate template = linkSchemeTemplates.get(idType); | ||
| if (datasetFieldValueValidator.isValidAuthorIdentifier(idValue, template.getPattern())) { |
There was a problem hiding this comment.
|
|
||
| import java.util.regex.Pattern; | ||
|
|
||
| public class LinkTemplate { |
There was a problem hiding this comment.
By splitting the whole thing this class would become obsolete.
There was a problem hiding this comment.
You are right, but as I said above the intention was to keep information together.
|
|
||
| public static String getIdentifierAsUrl(String idType, String idValue) { | ||
| if (idType != null && !idType.isEmpty() && idValue != null && !idValue.isEmpty()) { | ||
| DatasetFieldValueValidator datasetFieldValueValidator = new DatasetFieldValueValidator(); |
There was a problem hiding this comment.
IMHO the validation method should be static, using a static map of precompiled patterns (see other comment, too). Less overhead. As this class is the only use of the method, we are fine to change it as needed without much noise.
There was a problem hiding this comment.
We can change the validator method to be static. And even since it is a one liner we can move the body of the code here. However my intention is to use the link checking in another compound field, the ID Type of related publications, so it might be better to change the name from isValidAuthorIdentifier to validateInput or something like that.
| private List<DatasetField> childDatasetFields = new ArrayList<>(); | ||
|
|
||
| // configurations for link creation | ||
| private static final Map<String, Pair<String, String>> linkComponents = Map.of( |
There was a problem hiding this comment.
I really like the idea!
|
|
||
| // field for handling links. Annotation '@Transient' prevents these fields to be saved in DB | ||
| @Transient | ||
| private Map<DatasetField, Boolean> linkMap = new LinkedHashMap<>(); |
There was a problem hiding this comment.
Wouldn't a List<DatasetField> do? If on the list, it's a match! 😉
There was a problem hiding this comment.
As far as I know maps are optimized for lookup values, so map.contains() faster than the list.contains().
| @Transient | ||
| private String linkScheme = null; | ||
| @Transient | ||
| private String linkValue = null; |
There was a problem hiding this comment.
Also it seems unlikely now - this is a compound field and there might be multiple links in one.
IMHO it would be better to create a Map<DatasetField,String> to save the link scheme and value.
One might argue that this could even be a suited for a inner wrapping class, holding the data and merge linkMap, linkScheme, linkValue together.
There was a problem hiding this comment.
Theoretically yes, but in practice a compound field can not have multiple links. A single author might have one name, one identifier scheme and one identifier. Multiple links would be possible of the scheme and id would have its own compound field, which would be a component of the author field.
An inner wrapping class is a good idea.
| // todo - this currently only supports child datasetfields with single values | ||
| // need to determine how we would want to handle multiple | ||
| Map<DatasetField, String> fieldMap = new LinkedHashMap<>(); | ||
| linkMap = new LinkedHashMap<>(); |
There was a problem hiding this comment.
Isn't this done already when the object is being created? If this is intentional, please use .clear() on the existing map.
| Map<DatasetField, String> fieldMap = new LinkedHashMap<>(); | ||
| linkMap = new LinkedHashMap<>(); | ||
| boolean fixTrailingComma = false; | ||
| Pair<String, String> linkComponents = getLinkComponents(); |
There was a problem hiding this comment.
Again - what happens when there are multiple links in one compound field?
There was a problem hiding this comment.
see above, it is not possible now.
|
|
||
|
|
||
| public String getLink() { | ||
| return DatasetAuthor.getIdentifierAsUrl(linkScheme, linkValue); |
There was a problem hiding this comment.
I don't really like the hardcoded behaviour here.
This should IMHO be written with extensibility in mind as already intended above by offering the Map of fields and link components.
There was a problem hiding this comment.
DatasetAuthor had already this method, and at first - not being aware of that - I created my solution which lacks the validation aspect of the ID. Since there is another compound field, which has a similar link structure (related publications), in the next round when it comes to implementing those schemas, we should move this function to a different class (either here, or into a 3rd location).
|
@pkiraly are you plannint on making changes based on @poikilotherm's feedback? If so, let us know, and we'll move this back to community dev until it's ready. |
|
@scolapasta I am planning to continue it in the second half of this week, including modify the code and resolve conflicts. I will ping you when I am done with it. |
…umeration of linked external identifier services.
….com:pkiraly/dataverse into 7978-linking-ORCID-profile-from-metadata-tab
|
@scolapasta Later than expected, but I push my changes. There was a disagreement between @poikilotherm and me in one aspect, so I did not implemented all his suggestions (placing things in separate places, e.g. in properties file), but I utilized some other hints, thanks, Oliver! I fixed the merge conflicts as well. |
|
Hey @pkiraly - thank you! I'll move this over to review and we'll take another look. |
|
@pkiraly can you get the latest from develop (now that 5.8 has been released?) |
|
@scolapasta Sure, done. |
|
@pkiraly After testing I've found that the ORCiD identifier is not made a link but three other identification schemes are: ISNI, VIAF, and ResearcherId |
|
Dear @kcondon, could you send me the ORCID value you tested? The expected value is something like 0000-0002-1825-0097. Did you put the "https://orcid.org" part as well? |
|
@pdurbin But that is fine, right? We do not want to link values which does not fit the rules. |

What this PR does / why we need it: In the dataset page on the Metadata tab Dataverse displays ORCID identifiers as other texts. Since ORCID is an identifier which has a landing page this PR displays ID as an external link to the ORCID page.
Which issue(s) this PR closes:
Closes #7978
Special notes for your reviewer: This PR does not checks the validity of the ORCID, nor takes care about other identifier schemes. If it will be accepted, this idea would be easily develop further to support other identifiers that has a landing page. The whole business logic takes place in the frontend so there is no unit test. From the perspective of this feature it is unfortunate that the compound fields are aranged in a way that it is not possible to fetch the identifier scheme easily, that's why the code stores an isOrcid variable, and not requests for it directly.
Suggestions on how to test this: create a dataset for which the author has an ORCID value, and wheck the dataset if the metadata tab links it.
Does this PR introduce a user interface change? If mockups are available, please link/include them here: It makes the ORCID identifier clickable.
Is there a release notes update needed for this change?: I don't think so.
Additional documentation: no