Skip to content

#7978: linking ORCID from the metadata tab.#7979

Merged
kcondon merged 14 commits intoIQSS:developfrom
pkiraly:7978-linking-ORCID-profile-from-metadata-tab
Nov 17, 2021
Merged

#7978: linking ORCID from the metadata tab.#7979
kcondon merged 14 commits intoIQSS:developfrom
pkiraly:7978-linking-ORCID-profile-from-metadata-tab

Conversation

@pkiraly
Copy link
Member

@pkiraly pkiraly commented Jun 29, 2021

What this PR does / why we need it: In the dataset page on the Metadata tab Dataverse displays ORCID identifiers as other texts. Since ORCID is an identifier which has a landing page this PR displays ID as an external link to the ORCID page.

Which issue(s) this PR closes:

Closes #7978

Special notes for your reviewer: This PR does not checks the validity of the ORCID, nor takes care about other identifier schemes. If it will be accepted, this idea would be easily develop further to support other identifiers that has a landing page. The whole business logic takes place in the frontend so there is no unit test. From the perspective of this feature it is unfortunate that the compound fields are aranged in a way that it is not possible to fetch the identifier scheme easily, that's why the code stores an isOrcid variable, and not requests for it directly.

Suggestions on how to test this: create a dataset for which the author has an ORCID value, and wheck the dataset if the metadata tab links it.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: It makes the ORCID identifier clickable.

Is there a release notes update needed for this change?: I don't think so.

Additional documentation: no

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of this approach. I didn't run the code but conceptually it seems to deliver one simple, extremely valuable thing: Make the ORCID a link.

I'm not clicking approve because my understanding is that @qqmyers is working on delivering similar functionality (and a whole lot more) in pull request #7946.

So we have a decision to make. Do we go ahead and test and merge this small pull request by @pkiraly? Or do we wait for the bigger pull request from @qqmyers? Either way it seems like linking to ORCIDs is coming sometime soon, which is great! 🎉

@djbrooke
Copy link
Contributor

Thx @pdurbin for tagging @qqmyers - it'll be interesting to hear his thoughts here. If the larger PR will provide the clickable-ORCID functionality for installations that adopt the external vocabulary and also those that don't, I'm inclined to wait for the larger PR. If it will only benefit the former I think it makes sense to accept this.

@qqmyers
Copy link
Member

qqmyers commented Jul 8, 2021

While #7946 will support a new single field where the value has to be an ORCID, or a compound field where the person's name and the identifier type (ORCID) are also stored (but other identifiers types aren't allowed, and affiliation isn't a child field that isn't coming from the ORCID profile), it won't manage the existing citation/author field as is without further changes/enhancements (for the issues hinted at above and some lesser technical points - probably worth discussing at some point).

As a short-term solution, the one suggestion I would have is to point out that some Datasets already use the https://orcid.org/<id> form for ORCIDs (e.g. https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/RZTSWA) so the logic should handle that.

As a medium term solution/pattern I'd raise a concern that new xhtml code is needed per field per identifier type which isn't very scalable. Adding to #7946 or a solution to #7856 might avoid that. It might also be that moving some of the code to Java/slightly extending the existing display pattern language might be simpler. In any case, I think it would be worth some discussion before we accept this as another/separate standard way to display things differently from how they are stored.

@djbrooke
Copy link
Contributor

djbrooke commented Jul 9, 2021

Thanks @qqmyers and @pdurbin. I'll move this back to community dev for now.

@pkiraly if you'd like to continue work on this in the near term, let us know so that we can discuss options.

@pkiraly
Copy link
Member Author

pkiraly commented Jul 9, 2021

@qqmyers Thanks for the explanation.

@djbrooke my alternative idea would be to put the code to Java class, and use method which first detect if it is a "linkable" value, that is there is a landing page for the personal identifier (ORCID, VIAF are linkable in this sense maybe even ScopusID and ResearcherID seems to be as well), and if it is linkable it provides an object represents a link. This idea could be reuse for other IDs. Next week I will check the tickets @qqmyers refers to, and if there is something reusable in this case I'll reuse it, otherwise I'll write new code. Does it sound acceptable?

@pkiraly
Copy link
Member Author

pkiraly commented Jul 13, 2021

@djbrooke, @qqmyers I have improved the solution, and now it is configurable, and supports not just ORCID but ISNI, VIAF, ResearcherId and ScopusID as well. There are two configuration variables:

private static final Map<String, Pair<String, String>> linkComponents = Map.of(
  "author", new ImmutablePair<>("authorIdentifierScheme", "authorIdentifier")
);

Here one can add more compound field. The idea is that the key are the name of the parent field, and the value is a pair where the first value is the field name which contains the type or the scheme of the link (ORCID, VIAF etc.) - in this case it is the authorIdentifierScheme, and the second one contains the name of the field which holds the identifier. Later we can add more fields here.

The second configuration object is map which brings together the name of the scheme, and its URL template:

private static final Map<String, String> linkSchemaTemplates = Map.of(
  "ORCID", "https://orcid.org/%s",
  "ISNI", "https://isni.org/isni/%s",
  "VIAF", "http://viaf.org/viaf/%s/",
  "ResearcherID", "https://publons.com/researcher/%s/",
  "ScopusID", "https://www.scopus.com/authid/detail.uri?authorId=%s"
);

Later additional templates could be added.

@pkiraly
Copy link
Member Author

pkiraly commented Jul 14, 2021

I added LCNA and GND as well, but I haven't found references to DAI, the Dutch identifier scheme. The Release notes for v4.18 provides a link to a Wikipage which unfortunately does not contain any example, and all its relevant links are dead links (in both the English and Dutch version). I asked the help of @4tikhonov, and if I know more about DAI, I can add this as well. BTW: it is not possible to search by identifier scheme in the advanced search form.

@pkiraly
Copy link
Member Author

pkiraly commented Jul 20, 2021

I received the answer regarding DAI, that it does not have a linkable resolves, so the DAI identifiers can not be put into a link the same way as the other author identifiers.

@djbrooke @qqmyers Any reflection on the approach I implemented?

@qqmyers
Copy link
Member

qqmyers commented Jul 20, 2021

Sorry for the delay!

I think this looks cleaner/more scalable. I see it is still hardcoded to the author field for now but supporting more standard fields just means adding them to the map and custom fields could be allowed to do this if the map were made into a setting. (Not saying this is needed now since I don't think there are any fields structured like this except author.)

The one thing I would still add is a check to see if the values already are in URL form. There are at least some ORCID ones that way and not checking means those will end up with broken links. I haven't checked w.r.t. the other identifier types, but it's probably easy to check all of them - e.g. if the value starts with the format string up to the %s, just use it as is.

@pkiraly
Copy link
Member Author

pkiraly commented Jul 20, 2021

@qqmyers It is somewhat configurable. In the class there is a linkComponents field, which looks like now:

private static final Map<String, Pair<String, String>> linkComponents = Map.of(
  "author", new ImmutablePair<>("authorIdentifierScheme", "authorIdentifier")
);

The Related Publication's ID Type and ID Number form a similar pair, so in order to support that, we should modify this configuration as

private static final Map<String, Pair<String, String>> linkComponents = Map.of(
  "author", new ImmutablePair<>("authorIdentifierScheme", "authorIdentifier"),
  "publication", new ImmutablePair<>("[here comes the ID type field]", "[and here is the ID number field]"),
);

We do not have to modify any method in the class. Of course we cloud somehow fully externalize this configuration, e.g. storing the object into a JSON file, of we can even create an API adding other fields - this feature might be useful for some future metadata blocks.

@qqmyers
Copy link
Member

qqmyers commented Jul 21, 2021

@pkiraly - I just ran across this while doing something else - does it help here? Or is it something to update (I think you have more identifier types?):

public String getIdentifierAsUrl() {

@pdurbin
Copy link
Member

pdurbin commented Jul 22, 2021

@qqmyers thanks for pointing that out that getIdentifierAsUrl method. I meant to mention it!

@pkiraly
Copy link
Member Author

pkiraly commented Jul 22, 2021

@qqmyers Thanks! I already found it, but only after I create my implementation. My problem with this approach is that it is closely coupled with the DatasetAuthor class. In order to use this, we should transform DatasetFieldCompoundValue to DatasetAuthor, and then retrieve the URL. Do you think it is the way to go?

@qqmyers
Copy link
Member

qqmyers commented Jul 22, 2021

I don't know what makes the most sense, but it would be nice not to duplicate code. Would something ~simple like making that method static, with idType and idValue params, (could even let the existing method call the static one internally to avoid other changes) help?
Others may have a preferred suggestion, but I'd be open to other options you can think of or, at worst, just adding notes to point out that the similar approaches exist so that both get changed if there's some future issue.

@pkiraly
Copy link
Member Author

pkiraly commented Aug 24, 2021

@qqmyers After the summer holiday I continued the work on the PR. I modified the DatasetAuthor class as well, and the tests which use it. Now the commit is bigger than a "baby step" but still believe the code changes are still understandable.

@qqmyers qqmyers removed their assignment Aug 24, 2021
@pdurbin pdurbin self-assigned this Aug 24, 2021
Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pkiraly I haven't run the code but I like what you're doing here. Linking to ORCIDs and other identifiers makes perfect sense.

Below I'll enumerate a few items I think we need before this goes to QA but I'd like to tag @scolapasta and @qqmyers since (as discussed in the pull request already), there's overlap with pull request #7946 (external vocab). I actually see it as an advantage that ORCIDs etc. will be links out of the box if this pull request gets merged. External vocab is an optional feature. The only concern is have is that @qqmyers might have to resolve merge conflicts, probably in src/main/webapp/metadataFragment.xhtml.

Assuming we want to go forward with this, @pkiraly can you please do the following:

@pdurbin pdurbin assigned pkiraly and unassigned pdurbin Aug 24, 2021
@pkiraly
Copy link
Member Author

pkiraly commented Aug 25, 2021

@pdurbin I fixed what you suggested, exept one: I am not able to access Jenkins. I run the API tests in localhost with

conf/docker-aio/run-test-suite.sh http://localhost:8080

command (running Dataverse locally in a fresh installation). The result has been:

[INFO] Results:
[INFO] 
[WARNING] Tests run: 164, Failures: 0, Errors: 0, Skipped: 8

So locally it is fine.

Copy link
Contributor

@poikilotherm poikilotherm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like the ideas manifested in code here! Thank you @pkiraly for your work on this.
Left a few ideas for further tweaks in comments.

* GND regex from https://www.wikidata.org/wiki/Property:P227
*/
final public static String REGEX_GND = "^1[01]?\\d{7}[0-9X]|[47]\\d{6}-\\d|[1-9]\\d{0,7}-[0-9X]|3\\d{7}[0-9X]$";
public static final Map<String, LinkTemplate> linkSchemeTemplates = Map.of(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • IMHO this should be split.
  • The URL pattern part should move to Bundle.properties to be localizable and easier to extend/change/override. No need for a map here, just build a lookup string with a lowercased identifier type name.
  • The regexes should live at the validator class. IMHO it violates OOP to store them here and use over there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The idea here is to keep data which logically belongs together in a single place, a DAO object called LinkTemplate. I was also thinking to move these pairs into an enumeration. The validator class DatasetFieldValueValidator does not contain such data structures, and I believe it is fine that it accepts the regex from an external source (either this way or from a enum).
  • Bundle.properties: currently none of these URLs are localizable. Once they will be, we can move them into a bundle, and reference the indentifier from here, but now I am afraid it is early optimization.

break;
if (linkSchemeTemplates.containsKey(idType)) {
LinkTemplate template = linkSchemeTemplates.get(idType);
if (datasetFieldValueValidator.isValidAuthorIdentifier(idValue, template.getPattern())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written before, when the validator owns the regexes, no need to pass them around.

Using precompiled patterns in DatasetFieldValidator makes the validation faster.
See here and here for an example of the fastest regex validation variant in Java (least overhead).


import java.util.regex.Pattern;

public class LinkTemplate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By splitting the whole thing this class would become obsolete.

Copy link
Member Author

@pkiraly pkiraly Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, but as I said above the intention was to keep information together.


public static String getIdentifierAsUrl(String idType, String idValue) {
if (idType != null && !idType.isEmpty() && idValue != null && !idValue.isEmpty()) {
DatasetFieldValueValidator datasetFieldValueValidator = new DatasetFieldValueValidator();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the validation method should be static, using a static map of precompiled patterns (see other comment, too). Less overhead. As this class is the only use of the method, we are fine to change it as needed without much noise.

Copy link
Member Author

@pkiraly pkiraly Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change the validator method to be static. And even since it is a one liner we can move the body of the code here. However my intention is to use the link checking in another compound field, the ID Type of related publications, so it might be better to change the name from isValidAuthorIdentifier to validateInput or something like that.

private List<DatasetField> childDatasetFields = new ArrayList<>();

// configurations for link creation
private static final Map<String, Pair<String, String>> linkComponents = Map.of(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea!


// field for handling links. Annotation '@Transient' prevents these fields to be saved in DB
@Transient
private Map<DatasetField, Boolean> linkMap = new LinkedHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't a List<DatasetField> do? If on the list, it's a match! 😉

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know maps are optimized for lookup values, so map.contains() faster than the list.contains().

Comment on lines +82 to +85
@Transient
private String linkScheme = null;
@Transient
private String linkValue = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it seems unlikely now - this is a compound field and there might be multiple links in one.
IMHO it would be better to create a Map<DatasetField,String> to save the link scheme and value.

One might argue that this could even be a suited for a inner wrapping class, holding the data and merge linkMap, linkScheme, linkValue together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically yes, but in practice a compound field can not have multiple links. A single author might have one name, one identifier scheme and one identifier. Multiple links would be possible of the scheme and id would have its own compound field, which would be a component of the author field.
An inner wrapping class is a good idea.

// todo - this currently only supports child datasetfields with single values
// need to determine how we would want to handle multiple
Map<DatasetField, String> fieldMap = new LinkedHashMap<>();
linkMap = new LinkedHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this done already when the object is being created? If this is intentional, please use .clear() on the existing map.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint!

Map<DatasetField, String> fieldMap = new LinkedHashMap<>();
linkMap = new LinkedHashMap<>();
boolean fixTrailingComma = false;
Pair<String, String> linkComponents = getLinkComponents();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again - what happens when there are multiple links in one compound field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above, it is not possible now.



public String getLink() {
return DatasetAuthor.getIdentifierAsUrl(linkScheme, linkValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like the hardcoded behaviour here.
This should IMHO be written with extensibility in mind as already intended above by offering the Map of fields and link components.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatasetAuthor had already this method, and at first - not being aware of that - I created my solution which lacks the validation aspect of the ID. Since there is another compound field, which has a similar link structure (related publications), in the next round when it comes to implementing those schemas, we should move this function to a different class (either here, or into a 3rd location).

@scolapasta
Copy link
Contributor

@pkiraly are you plannint on making changes based on @poikilotherm's feedback? If so, let us know, and we'll move this back to community dev until it's ready.

@pkiraly
Copy link
Member Author

pkiraly commented Sep 20, 2021

@scolapasta I am planning to continue it in the second half of this week, including modify the code and resolve conflicts. I will ping you when I am done with it.

@pkiraly
Copy link
Member Author

pkiraly commented Oct 8, 2021

@scolapasta Later than expected, but I push my changes. There was a disagreement between @poikilotherm and me in one aspect, so I did not implemented all his suggestions (placing things in separate places, e.g. in properties file), but I utilized some other hints, thanks, Oliver! I fixed the merge conflicts as well.

@djbrooke
Copy link
Contributor

djbrooke commented Oct 8, 2021

Hey @pkiraly - thank you! I'll move this over to review and we'll take another look.

@scolapasta
Copy link
Contributor

@pkiraly can you get the latest from develop (now that 5.8 has been released?)

@pkiraly
Copy link
Member Author

pkiraly commented Nov 10, 2021

@scolapasta Sure, done.

@kcondon kcondon self-assigned this Nov 17, 2021
@kcondon
Copy link
Contributor

kcondon commented Nov 17, 2021

@pkiraly After testing I've found that the ORCiD identifier is not made a link but three other identification schemes are: ISNI, VIAF, and ResearcherId

@pkiraly
Copy link
Member Author

pkiraly commented Nov 17, 2021

Dear @kcondon, could you send me the ORCID value you tested? The expected value is something like 0000-0002-1825-0097. Did you put the "https://orcid.org" part as well?

@pdurbin
Copy link
Member

pdurbin commented Nov 17, 2021

From a quick test it looks like invalid values don't link:

Screen Shot 2021-11-17 at 11 16 28 AM

@pkiraly
Copy link
Member Author

pkiraly commented Nov 17, 2021

@pdurbin But that is fine, right? We do not want to link values which does not fit the rules.

@kcondon kcondon merged commit 4ba22f4 into IQSS:develop Nov 17, 2021
@pdurbin
Copy link
Member

pdurbin commented Nov 17, 2021

@pkiraly yes, makes sense. @kcondon and I talked about it and I added a line about this to the release notes: 1e66657

Thanks for the pull request! Great feature!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Linking ORCID profile from the metadata tab

7 participants