Skip to content

Comments

External Vocabulary Support#7946

Merged
kcondon merged 125 commits intoIQSS:developfrom
GlobalDataverseCommunityConsortium:external-cvoc2
Sep 14, 2021
Merged

External Vocabulary Support#7946
kcondon merged 125 commits intoIQSS:developfrom
GlobalDataverseCommunityConsortium:external-cvoc2

Conversation

@qqmyers
Copy link
Member

@qqmyers qqmyers commented Jun 15, 2021

What this PR does / why we need it: This is an extension/modification of #7712 that implements the consensus changes from the Metadata Working Group to support vocabularies from multiple services, minimize the coupling between Dataverse and the services, support single and compound fields, capture values for search, metadata export, and archival purposes in an internal table.

More details to follow. Creating this as a draft PR to simplify seeing the comparison with the dev branch.

Which issue(s) this PR closes:

Closes #7711

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

ekoi and others added 30 commits January 20, 2021 13:27
Merge branch 'develop' into external-cvoc
Fix problem where pressing tab skips some fields.
* - Add Vocabulary URL field
- Add parent term url
- Min search letter 1
- Add scrollbar for long list

* - Add Vocabulary URL field
- Add parent term url
- Min search letter 1
- Add scrollbar for long list

* Remove unused key setting.

Co-authored-by: Paul Boon <paul.boon@dans.knaw.nl>
DD-375 Disable editing of the cvoc URL fields
* Added minChars option to the cvoc configuration, default 0, ignoring negative numbers

* Added log info for minChars option

* Added option to hide cvoc URL fields when readonly and some javascript refactoring
* Uses the vocab-uri parameter from the cvoc config to store in the metadata

* Removed commented-out code fragment
Merge branch 'develop' into external-cvoc
Added the cvoc-interface.js file to the application resources
temporary measure to trigger Dataverse to not escape HTML (because
format contains '<a' )
Conflicts:
	src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
json for config to simplify making changes - could switch back
@djbrooke djbrooke assigned pdurbin and unassigned qqmyers Sep 8, 2021
Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't run the code this time but I'm feeling like most of my original issues have been addressed (thanks!). I did leave a couple minor new comments. As to whether to use id as a primary key, I'll defer to @scolapasta (and will assign him to this pull request).

As discussed in standup, there's a desire to spin this up so that design folks and others can take a look. For that part, I'll assign @djbrooke

Apart from those two things, I think this is ready for QA. I will note that I had trouble getting the fields working from the demo metadata block when there were four of them.

The docs here look fine but most of the complex config stuff has been put in a separate repo.

Overall, I'm excited that this external vocabularies feature is coming!

| | compound field is displayed in |
| | the UI as IMPs: A |
| | collection of NMR data. |
+-----------------------------------+-----------------------------------+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the changes to the tables above meaningful? Is the content or look changing? Or can these changes simply be ignored?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes intended. It looks like Eclipse tried to do some cleanup. That might be good but I see in one case (the fieldType param for fields) it added a new column to the table, so I'll just revert all of it since the existing stuff works.

JsonArray childFields = jo.getJsonArray("child-fields");
for (JsonString elm : childFields.getValuesAs(JsonString.class)) {
dft = findByNameOpt(elm.getString());
logger.info("Found: " + dft.getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think logger.fine would be better here (or delete).

*/
@Entity
@Table(indexes = { @Index(columnList = "datasetfieldtype_id"), @Index(columnList = "displayorder") })
public class ExternalVocabularyValue implements Serializable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, hopefully it'll work for @kcondon .

Comment on lines 49 to 51
@Id
@Column(columnDefinition = "TEXT", nullable = false)
private String uri;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defer to @scolapasta on the consistency of using id.

@pdurbin pdurbin assigned scolapasta and djbrooke and unassigned pdurbin Sep 8, 2021
@djbrooke
Copy link
Contributor

djbrooke commented Sep 8, 2021

@TaniaSchlatter - let me know when you'd like me to spin this up. I can do as early as tomorrow (Thursday) morning.

@qqmyers
Copy link
Member Author

qqmyers commented Sep 9, 2021

W.r.t. where to make comments: This PR itself shouldn't have any UI impacts (!). If you configure to use external vocab support for specific fields, Dataverse adds invisible data-cvoc-* attributes on those fields and includes the specified Javascript(s) in pages that would use it/them. All of the UI changes should then be the result of the Javascript(s). So - if the mechanism itself doesn't work or we see log errors, etc., we should track them with the PR. The hidden attributes are added in several places - the metadata input panel, metadata display panel, facets, advanced search panel, etc. If there are more places that's needed, it would also be this PR.

For any UI issues, it is probably better to add issues at the external vocab script repo. At present, both the ORCID and SKOSMOS scripts leverage some common code and jquery 'select2' (https://select2.org/), so most comments would probably apply to both. However, some details about the displayed values are script specific (i.e. only ORCID can show the a person's name) and the details of the layout depend on whether more than one vocab is allowed (skosmos only), whether free-text entries are allowed, etc. so detailed comments should make the exact circumstances clear.

There's also an ARDC script available - slightly out of date and subject to further change, so I'd suggest only looking at the ORCID/SKOSMOS ones for now. We're also hoping to have a map-based one to handle the bounding box fields at some point - not sure that will exist before people take a look.

Also FWIW: We also have not had any discussion of connecting existing fields (e.g. keywords, topics, contacts in the citation block) to this mechanism by default, so the impact is only on custom fields at this point. (In fact, to handle a field like contacts where non-ORCID entries would be allowed and some child fields wouldn't come from ORCID will require more work on the ORCID script itself. And keywords would need to have a fourth child field added for this to work, which is not in this PR. So it isn't just a matter of adding some config info for some of the existing compound fields.)

@janvanmansum
Copy link
Contributor

@qqmyers there is a typo in the link to the external vocab script repo in your comment above. Also, I noticed that there are apparently two github organizations for GDCC ...?

@qqmyers
Copy link
Member Author

qqmyers commented Sep 11, 2021

Thanks - I fixed the original link as well. As for GDCC - we're slowly transitioning from github.com/GlobalDataverseCommunityConsortium/* to github.com/gdcc/*. Most repos can just move whenever someone has time. For the previewers one, it's a bit more complex since we serve content from the related github.io link so we need to coordinate with people using those previewers and/or run parallel repos for a while instead of just moving, etc. tbd. In any case, the gdcc repo is preferred for new work.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good. Thanks to all who participated in the UI review on Friday. I see the fix to make the id column a long made it in. Tests failed on the most recent run but I think it was due to problems connecting to the EC2 instance? Here's the log: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-7946/30/consoleFull

I'm going to send this to QA.

@kcondon kcondon self-assigned this Sep 13, 2021
@donsizemore
Copy link
Contributor

I think we're good. Thanks to all who participated in the UI review on Friday. I see the fix to make the id column a long made it in. Tests failed on the most recent run but I think it was due to problems connecting to the EC2 instance? Here's the log: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-7946/30/consoleFull

I'm going to send this to QA.

@pdurbin The job triggered by the most recent commit passed all tests: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-7946/31/console

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature: Controlled Vocabulary Includes both Internal and external controlled vocabularies GDCC: DANS related to GDCC work for DANS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Externally controlled vocabulary support