EDIT: I decided to update this proposal for the Flexible Metadata Session 2 of the Community Meeting on 19 June 2020, even though it has already been closed. Please NOTE that this issue will stay CLOSED and its sole purpose is to explain the concept as an input for the Flexible Metadata session at the Community Meeting and the generalized effort in #6030.
Here is the link to my slides for the Flexible Metadata session:
Kahle, Jonas. (2020, June). Dataverse Community Meeting 2020 - Flexible Metadata Session 2 - Autocompleting and referencing data from REST APIs (Version 1.1). Zenodo. http://doi.org/10.5281/zenodo.3901395
This is a proposal for a generalized system to reference data from external API sources in the metadata. It could be the solution (or part of the solution) to #4282, #3622 and probably more issues. The preferred way to select a value would be an autocomplete function, as described in #2603.
In some cases it might be nice to be able to select multiple values, as requested in #350.
Next to the display value, Dataverse would need to store the identifier used (scheme | identifier of the value) and the metadata provenance (name | URL | datetime of the API call).
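A minimal sketch of what one stored value could look like, assuming a single record per referenced value. The class names `Provenance` and `ReferencedValue` and the example URL are hypothetical, not part of Dataverse; the two helper methods only serialize the scheme | identifier and name | URL | datetime groupings described above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Provenance:
    """Where a display value came from (hypothetical structure)."""
    source_name: str        # name of the configured API source
    url: str                # the API call that produced the value
    retrieved_at: datetime  # when the call was made


@dataclass(frozen=True)
class ReferencedValue:
    """A metadata value that points at an external identifier."""
    display_value: str  # what users see, e.g. "Jane Doe"
    scheme: str         # identifier scheme, e.g. "ORCID"
    identifier: str     # identifier within that scheme
    provenance: Provenance

    def reference(self) -> str:
        # scheme|identifier, as written in the proposal
        return f"{self.scheme}|{self.identifier}"

    def provenance_string(self) -> str:
        # name|url|datetime, as written in the proposal
        p = self.provenance
        return f"{p.source_name}|{p.url}|{p.retrieved_at.isoformat(sep=' ')}"


value = ReferencedValue(
    display_value="Jane Doe",
    scheme="ORCID",
    identifier="0000-1234-1234-5555",
    provenance=Provenance(
        source_name="orcid-person-search",
        url="https://pub.orcid.org/v3.0/search/?q=Jane%20Doe",
        retrieved_at=datetime(2018, 6, 22, 16, 20, 25),
    ),
)
print(value.reference())  # ORCID|0000-1234-1234-5555
```

Keeping the identifier separate from the display value is what allows the display value to be refreshed later without losing the reference.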
-
Requirements
- Saving the identifier together with the display value (no problem for Author, because "Identifier Scheme" and "Identifier" already exist). Does DDI allow IDs for all fields?
- Display values should be updatable (while keeping the identifier of the reference), but only in new versions of the dataset (so that published metadata stays immutable) and only with the user's confirmation.
- If the ORCID search API is used, a second lookup is required to retrieve details for the results of the first lookup, as described in "ORCID integration in Dataverse" (#4236).
- If the TSV configures a field to use an external source that is not configured within Dataverse, creating or updating the metadata should either warn or fail, depending on configuration.
- Both the TSV and the API source configuration would need to be available when creating or updating the metadata.
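The warn-or-fail requirement could be checked on create/update roughly as follows. This is a sketch under assumptions: the TSV is reduced to a hypothetical mapping from field name to source name, `strict` stands in for an installation setting that does not exist yet, and `lcsh-subjects` is an invented source name.

```python
import warnings


class UnknownSourceError(ValueError):
    """Raised in strict mode when a field uses an unconfigured source."""


def check_field_sources(field_sources, configured_sources, strict):
    """Check every field's external source on create/update.

    field_sources: field name -> source name, as declared in the TSV
    configured_sources: source names configured in this installation
    strict: True to fail (raise), False to only warn
    Returns the sorted names of fields whose source is missing.
    """
    missing = sorted(
        field for field, source in field_sources.items()
        if source not in configured_sources
    )
    for field in missing:
        message = f"field {field!r} uses unconfigured source {field_sources[field]!r}"
        if strict:
            raise UnknownSourceError(message)  # stops at the first missing field
        warnings.warn(message)
    return missing


tsv_fields = {"AuthorIdentifier": "orcid-person-search", "keyword": "lcsh-subjects"}
print(check_field_sources(tsv_fields, {"orcid-person-search"}, strict=False))
# warns about 'keyword', then prints ['keyword']
```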
-
UI representation
- Button "Find [person | vocabulary term | object]" next to each field that has an API source configured
- It should be possible to grey out fields that are configured (in this installation) to be filled only through the API source
- Admin panel to manage custom API sources and browse default ones
- Provide default API sources to be shipped with Dataverse (more examples at the bottom)
  - ORCID
  - LOC, DNB, other national libraries
  - Widespread vocabulary sources
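The first two UI points can be read as a small function of each field's configuration. The sketch below assumes a hypothetical per-field `FieldConfig` with a source name, an API-only flag and a label for the "Find ..." button; none of these names exist in Dataverse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldConfig:
    """How one metadata field is set up in an installation (hypothetical)."""
    name: str
    source: Optional[str] = None  # configured API source, if any
    api_only: bool = False        # only the API source may fill this field
    lookup_label: str = "object"  # "person", "vocabulary term" or "object"


def ui_state(field: FieldConfig) -> dict:
    """Derive the edit-form state of a field from its configuration."""
    has_source = field.source is not None
    return {
        # a "Find ..." button only appears when a source is configured
        "find_button": f"Find {field.lookup_label}" if has_source else None,
        # manual input is greyed out only for API-only fields with a source
        "editable": not (has_source and field.api_only),
    }


author = FieldConfig("AuthorName", source="orcid-person-search",
                     api_only=True, lookup_label="person")
print(ui_state(author))  # {'find_button': 'Find person', 'editable': False}
```

In this sketch an API-only flag without a configured source leaves the field editable, so a misconfiguration cannot lock a field completely.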
IMPORTANT: The display value would be set when the metadata is edited, and it would be published together with the provenance information (where the metadata came from). Once published, all metadata becomes immutable and can only be changed in a new version.
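A sketch of how immutability and updatable display values could coexist, assuming a published version is never modified and a refresh always produces a new draft. `DatasetVersion`, `refresh_display_values` and the changed name "Jane Doe-Smith" are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """A dataset version; frozen, so published values cannot change in place."""
    number: int
    published: bool
    values: Tuple[Tuple[str, str], ...]  # (identifier, display value) pairs


def refresh_display_values(
    version: DatasetVersion,
    fresh: Dict[str, str],
    confirm: Callable[[str, str, str], bool],
) -> DatasetVersion:
    """Build a new draft version with refreshed display values.

    fresh maps identifier -> display value from a new API lookup;
    confirm asks the user before each change. Identifiers never change.
    """
    updated = []
    for identifier, display in version.values:
        candidate = fresh.get(identifier, display)
        if candidate != display and confirm(identifier, display, candidate):
            display = candidate
        updated.append((identifier, display))
    return DatasetVersion(version.number + 1, published=False, values=tuple(updated))


v1 = DatasetVersion(1, published=True, values=(("0000-1234-1234-5555", "Jane Doe"),))
v2 = refresh_display_values(v1, {"0000-1234-1234-5555": "Jane Doe-Smith"},
                            confirm=lambda ident, old, new: True)
print(v1.values)  # (('0000-1234-1234-5555', 'Jane Doe'),)
print(v2.values)  # (('0000-1234-1234-5555', 'Jane Doe-Smith'),)
```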
The information that needs to be stored to describe the API source and the values retrieved from it (feel free to add or reconfigure as you wish):
{
  "metadataSource": {
    "name": "orcid-person-search",
    "version": "3.0",
    "description": "ORCID search person, v${ metadataSource.version }",
    "api": "https://pub.orcid.org/v${ metadataSource.version }/search/?q=${ query }",
    "idTemplate": "${ result.orcid-identifier.path }",
    "idTarget": "AuthorIdentifier",
    "secondaryFields": [
      {
        "target": "AuthorName",
        "api": "https://pub.orcid.org/v${ metadataSource.version }/${ getMetadataSourceId() }",
        "oninput": "autocompleteOrcidPersonName()",
        "template": "${ result.person.name.given-names } ${ result.person.name.family-name }"
      },
      {
        "target": "authorIdentifierScheme",
        "oninput": "resetField( 'AuthorIdentifier' )",
        "template": "50"
      }
    ]
  },
  "identifiers": {
    "0000-1234-1234-5555": {
      "stringRepresentation": "Jane Doe",
      "datetime": "2018-06-22 16:20:25"
    },
    "0000-1234-1234-9999": {
      "stringRepresentation": "John Doeson",
      "datetime": "2018-06-22 16:49:15"
    }
  }
}
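One way the `${ ... }` placeholders in this configuration might be resolved, including the second ORCID lookup per search hit. This is a sketch under several assumptions: placeholders are dotted paths into the JSON response, `getMetadataSourceId()` is treated as a callback returning the identifier of the current hit, `secondaryFields` is a list, and `lookup`, `render`, `two_step_lookup`, `CONFIG` and `RESPONSES` are hypothetical names. The HTTP call is injected as `fetch` and mocked with simplified responses; real ORCID payloads are structured differently, so the templates would likely need adjusting.

```python
import re
from urllib.parse import quote

# Matches ${ expr } with optional spaces, as used in the configuration above.
PLACEHOLDER = re.compile(r"\$\{\s*([^}]+?)\s*\}")


def lookup(context, path):
    """Follow a dotted path such as 'result.orcid-identifier.path'."""
    value = context
    for key in path.split("."):
        value = value[key]
    return value


def render(template, context, functions=None):
    """Replace each ${ ... } with a path value or, for 'name()', a function result."""
    functions = functions or {}

    def substitute(match):
        expr = match.group(1)
        if expr.endswith("()"):
            return str(functions[expr[:-2]]())
        return str(lookup(context, expr))

    return PLACEHOLDER.sub(substitute, template)


def two_step_lookup(config, query, fetch):
    """Search first, then fetch details for each hit (cf. #4236).

    fetch(url) returns parsed JSON; it is injected so it can be mocked.
    Returns (identifier, display value) pairs.
    """
    source = config["metadataSource"]
    ctx = {"metadataSource": source, "query": quote(query)}
    name_field = source["secondaryFields"][0]  # the AuthorName entry
    pairs = []
    for item in fetch(render(source["api"], ctx))["result"]:
        identifier = render(source["idTemplate"], {"result": item})
        detail_url = render(name_field["api"], ctx,
                            {"getMetadataSourceId": lambda: identifier})
        display = render(name_field["template"], {"result": fetch(detail_url)})
        pairs.append((identifier, display))
    return pairs


CONFIG = {"metadataSource": {
    "version": "3.0",
    "api": "https://pub.orcid.org/v${ metadataSource.version }/search/?q=${ query }",
    "idTemplate": "${ result.orcid-identifier.path }",
    "secondaryFields": [{
        "target": "AuthorName",
        "api": "https://pub.orcid.org/v${ metadataSource.version }/${ getMetadataSourceId() }",
        "template": "${ result.person.name.given-names } ${ result.person.name.family-name }",
    }],
}}

# Mocked, simplified responses keyed by URL.
RESPONSES = {
    "https://pub.orcid.org/v3.0/search/?q=Jane%20Doe":
        {"result": [{"orcid-identifier": {"path": "0000-1234-1234-5555"}}]},
    "https://pub.orcid.org/v3.0/0000-1234-1234-5555":
        {"person": {"name": {"given-names": "Jane", "family-name": "Doe"}}},
}

print(two_step_lookup(CONFIG, "Jane Doe", RESPONSES.__getitem__))
# [('0000-1234-1234-5555', 'Jane Doe')]
```

Injecting `fetch` keeps the template logic independent of any HTTP client, so the same configuration could be tested offline before a source is enabled.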
- Example sources
  - for fields referencing people (Author, Contact, Depositor, Contributor, Distributor, Producer, ...)
    - Dataverse native API
    - External sources: ORCID, LDAP through an API, Library of Congress Authorities, German National Library's authority data (DNB -> GND)
  - for fields referencing external vocabularies (Subject, Keyword, Topic Classification, Language, Production Place, Kind of Data, Software, Data Sources, ...)
    - DDI, ... (tdc, see also https://lov.linkeddata.es/dataset/lov)
  - for fields referencing other digital objects through a PID (Other ID, Related Publication, Related Datasets, ...)
    - Dataverse native API
    - External sources: DataCite index, other indices and repositories