Closed
In issue #2243, some metadata fields important for dataset discovery were excluded from mapping to Schema.org. We said we'd include them in a later issue. This is that issue, and these are those fields (dun dun):
- Creator types (person or organization) for dataset authors (a separate ticket has been opened for this, Improving Dataverse's Schema.org JSON-LD schema to enable author names display in Google Dataset Search's #5029, since the missing creator types are making Google's Dataset Search engine not display creator names and there are UI implications)
- Dataset identifier (DOI or HDL as a URL) using the `@id` property
- Dataset identifier (DOI or HDL as a URL) using the `url` property
- Name of funding source (that's all schema.org supports for now; how to include other funding source details (like grant numbers) is discussed in this GitHub issue in schema.org's repo)
- Author identifiers
- Geographic coverage
- Multiple dataset descriptions
- File metadata (see schema.org from Zenodo and from ICPSR for an example of how these fields are used)
  - File PID
  - File download URL (when there is one - excludes restricted files and files in datasets with guestbooks)
  - File name
  - File description
  - File format
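To make the list above concrete, here is a rough Python sketch of what the expanded mapping could produce. It is not Dataverse's exporter: the input record shape, the helper names, and the example values (DOI, ORCID, file URL) are all hypothetical, and putting the author identifier in `identifier` and geographic coverage in `spatialCoverage` are assumptions that would need community discussion.

```python
import json


def author_to_creator(author):
    """Map one author record to a creator entry with an explicit type."""
    creator = {
        "@type": "Organization" if author.get("is_organization") else "Person",
        "name": author["name"],
    }
    if author.get("identifier"):
        # Assumption: author identifiers (e.g. an ORCID URL) go in `identifier`.
        creator["identifier"] = author["identifier"]
    return creator


def file_to_distribution(f, dataset_has_guestbook):
    """Map one file record to a Schema.org DataDownload entry."""
    dist = {"@type": "DataDownload", "name": f["name"]}
    if f.get("pid_url"):
        dist["identifier"] = f["pid_url"]
    if f.get("description"):
        dist["description"] = f["description"]
    if f.get("format"):
        dist["encodingFormat"] = f["format"]
    # Download URL only for unrestricted files in datasets without guestbooks.
    if f.get("download_url") and not f.get("restricted") and not dataset_has_guestbook:
        dist["contentUrl"] = f["download_url"]
    return dist


def dataset_to_jsonld(ds):
    """Build a Schema.org Dataset dict from a simplified, hypothetical record."""
    has_guestbook = ds.get("has_guestbook", False)
    doc = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "@id": ds["pid_url"],   # DOI or HDL as a URL
        "url": ds["pid_url"],   # same identifier, repeated in `url`
        "name": ds["title"],
        "description": list(ds["descriptions"]),  # one entry per description
        "creator": [author_to_creator(a) for a in ds["authors"]],
        "distribution": [file_to_distribution(f, has_guestbook) for f in ds["files"]],
    }
    if ds.get("funders"):
        # Only the funder name is mapped; grant numbers are left out.
        doc["funder"] = [{"@type": "Organization", "name": n} for n in ds["funders"]]
    if ds.get("geographic_coverage"):
        doc["spatialCoverage"] = list(ds["geographic_coverage"])
    return doc


# Hypothetical example record; every value below is a placeholder.
example = {
    "pid_url": "https://doi.org/10.5072/FK2/EXAMPLE",
    "title": "Example dataset",
    "descriptions": ["First description.", "Second description."],
    "authors": [
        {"name": "Carberry, Josiah", "identifier": "https://orcid.org/0000-0002-1825-0097"},
        {"name": "Example University", "is_organization": True},
    ],
    "funders": ["Example Funding Agency"],
    "geographic_coverage": ["Cambridge, MA, United States"],
    "files": [
        {"name": "data.tab", "format": "text/tab-separated-values",
         "description": "Main data file",
         "download_url": "https://dataverse.example.org/api/access/datafile/1"},
        {"name": "restricted.csv", "format": "text/csv", "restricted": True,
         "download_url": "https://dataverse.example.org/api/access/datafile/2"},
    ],
}

jsonld = dataset_to_jsonld(example)
print(json.dumps(jsonld, indent=2))
```

In this sketch the restricted file still appears under `distribution` with its name and format, just without a `contentUrl`; whether restricted files should be listed at all is an open question.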
We'll also need to fix:
- The property used for dataset authors. Dataverse is using `author`, but I think Google Dataset Search is ignoring `author` and prefers `creator`. (See this comment on Improving Dataverse's Schema.org JSON-LD schema to enable author names display in Google Dataset Search's #5029)
- Keywords and Topic Classifications (Dataverse concatenates all keywords into one value, and all topic classifications into one value; see this dataset's schema.org metadata)
- Provider: Change the value to use the installation name (instead of hardcoding "Dataverse")
  - For the `provider` property we hardcode "Dataverse" and put the installation name in the `DataCatalog` `name` property, but Dataset Search is displaying a "Data provided by" field and is using what's in the `provider` property.
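A minimal Python sketch of the three fixes above, for discussion only. The function name and arguments are hypothetical, every creator is typed as a `Person` to keep the example short, and merging topic classifications into `keywords` is an assumption rather than a decided mapping.

```python
def fixed_properties(author_names, keywords, topic_classifications, installation_name):
    """Return the Schema.org properties affected by the fixes (illustrative only)."""
    return {
        # Use `creator` rather than `author`.
        "creator": [{"@type": "Person", "name": n} for n in author_names],
        # One value per keyword / topic classification instead of one joined string.
        # Assumption: both kinds of terms are emitted under `keywords`.
        "keywords": list(keywords) + list(topic_classifications),
        # Installation name instead of a hardcoded "Dataverse".
        "provider": {"@type": "Organization", "name": installation_name},
        "includedInDataCatalog": {"@type": "DataCatalog", "name": installation_name},
    }


# Placeholder values, not taken from a real dataset.
props = fixed_properties(
    author_names=["Doe, Jane"],
    keywords=["surveys", "elections"],
    topic_classifications=["Political Science"],
    installation_name="Example Dataverse",
)
```

Building `keywords` from the individual source values avoids having to split an already concatenated string, which would depend on whatever separator the current export uses.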
Which fields are added to the Schema.org metadata template (draft) and how they're mapped will probably be adjusted after community discussion (within the Dataverse community and hopefully with a proposed RDA group focused on ways to make data more discoverable by search engines).
@scolapasta asked me to add to the definition of done that we should make sure that the methods used to pull metadata values from different fields into different exports (DDI, DC, DataCite, Schema.org, native JSON (?)) are consistent.