As a researcher, I want more dataset metadata in schema.org exports so that my data is more discoverable #4371

@jggautier

In issue #2243, some metadata fields important for dataset discovery were excluded from mapping to Schema.org. We said we'd include them in a later issue. This is that issue, and these are those fields (dun dun):

  • Creator types (person or organization) for dataset authors. A separate ticket has been opened for this (Improving Dataverse's Schema.org JSON-LD schema to enable author names display in Google Dataset Search's #5029), because the missing creator types keep Google's Dataset Search from displaying creator names, and there are UI implications.
  • Dataset identifier (DOI or HDL as a URL) using the @id property
  • Dataset identifier (DOI or HDL as a URL) using the url property
  • Name of funding source (that's all schema.org supports for now; how to include other funding source details, like grant numbers, is discussed in this GitHub issue in schema.org's repo)
  • Author identifiers
  • Geographic coverage
  • Multiple dataset descriptions
  • File metadata (see the schema.org metadata from Zenodo and from ICPSR for examples of how these fields are used)
    • File PID
    • File download URL (when there is one - excludes restricted files and files in datasets with guestbooks)
    • File name
    • File description
    • File format
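
To make the list above concrete, here is a rough sketch of what a Schema.org JSON-LD export could look like with these fields included. It is not the Dataverse exporter and not a proposed final mapping: the function name `build_schema_org`, the input dictionary keys, and all example values (DOI, ORCID, file names) are hypothetical. The choice of `identifier` for author identifiers and file PIDs, `spatialCoverage` for geographic coverage, and `distribution`/`DataDownload` for files is an assumption about one reasonable mapping, and is open to the same community discussion as everything else here.

```python
import json


def build_schema_org(dataset):
    """Build a schema.org Dataset document from a plain dict (hypothetical input shape)."""
    pid_url = dataset["pid_url"]  # DOI or HDL expressed as a URL
    doc = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "@id": pid_url,   # dataset identifier via @id
        "url": pid_url,   # same identifier via the url property
        "name": dataset["title"],
        "description": list(dataset["descriptions"]),  # multiple descriptions kept as a list
        "creator": [],
        # schema.org only carries the funder's name for now
        "funder": [{"@type": "Organization", "name": n} for n in dataset.get("funders", [])],
        "spatialCoverage": list(dataset.get("geographic_coverage", [])),
        "distribution": [],
    }

    for author in dataset["authors"]:
        creator = {
            # creator type: Person or Organization
            "@type": "Organization" if author.get("is_organization") else "Person",
            "name": author["name"],
        }
        if author.get("identifier_url"):
            creator["identifier"] = author["identifier_url"]  # assumed mapping for author IDs
        doc["creator"].append(creator)

    for f in dataset.get("files", []):
        entry = {
            "@type": "DataDownload",
            "name": f["name"],
            "encodingFormat": f["format"],
        }
        if f.get("description"):
            entry["description"] = f["description"]
        if f.get("pid_url"):
            entry["identifier"] = f["pid_url"]  # assumed mapping for file PIDs
        # Only expose a download URL for unrestricted files in datasets without a guestbook
        if f.get("download_url") and not f.get("restricted") and not dataset.get("has_guestbook"):
            entry["contentUrl"] = f["download_url"]
        doc["distribution"].append(entry)

    return doc


# Entirely made-up example values
example = {
    "pid_url": "https://doi.org/10.5072/FK2/EXAMPLE",
    "title": "Example dataset",
    "descriptions": ["First description.", "Second description."],
    "funders": ["Example Funding Agency"],
    "geographic_coverage": ["Cambridge, Massachusetts, United States"],
    "has_guestbook": False,
    "authors": [
        {"name": "Doe, Jane", "identifier_url": "https://orcid.org/0000-0000-0000-0000"},
        {"name": "Example University", "is_organization": True},
    ],
    "files": [
        {"name": "data.tab", "format": "text/tab-separated-values",
         "description": "Survey responses", "pid_url": "https://doi.org/10.5072/FK2/EXAMPLE/F1",
         "download_url": "https://example.org/api/access/datafile/1"},
        {"name": "codebook.pdf", "format": "application/pdf", "restricted": True,
         "download_url": "https://example.org/api/access/datafile/2"},
    ],
}

doc = build_schema_org(example)
print(json.dumps(doc, indent=2))
```

In this sketch the restricted file still appears under `distribution` with its name and format, but without a `contentUrl`, which matches the "when there is one" condition on file download URLs.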

We'll also need to fix:

Which fields are added to the Schema.org metadata template (draft) and how they're mapped will probably be adjusted after community discussion (within the Dataverse community and hopefully with a proposed RDA group focused on ways to make data more discoverable by search engines).

@scolapasta asked me to add to the definition of done that we should make sure that the methods used to pull metadata values from different fields into different exports (DDI, DC, DataCite, Schema.org, native JSON (?)) are consistent.
