14 changes: 14 additions & 0 deletions doc/sphinx-guides/source/user/appendix.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,13 +8,18 @@ Additional documentation complementary to the User Guide.
.. contents:: |toctitle|
:local:

.. _metadata-references:

Metadata References
======================

The Dataverse Project is committed to using standard-compliant metadata to ensure that a Dataverse installation's
metadata can be mapped easily to standard metadata schemas and be exported into JSON
format (XML for tabular file metadata) for preservation and interoperability.

Supported Metadata
~~~~~~~~~~~~~~~~~~

The metadata schemas supported for Citation and Domain Specific Metadata in the Dataverse Project are detailed below:

- `Citation Metadata <https://docs.google.com/spreadsheet/ccc?key=0AjeLxEN77UZodHFEWGpoa19ia3pldEFyVFR0aFVGa0E#gid=0>`__: compliant with `DDI Lite <http://www.ddialliance.org/specification/ddi2.1/lite/index.html>`_, `DDI 2.5 Codebook <http://www.ddialliance.org/>`__, `DataCite 3.1 <http://schema.datacite.org/meta/kernel-3.1/doc/DataCite-MetadataKernel_v3.1.pdf>`__, and Dublin Core's `DCMI Metadata Terms <http://dublincore.org/documents/dcmi-terms/>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/citation.tsv>`__). Language field uses `ISO 639-1 <https://www.loc.gov/standards/iso639-2/php/English_list.php>`__ controlled vocabulary.
Expand All @@ -26,6 +31,15 @@ Detailed below are what metadata schemas we support for Citation and Domain Spec
`Virtual Observatory (VO) Discovery and Provenance Metadata <http://perma.cc/H5ZJ-4KKY>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/astrophysics.tsv>`__).
- `Life Sciences Metadata <https://docs.google.com/spreadsheet/ccc?key=0AjeLxEN77UZodHFEWGpoa19ia3pldEFyVFR0aFVGa0E#gid=2>`__: based on `ISA-Tab Specification <https://isa-specs.readthedocs.io/en/latest/isamodel.html>`__, along with controlled vocabulary from subsets of the `OBI Ontology <http://bioportal.bioontology.org/ontologies/OBI>`__ and the `NCBI Taxonomy for Organisms <http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/biomedical.tsv>`__).
- `Journal Metadata <https://docs.google.com/spreadsheets/d/13HP-jI_cwLDHBetn9UKTREPJ_F4iHdAvhjmlvmYdSSw/edit#gid=8>`__: based on the `Journal Archiving and Interchange Tag Set, version 1.2 <https://jats.nlm.nih.gov/archiving/tag-library/1.2/chapter/how-to-read.html>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/journals.tsv>`__).

Experimental Metadata
~~~~~~~~~~~~~~~~~~~~~

Unlike supported metadata, experimental metadata is not enabled by default in a new Dataverse installation. Feedback via any `channel <https://dataverse.org/contact>`_ is welcome!

- `Computational Workflow Metadata <https://docs.google.com/spreadsheets/d/13HP-jI_cwLDHBetn9UKTREPJ_F4iHdAvhjmlvmYdSSw/edit#gid=447508596>`__: adapted from `Bioschemas Computational Workflow Profile, version 1.0 <https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE>`__ and `Codemeta <https://codemeta.github.io/terms/>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/computational_workflow.tsv>`__).

See Also
~~~~~~~~

See also the `Dataverse Software 4.0 Metadata Crosswalk: DDI, DataCite, DC, DCTerms, VO, ISA-Tab <https://docs.google.com/spreadsheets/d/10Luzti7svVTVKTA-px27oq3RxCUM-QbiTkm8iMd5C54/edit?usp=sharing>`__ document and the :doc:`/admin/metadatacustomization` section of the Admin Guide.
31 changes: 18 additions & 13 deletions doc/sphinx-guides/source/user/dataset-management.rst
Expand Up @@ -162,13 +162,13 @@ BagIt Support

BagIt is a set of hierarchical file system conventions designed to support disk-based storage and network transfer of arbitrary digital content. It offers several benefits such as integration with digital libraries, easy implementation, and transfer validation. See `the Wikipedia article <https://en.wikipedia.org/wiki/BagIt>`__ for more information.

If the repository you are using has enabled BagIt file handling, when uploading BagIt files the repository will validate the checksum values listed in each BagIt’s manifest file against the uploaded files and generate errors about any mismatches. The repository will identify a certain number of errors, such as the first five errors in each BagIt file, before reporting the errors.
If the Dataverse installation you are using has enabled BagIt file handling, when uploading BagIt files the repository will validate the checksum values listed in each BagIt’s manifest file against the uploaded files and generate errors about any mismatches. The repository will identify a certain number of errors, such as the first five errors in each BagIt file, before reporting the errors.

|bagit-image1|

You can fix the errors and reupload the BagIt files.

For information on how to enable and configure the BagIt file handler see the :ref:`installation guide <BagIt File Handler>`
More information on how your admin can enable and configure the BagIt file handler can be found in the :ref:`Installation Guide <BagIt File Handler>`.

.. _file-handling:

Expand Down Expand Up @@ -238,10 +238,11 @@ Computational workflows precisely describe a multi-step process to coordinate mu

|cw-image1|


FAIR Computational Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~

FAIR principles (Findable, Accessible, Interoperable, Reusable) also apply to computational workflows. The FAIR Principles (https://doi.org/10.1162/dint_a_00033) apply to workflows in two areas as FAIR data and FAIR criteria for workflows as digital objects. In the FAIR data area, "*properly designed workflows contribute to FAIR data principles since they provide the metadata and provenance necessary to describe their data products, and they describe the involved data in a formalized, completely traceable way*" (https://doi.org/10.1162/dint_a_00033). Regarding the FAIR criteria for workflows as digital objects, "*workflows are research products in their own right, encapsulating methodological know-how that is to be found and published, accessed and cited, exchanged and combined with others, and reused as well as adapted*" (https://doi.org/10.1162/dint_a_00033).
The FAIR Principles (Findable, Accessible, Interoperable, Reusable) apply to computational workflows (https://doi.org/10.1162/dint_a_00033) in two areas: as FAIR data and as FAIR criteria for workflows as digital objects. In the FAIR data area, "*properly designed workflows contribute to FAIR data principles since they provide the metadata and provenance necessary to describe their data products, and they describe the involved data in a formalized, completely traceable way*" (https://doi.org/10.1162/dint_a_00033). Regarding the FAIR criteria for workflows as digital objects, "*workflows are research products in their own right, encapsulating methodological know-how that is to be found and published, accessed and cited, exchanged and combined with others, and reused as well as adapted*" (https://doi.org/10.1162/dint_a_00033).

How to Create a Computational Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand All @@ -257,31 +258,35 @@ You are encouraged to review these examples when creating a computational workfl

At https://workflows.community, the Workflows Community Initiative offers resources for computational workflows, such as a list of workflow systems (https://workflows.community/systems) and other workflow registries (https://workflows.community/registries). The initiative also helps organize working groups related to workflows research, development and application.

How to Upload your Computational Workflow
How to Upload Your Computational Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you :ref:`add a new dataset <adding-new-dataset>`, the Dataverse repository you are using may provide additional support for describing computational workflows, including Computational Workflow Metadata fields for describing your workflow and a "Workflow" tag you can apply to your workflow files.
After you :ref:`upload your files <dataset-file-upload>`, you can apply a "Workflow" tag to your workflow files, such as your Snakemake or R Notebooks files, so that you and others can find them more easily among your deposit’s other files.

|cw-image3|

|cw-image4|

How to Describe Your Computational Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Dataverse installation you are using may have enabled Computational Workflow metadata fields for your use. If so, when :ref:`editing your dataset metadata <adding-new-dataset>`, you will see the fields described below.

|cw-image2|

The three fields are adapted from `Bioschemas Computational Workflow Profile, version 1.0 <https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE>`__ and `Codemeta <https://codemeta.github.io/terms/>`__:
As described in the :ref:`metadata-references` section of the :doc:`/user/appendix`, the three fields are adapted from `Bioschemas Computational Workflow Profile, version 1.0 <https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE>`__ and `Codemeta <https://codemeta.github.io/terms/>`__:

- **Workflow Type**: The kind of Computational Workflow, which is designed to compose and execute a series of computational or data manipulation steps in a scientific application
- **External Code Repository URL**: A link to another public repository where the un-compiled, human-readable code and related code is also located (e.g., SVN, GitHub, GitLab, CodePlex)
- **External Code Repository URL**: A link to another public repository where the un-compiled, human-readable code and related code is also located (e.g., GitHub, GitLab, SVN)
- **Documentation**: A link (URL) to the documentation or text describing the Computational Workflow and its use

After you :ref:`upload your files <dataset-file-upload>`, you can apply a "Workflow" tag to your workflow files, such as your Snakemake or R Notebooks files, so that you and others can find them more easily among your deposit’s other files.

|cw-image3|

|cw-image4|

How to Search for Computational Workflows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the search page of the Dataverse repository you are using includes a "Dataset Feature" facet with a Computational Workflows link, you can follow that link to find only datasets that contain computational workflows.

You can also use the "Workflow Type" facet, if the Dataverse repository uses it, to find datasets that contain certain types of computational workflows, such as workflows written in Common Workflow Language files or Jupyter Notebooks.
You can also search on the "Workflow Type" facet, if the Dataverse installation has the field enabled, to find datasets that contain certain types of computational workflows, such as workflows written in Common Workflow Language files or Jupyter Notebooks.

|cw-image5|

4 changes: 2 additions & 2 deletions scripts/api/data/metadatablocks/computational_workflow.tsv
Expand Up @@ -2,7 +2,7 @@
computationalworkflow Computational Workflow Metadata
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id termURI
workflowType Computational Workflow Type The kind of Computational Workflow, which is designed to compose and execute a series of computational or data manipulation steps in a scientific application text 0 TRUE TRUE TRUE TRUE TRUE FALSE computationalworkflow
workflowCodeRepository External Code Repository URL A link to the repository where the un-compiled, human readable code and related code is located (e.g. SVN, GitHub, CodePlex, institutional GitLab instance) https://... url 1 FALSE FALSE TRUE FALSE TRUE FALSE computationalworkflow
workflowCodeRepository External Code Repository URL A link to the repository where the un-compiled, human readable code and related code is located (e.g. GitHub, GitLab, SVN) https://... url 1 FALSE FALSE TRUE FALSE TRUE FALSE computationalworkflow
workflowDocumentation Documentation A link (URL) to the documentation or text describing the Computational Workflow and its use textbox 2 FALSE FALSE TRUE FALSE TRUE FALSE computationalworkflow
#controlledVocabulary DatasetField Value identifier displayOrder
workflowType Common Workflow Language (CWL) workflowtype_cwl 1
Expand All @@ -18,4 +18,4 @@
workflowType Makefile workflowtype_makefile 11
workflowType Other Python-based workflow workflowtype_otherpython 12
workflowType Other R-based workflow workflowtype_otherrbased 13
workflowType Other workflowtype_other 100
workflowType Other workflowtype_other 100
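The controlled vocabulary rows in this hunk follow the `#controlledVocabulary` header shown above. As a rough illustration of how such a file could be read, here is a minimal Python sketch. The sample text is a hand-made excerpt, and both the tab delimiter and the empty leading column on data rows are assumptions about the file layout, not something this diff confirms. `parse_metadata_block` is a hypothetical helper, not part of the Dataverse codebase.

```python
import csv
import io

# Hand-made excerpt for illustration only. The tab delimiter and the empty
# leading column on data rows are assumptions about the real file layout.
SAMPLE_TSV = (
    "#controlledVocabulary\tDatasetField\tValue\tidentifier\tdisplayOrder\n"
    "\tworkflowType\tCommon Workflow Language (CWL)\tworkflowtype_cwl\t1\n"
    "\tworkflowType\tMakefile\tworkflowtype_makefile\t11\n"
    "\tworkflowType\tOther\tworkflowtype_other\t100\n"
)


def parse_metadata_block(text):
    """Group data rows under the '#section' header line that precedes them."""
    sections = {}
    section_name = None
    columns = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and rows that contain only empty cells.
        if not any(cell.strip() for cell in row):
            continue
        if row[0].startswith("#"):
            # A header row names the section and its columns.
            section_name = row[0][1:]
            columns = row[1:]
            sections[section_name] = []
        elif section_name is not None:
            # Data rows: drop the leading marker column, map cells to columns.
            sections[section_name].append(dict(zip(columns, row[1:])))
    return sections


if __name__ == "__main__":
    for entry in parse_metadata_block(SAMPLE_TSV)["controlledVocabulary"]:
        print(entry["identifier"], entry["Value"])
```

All values come back as strings, so a caller that needs to sort by `displayOrder` would have to convert that column to an integer first.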
1 change: 0 additions & 1 deletion scripts/api/setup-datasetfields.sh
Expand Up @@ -7,5 +7,4 @@ curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @da
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @data/metadatablocks/astrophysics.tsv -H "Content-type: text/tab-separated-values"
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @data/metadatablocks/biomedical.tsv -H "Content-type: text/tab-separated-values"
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @data/metadatablocks/journals.tsv -H "Content-type: text/tab-separated-values"
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @data/metadatablocks/computational_workflow.tsv -H "Content-type: text/tab-separated-values"
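This hunk removes the `computational_workflow.tsv` line from the setup script, which appears consistent with the block being treated as experimental and not loaded by default. An installation that wants the block would presumably send the same POST request by hand. The sketch below builds an equivalent request in Python using only the standard library. The endpoint path, host, and header are taken from the curl lines above. `build_load_request` is a hypothetical helper name, and the request is built without being sent.

```python
import urllib.request


def build_load_request(tsv_bytes, base_url="http://localhost:8080"):
    """Build (but do not send) the POST that the curl lines above perform."""
    return urllib.request.Request(
        url=base_url + "/api/admin/datasetfield/load",
        data=tsv_bytes,
        method="POST",
        headers={"Content-type": "text/tab-separated-values"},
    )


if __name__ == "__main__":
    # Relative path as used in the setup script; adjust for your checkout.
    path = "data/metadatablocks/computational_workflow.tsv"
    with open(path, "rb") as handle:
        request = build_load_request(handle.read())
    # Sending requires a running installation that exposes the admin API.
    with urllib.request.urlopen(request) as response:
        print(response.status)
```

Separating request construction from sending keeps the helper easy to inspect before anything is posted to the admin API.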

Expand Up @@ -5,7 +5,7 @@ datasetfieldtype.workflowType.title=Workflow Type
datasetfieldtype.workflowType.description=The kind of Computational Workflow, which is designed to compose and execute a series of computational or data manipulation steps in a scientific application
datasetfieldtype.workflowType.watermark=
datasetfieldtype.workflowCodeRepository.title=External Code Repository URL
datasetfieldtype.workflowCodeRepository.description=A link to another public repository where the un-compiled, human-readable code and related code is also located (e.g., SVN, GitHub, GitLab, CodePlex)
datasetfieldtype.workflowCodeRepository.description=A link to another public repository where the un-compiled, human-readable code and related code is also located (e.g., GitHub, GitLab, SVN)
datasetfieldtype.workflowCodeRepository.watermark=https://...
datasetfieldtype.workflowDocumentation.title=Documentation
datasetfieldtype.workflowDocumentation.description=A link (URL) to the documentation or text describing the Computational Workflow and its use