+.. _dataset-file-upload:
+
File Upload
===========
@@ -153,6 +157,19 @@ Beginning with Dataverse Software 5.0, the way a Dataverse installation handles
- If a user attempts to replace a file with another file that has the same checksum, an error message will be displayed and the file will not be able to be replaced.
- If a user attempts to replace a file with a file that has the same checksum as a different file in the dataset, a warning will be displayed.
+BagIt Support
+-------------
+
+BagIt is a set of hierarchical file system conventions designed to support disk-based storage and network transfer of arbitrary digital content. It offers several benefits such as integration with digital libraries, easy implementation, and transfer validation. See `the Wikipedia article `__ for more information.
+
+If the Dataverse installation you are using has enabled BagIt file handling, then when you upload BagIt files, the repository validates the checksum values listed in each BagIt’s manifest file against the uploaded files and reports any mismatches as errors. The repository collects up to a set number of errors, such as the first five errors in each BagIt file, before reporting them.
+
+|bagit-image1|
+
+You can then fix the errors and re-upload the BagIt files.
+
+More information on how your admin can enable and configure the BagIt file handler can be found in the :ref:`Installation Guide `.
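The checksum validation described in this section can be sketched in Java, the language of the code changes in this diff. This is a minimal, hypothetical illustration and not the Dataverse BagIt handler: the class ``BagManifestCheck``, its ``validate`` method, and the ``maxErrors`` cap (mirroring the "first five errors" example above) are assumptions. It reads a payload manifest such as ``manifest-sha256.txt``, whose lines pair a checksum with a file path, recomputes each file's digest with ``java.security.MessageDigest``, and stops collecting errors once the cap is reached.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Dataverse BagIt handler: compares the entries
// of a BagIt payload manifest with the files in the bag.
class BagManifestCheck {

    // Returns up to maxErrors error messages; an empty list means every entry matched.
    static List<String> validate(Path bagRoot, String manifestName, String algorithm, int maxErrors)
            throws IOException, NoSuchAlgorithmException {
        List<String> errors = new ArrayList<>();
        List<String> lines = Files.readAllLines(bagRoot.resolve(manifestName), StandardCharsets.UTF_8);
        for (String line : lines) {
            if (errors.size() >= maxErrors) {
                break; // stop collecting once the cap is reached
            }
            if (line.isBlank()) {
                continue;
            }
            // Each manifest line is "<checksum> <path>", separated by whitespace.
            String[] parts = line.strip().split("\\s+", 2);
            if (parts.length < 2) {
                errors.add("Malformed manifest line: " + line);
                continue;
            }
            Path file = bagRoot.resolve(parts[1]);
            if (!Files.isRegularFile(file)) {
                errors.add("Missing file: " + parts[1]);
                continue;
            }
            // Recompute the digest of the file and compare it with the manifest value.
            String actual = hex(MessageDigest.getInstance(algorithm).digest(Files.readAllBytes(file)));
            if (!actual.equalsIgnoreCase(parts[0])) {
                errors.add("Checksum mismatch: " + parts[1]);
            }
        }
        return errors;
    }

    // Lowercase hexadecimal encoding of a digest.
    private static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

For example, ``BagManifestCheck.validate(Path.of("my-bag"), "manifest-sha256.txt", "SHA-256", 5)`` returns an empty list for a bag whose files all match. The sketch reads each file fully into memory and does not decode the percent-encoded paths that the BagIt specification allows; a production handler would stream large files and handle those cases.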
+
.. _file-handling:
File Handling
@@ -211,6 +228,72 @@ Finally, automating your code can be immensely helpful to the code and research
**Note:** Capturing code dependencies and automating your code will create new files in your directory. Make sure to include them when depositing your dataset.
+Computational Workflow
+----------------------
+
+Computational Workflow Definition
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Computational workflows precisely describe a multi-step process to coordinate multiple computational tasks and their data dependencies that lead to data products in a scientific application. The computational tasks take different forms, such as running code (e.g. Python, C++, MATLAB, R, Julia), invoking a service, calling a command-line tool, accessing a database (e.g. SQL, NoSQL), submitting a job to a compute cloud (e.g. on-premises cloud, AWS, GCP, Azure), and executing data processing scripts or workflows. The following diagram shows an example of a computational workflow with multiple computational tasks.
+
+|cw-image1|
+
+
+FAIR Computational Workflow
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FAIR Principles (Findable, Accessible, Interoperable, Reusable) apply to computational workflows (https://doi.org/10.1162/dint_a_00033) in two areas: as FAIR data and as FAIR criteria for workflows as digital objects. In the FAIR data area, "*properly designed workflows contribute to FAIR data principles since they provide the metadata and provenance necessary to describe their data products, and they describe the involved data in a formalized, completely traceable way*" (https://doi.org/10.1162/dint_a_00033). Regarding the FAIR criteria for workflows as digital objects, "*workflows are research products in their own right, encapsulating methodological know-how that is to be found and published, accessed and cited, exchanged and combined with others, and reused as well as adapted*" (https://doi.org/10.1162/dint_a_00033).
+
+How to Create a Computational Workflow
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are multiple approaches to creating computational workflows. You may consider standard frameworks and tools such as Common Workflow Language (CWL), Snakemake, Galaxy, Nextflow, and Ruffus, or *ad hoc* methods using different programming languages (e.g. Python, C++, MATLAB, Julia, R), notebooks (e.g. Jupyter Notebook, R Notebook, and MATLAB Live Script), and command-line interpreters (e.g. Bash). Each approach defines computational tasks differently, but all meet the definition of a computational workflow and all result in data products. You can find a few examples of computational workflows in the following GitHub repositories, each of which follows several aspects of the FAIR principles:
+
+- Common Workflow Language (`GitHub Repository URL `__)
+- R Notebook (`GitHub Repository URL `__)
+- Jupyter Notebook (`GitHub Repository URL `__)
+- MATLAB Script (`GitHub Repository URL `__)
+
+You are encouraged to review these examples when creating a computational workflow and publishing it in a Dataverse repository.
+
+At https://workflows.community, the Workflows Community Initiative offers resources for computational workflows, such as a list of workflow systems (https://workflows.community/systems) and other workflow registries (https://workflows.community/registries). The initiative also helps organize working groups related to workflows research, development and application.
+
+How to Upload Your Computational Workflow
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+After you :ref:`upload your files `, you can apply a "Workflow" tag to your workflow files, such as your Snakemake or R Notebook files, so that you and others can find them more easily among your deposit’s other files.
+
+|cw-image3|
+
+|cw-image4|
+
+How to Describe Your Computational Workflow
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Dataverse installation you are using may have enabled Computational Workflow metadata fields for your use. If so, when :ref:`editing your dataset metadata `, you will see the fields described below.
+
+|cw-image2|
+
+As described in the :ref:`metadata-references` section of the :doc:`/user/appendix`, the three fields are adapted from `Bioschemas Computational Workflow Profile, version 1.0 `__ and `Codemeta `__:
+
+- **Workflow Type**: The kind of Computational Workflow, which is designed to compose and execute a series of computational or data manipulation steps in a scientific application
+- **External Code Repository URL**: A link to another public repository where the un-compiled, human-readable code and related code is also located (e.g., GitHub, GitLab, SVN)
+- **Documentation**: A link (URL) to the documentation or text describing the Computational Workflow and its use
+
+
+How to Search for Computational Workflows
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If the search page of the Dataverse repository you are using includes a "Dataset Feature" facet with a Computational Workflows link, you can follow that link to find only datasets that contain computational workflows.
+
+You can also search on the "Workflow Type" facet, if the Dataverse installation has the field enabled, to find datasets that contain certain types of computational workflows, such as workflows written in Common Workflow Language or as Jupyter Notebooks.
+
+|cw-image5|
+
+You can also search for files within datasets that have been tagged as "Workflow" files by clicking the Files checkbox to show only files and using the File Tag facet to show only files tagged as "Workflow".
+
+|cw-image6|
+
Astronomy (FITS)
----------------
@@ -622,6 +705,20 @@ If you deaccession the most recently published version of the dataset but not al
:class: img-responsive
.. |image-file-tree-view| image:: ./img/file-tree-view.png
:class: img-responsive
+.. |cw-image1| image:: ./img/computational-workflow-diagram.png
+ :class: img-responsive
+.. |cw-image2| image:: ./img/computational-workflow-metadata.png
+ :class: img-responsive
+.. |cw-image3| image:: ./img/file-tags-link.png
+ :class: img-responsive
+.. |cw-image4| image:: ./img/file-tags-options.png
+ :class: img-responsive
+.. |cw-image5| image:: ./img/computational-workflow-facets.png
+ :class: img-responsive
+.. |cw-image6| image:: ./img/file-tags-facets.png
+ :class: img-responsive
+.. |bagit-image1| image:: ./img/bagit-handler-errors.png
+ :class: img-responsive
.. _Make Data Count: https://makedatacount.org
.. _Crossref: https://crossref.org
diff --git a/doc/sphinx-guides/source/user/img/DatasetDiagram.png b/doc/sphinx-guides/source/user/img/DatasetDiagram.png
old mode 100755
new mode 100644
index 45a21456a08..471a54c2d83
Binary files a/doc/sphinx-guides/source/user/img/DatasetDiagram.png and b/doc/sphinx-guides/source/user/img/DatasetDiagram.png differ
diff --git a/doc/sphinx-guides/source/user/img/bagit-handler-errors.png b/doc/sphinx-guides/source/user/img/bagit-handler-errors.png
new file mode 100644
index 00000000000..d4059ca53c9
Binary files /dev/null and b/doc/sphinx-guides/source/user/img/bagit-handler-errors.png differ
diff --git a/doc/sphinx-guides/source/user/img/computational-workflow-diagram.png b/doc/sphinx-guides/source/user/img/computational-workflow-diagram.png
new file mode 100644
index 00000000000..efb073737dd
Binary files /dev/null and b/doc/sphinx-guides/source/user/img/computational-workflow-diagram.png differ
diff --git a/doc/sphinx-guides/source/user/img/computational-workflow-facets.png b/doc/sphinx-guides/source/user/img/computational-workflow-facets.png
new file mode 100644
index 00000000000..c790e1d5ffb
Binary files /dev/null and b/doc/sphinx-guides/source/user/img/computational-workflow-facets.png differ
diff --git a/doc/sphinx-guides/source/user/img/computational-workflow-metadata.png b/doc/sphinx-guides/source/user/img/computational-workflow-metadata.png
new file mode 100644
index 00000000000..2c477e75b1e
Binary files /dev/null and b/doc/sphinx-guides/source/user/img/computational-workflow-metadata.png differ
diff --git a/doc/sphinx-guides/source/user/img/file-tags-facets.png b/doc/sphinx-guides/source/user/img/file-tags-facets.png
new file mode 100644
index 00000000000..ce2a9bd72a8
Binary files /dev/null and b/doc/sphinx-guides/source/user/img/file-tags-facets.png differ
diff --git a/doc/sphinx-guides/source/user/img/file-tags-link.png b/doc/sphinx-guides/source/user/img/file-tags-link.png
new file mode 100644
index 00000000000..c0496a4e1ba
Binary files /dev/null and b/doc/sphinx-guides/source/user/img/file-tags-link.png differ
diff --git a/doc/sphinx-guides/source/user/img/file-tags-options.png b/doc/sphinx-guides/source/user/img/file-tags-options.png
new file mode 100644
index 00000000000..4af196c690e
Binary files /dev/null and b/doc/sphinx-guides/source/user/img/file-tags-options.png differ
diff --git a/scripts/api/data/metadatablocks/computational_workflow.tsv b/scripts/api/data/metadatablocks/computational_workflow.tsv
new file mode 100644
index 00000000000..51b69cfdb80
--- /dev/null
+++ b/scripts/api/data/metadatablocks/computational_workflow.tsv
@@ -0,0 +1,21 @@
+#metadataBlock name dataverseAlias displayName
+ computationalworkflow Computational Workflow Metadata
+#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id termURI
+ workflowType Computational Workflow Type The kind of Computational Workflow, which is designed to compose and execute a series of computational or data manipulation steps in a scientific application text 0 TRUE TRUE TRUE TRUE TRUE FALSE computationalworkflow
+ workflowCodeRepository External Code Repository URL A link to the repository where the un-compiled, human readable code and related code is located (e.g. GitHub, GitLab, SVN) https://... url 1 FALSE FALSE TRUE FALSE TRUE FALSE computationalworkflow
+ workflowDocumentation Documentation A link (URL) to the documentation or text describing the Computational Workflow and its use textbox 2 FALSE FALSE TRUE FALSE TRUE FALSE computationalworkflow
+#controlledVocabulary DatasetField Value identifier displayOrder
+ workflowType Common Workflow Language (CWL) workflowtype_cwl 1
+ workflowType Workflow Description Language (WDL) workflowtype_wdl 2
+ workflowType Nextflow workflowtype_nextflow 3
+ workflowType Snakemake workflowtype_snakemake 4
+ workflowType Ruffus workflowtype_ruffus 5
+ workflowType DAGMan workflowtype_dagman 6
+ workflowType Jupyter Notebook workflowtype_jupyter 7
+ workflowType R Notebook workflowtype_rstudio 8
+ workflowType MATLAB Script workflowtype_matlab 9
+ workflowType Bash Script workflowtype_bash 10
+ workflowType Makefile workflowtype_makefile 11
+ workflowType Other Python-based workflow workflowtype_otherpython 12
+ workflowType Other R-based workflow workflowtype_otherrbased 13
+ workflowType Other workflowtype_other 100
diff --git a/scripts/api/setup-datasetfields.sh b/scripts/api/setup-datasetfields.sh
index 0d2d60b9538..0d79176c099 100755
--- a/scripts/api/setup-datasetfields.sh
+++ b/scripts/api/setup-datasetfields.sh
@@ -7,3 +7,4 @@ curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @da
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @data/metadatablocks/astrophysics.tsv -H "Content-type: text/tab-separated-values"
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @data/metadatablocks/biomedical.tsv -H "Content-type: text/tab-separated-values"
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @data/metadatablocks/journals.tsv -H "Content-type: text/tab-separated-values"
+
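The `curl` lines in `setup-datasetfields.sh` each POST a metadata block TSV to `/api/admin/datasetfield/load` with a `text/tab-separated-values` content type. As a hypothetical sketch, the equivalent request for the new `computational_workflow.tsv` can be built in Java with the standard `java.net.http` client; the class `MetadataBlockLoader` and its method name are illustrative, and the request is only built here, not sent.

```java
import java.io.FileNotFoundException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.file.Path;

// Hypothetical sketch: mirrors the curl calls in setup-datasetfields.sh.
class MetadataBlockLoader {

    // Builds a POST of the TSV file to the admin datasetfield/load endpoint.
    static HttpRequest loadRequest(String baseUrl, Path tsvFile) throws FileNotFoundException {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/admin/datasetfield/load"))
                .header("Content-type", "text/tab-separated-values")
                .POST(HttpRequest.BodyPublishers.ofFile(tsvFile))
                .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, for example with `Path.of("data/metadatablocks/computational_workflow.tsv")` as the file when run from `scripts/api`.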
diff --git a/src/main/java/edu/harvard/iq/dataverse/ControlledVocabularyValue.java b/src/main/java/edu/harvard/iq/dataverse/ControlledVocabularyValue.java
index 213d648da71..181d939f4a1 100644
--- a/src/main/java/edu/harvard/iq/dataverse/ControlledVocabularyValue.java
+++ b/src/main/java/edu/harvard/iq/dataverse/ControlledVocabularyValue.java
@@ -148,7 +148,7 @@ public static String getLocaleStrValue(String strValue, String fieldTypeName, St
return sendDefault ? strValue : null;
}
} catch (MissingResourceException | NullPointerException e) {
- logger.warning("Error finding" + "controlledvocabulary." + fieldTypeName + "." + key + " in " + ((locale==null)? "defaultLang" : locale.getLanguage()) + " : " + e.getLocalizedMessage());
+ logger.warning("Error finding " + "controlledvocabulary." + fieldTypeName + "." + key + " in " + ((locale==null)? "defaultLang" : locale.getLanguage()) + " : " + e.getLocalizedMessage());
return sendDefault ? strValue : null;
}
}
diff --git a/src/main/java/propertyFiles/computationalworkflow.properties b/src/main/java/propertyFiles/computationalworkflow.properties
new file mode 100644
index 00000000000..eb15ecf9982
--- /dev/null
+++ b/src/main/java/propertyFiles/computationalworkflow.properties
@@ -0,0 +1,27 @@
+metadatablock.name=computationalworkflow
+metadatablock.displayName=Computational Workflow Metadata
+metadatablock.displayFacet=Computational Workflow
+datasetfieldtype.workflowType.title=Workflow Type
+datasetfieldtype.workflowType.description=The kind of Computational Workflow, which is designed to compose and execute a series of computational or data manipulation steps in a scientific application
+datasetfieldtype.workflowType.watermark=
+datasetfieldtype.workflowCodeRepository.title=External Code Repository URL
+datasetfieldtype.workflowCodeRepository.description=A link to another public repository where the un-compiled, human-readable code and related code is also located (e.g., GitHub, GitLab, SVN)
+datasetfieldtype.workflowCodeRepository.watermark=https://...
+datasetfieldtype.workflowDocumentation.title=Documentation
+datasetfieldtype.workflowDocumentation.description=A link (URL) to the documentation or text describing the Computational Workflow and its use
+datasetfieldtype.workflowDocumentation.watermark=
+controlledvocabulary.workflowType.common_workflow_language_(cwl)=Common Workflow Language (CWL)
+controlledvocabulary.workflowType.workflow_description_language_(wdl)=Workflow Description Language (WDL)
+controlledvocabulary.workflowType.nextflow=Nextflow
+controlledvocabulary.workflowType.snakemake=Snakemake
+controlledvocabulary.workflowType.ruffus=Ruffus
+controlledvocabulary.workflowType.jupyter_notebook=Jupyter Notebook
+controlledvocabulary.workflowType.r_notebook=R Notebook
+controlledvocabulary.workflowType.dagman=DAGMan
+controlledvocabulary.workflowType.matlab_script=MATLAB Script
+controlledvocabulary.workflowType.bash_script=Bash Script
+controlledvocabulary.workflowType.makefile=Makefile
+controlledvocabulary.workflowType.other_python-based_workflow=Other Python-based workflow
+controlledvocabulary.workflowType.other_r-based_workflow=Other R-based workflow
+controlledvocabulary.workflowType.other=Other
+
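The keys in `computationalworkflow.properties` follow a visible pattern: each controlled vocabulary value from `computational_workflow.tsv` is lowercased and its spaces are replaced with underscores (for example, `Common Workflow Language (CWL)` becomes `common_workflow_language_(cwl)`). The hypothetical `VocabKey.propertyKey` helper below sketches that pattern, which can help when checking a translated properties file against the TSV. It is not claimed to be the exact logic in `ControlledVocabularyValue`, and the accent-stripping step is an assumption that none of the values above exercise.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: derives a properties key for a controlled vocabulary
// value, following the pattern visible in computationalworkflow.properties.
class VocabKey {

    static String propertyKey(String fieldName, String value) {
        // Lowercase the value and replace spaces with underscores.
        String key = value.toLowerCase(Locale.ROOT).replace(" ", "_");
        // Assumption: strip diacritics by decomposing characters and removing combining marks.
        key = Normalizer.normalize(key, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return "controlledvocabulary." + fieldName + "." + key;
    }
}
```

For instance, `VocabKey.propertyKey("workflowType", "Other Python-based workflow")` yields `controlledvocabulary.workflowType.other_python-based_workflow`, matching the entry above; hyphens and parentheses are left untouched.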